
HAL Id: halshs-02262202

https://halshs.archives-ouvertes.fr/halshs-02262202

Preprint submitted on 2 Aug 2019



WORKING PAPER N° 2019 – 45

Dynamic Factor Models

Catherine Doz

Peter Fuleky

JEL Codes: C32, C38, C53, C55

Keywords: dynamic factor models; big data; two-step estimation; time domain; frequency domain; structural breaks


DYNAMIC FACTOR MODELS

Catherine Doz¹,² and Peter Fuleky³

¹ Paris School of Economics
² Université Paris 1 Panthéon-Sorbonne
³ University of Hawaii at Manoa

July 2019

Prepared for Macroeconomic Forecasting in the Era of Big Data: Theory and Practice (Peter Fuleky, ed.), Springer.


Chapter 2

Dynamic Factor Models

Catherine Doz and Peter Fuleky

Abstract Dynamic factor models are parsimonious representations of relationships among time series variables. With the surge in data availability, they have proven to be indispensable in macroeconomic forecasting. This chapter surveys the evolution of these models from their pre-big-data origins to the large-scale models of recent years. We review the associated estimation theory, forecasting approaches, and several extensions of the basic framework.

Keywords: dynamic factor models; big data; two-step estimation; time domain; frequency domain; structural breaks

JEL Codes: C32, C38, C53, C55

2.1 Introduction

Factor analysis is a dimension reduction technique summarizing the sources of variation among variables. The method was introduced in the psychology literature by Spearman (1904), who used an unobserved variable, or factor, to describe the cognitive abilities of an individual. Although originally developed for independently distributed random vectors, the method was extended by Geweke (1977) to capture the co-movements in economic time series. The idea that the co-movement of macroeconomic series can be linked to the business cycle was put forward by Burns and Mitchell (1946): “a cycle consists of expansions occurring at about the same time in many economic activities, followed by similarly general recessions, contractions, and revivals which merge into the expansion phase of the next cycle; this sequence of changes is recurrent but not periodic.”

Catherine Doz

Paris School of Economics and University Paris 1 Panthéon-Sorbonne, 48 Boulevard Jourdan, 75014 Paris, France. e-mail: catherine.doz@univ-paris1.fr

Peter Fuleky

University of Hawaii, 2424 Maile Way, Honolulu, HI 96822, USA. e-mail: fuleky@hawaii.edu


Early applications of dynamic factor models (DFMs) to macroeconomic data, by Sargent and Sims (1977) and Stock and Watson (1989, 1991, 1993), suggested that a few latent factors can account for much of the dynamic behavior of major economic aggregates. In particular, a dynamic single-factor model can be used to summarize a vector of macroeconomic indicators, and the factor can be seen as an index of economic conditions describing the business cycle. In these studies, the number of time periods in the data set exceeded the number of variables, and identification of the factors required relatively strict assumptions. While increments in the time dimension were limited by the passage of time, the availability of a large number of macroeconomic and financial indicators provided an opportunity to expand the dataset in the cross-sectional dimension and to work with somewhat relaxed assumptions. Chamberlain (1983) and Chamberlain and Rothschild (1983) applied factor models to wide panels of financial data and paved the way for further development of dynamic factor models in macroeconometrics. Indeed, since the 2000s dynamic factor models have been used extensively to analyze large macroeconomic data sets, sometimes containing hundreds of series with hundreds of observations on each. They have proved useful for synthesizing information from variables observed at different frequencies, estimation of the latent business cycle, nowcasting and forecasting, and estimation of recession probabilities and turning points. As Diebold (2003) pointed out, although DFMs “don't analyze really Big Data, they certainly represent a movement of macroeconometrics in that direction”, and this movement has proved to be very fruitful.

Several very good surveys have been written on dynamic factor models, including Bai and Ng (2008b); Bai and Wang (2016); Barigozzi (2018); Breitung and Eickmeier (2006); Darné et al. (2014); Lütkepohl (2014); Stock and Watson (2006, 2011, 2016). Yet we felt that the readers of this volume would appreciate a chapter covering the evolution of dynamic factor models, their estimation strategies, forecasting approaches, and several extensions to the basic framework. Since both small- and large-dimensional models have advanced over time, we review the progress achieved under both frameworks. In Section 2.2 we describe the distinguishing characteristics of exact and approximate factor models. In Sections 2.3 and 2.4, we review estimation procedures proposed in the time domain and the frequency domain, respectively. Section 2.5 presents approaches for determining the number of factors. Section 2.6 surveys issues associated with forecasting, and Section 2.8 reviews the handling of structural breaks in dynamic factor models.

2.2 From Exact to Approximate Factor Models

In a factor model, the correlations among $n$ variables, $x_1, \dots, x_n$, for which $T$ observations are available, are assumed to be entirely due to a few, $r < n$, latent unobservable variables, called factors. The link between the observable variables and the factors is assumed to be linear. Thus, each observation $x_{it}$ can be decomposed as


$$x_{it} = \mu_i + \lambda_i' f_t + e_{it},$$
where $\mu_i$ is the mean of $x_i$, $\lambda_i$ is an $r \times 1$ vector, and $e_{it}$ and $f_t$ are two uncorrelated processes. Thus, for $i = 1,\dots,n$ and $t = 1,\dots,T$, $x_{it}$ is decomposed into the sum of two mutually orthogonal unobserved components: the common component, $\chi_{it} = \lambda_i' f_t$, and the idiosyncratic component, $\xi_{it} = \mu_i + e_{it}$. While the factors drive the correlation between $x_i$ and $x_j$, $j \neq i$, the idiosyncratic component arises from features that are specific to an individual variable $x_i$. Further assumptions placed on the two unobserved components result in factor models of different types.

2.2.1 Exact factor models

The exact factor model was introduced by Spearman (1904). The model assumes that the idiosyncratic components are not correlated at any leads and lags, so that $\xi_{it}$ and $\xi_{ks}$ are mutually orthogonal for all $k \neq i$ and any $s$ and $t$, and consequently all correlation among the observable variables is driven by the factors. The model can be written as
$$x_{it} = \mu_i + \sum_{j=1}^{r} \lambda_{ij} f_{jt} + e_{it}, \quad i = 1,\dots,n, \quad t = 1,\dots,T, \qquad (2.1)$$
where $f_{jt}$ and $e_{it}$ are orthogonal white noises for any $i$ and $j$; $e_{it}$ and $e_{kt}$ are orthogonal for $k \neq i$; and $\lambda_{ij}$ is the loading of the $j$th factor on the $i$th variable. Equation (2.1) can also be written as:
$$x_{it} = \mu_i + \lambda_i' f_t + e_{it}, \quad i = 1,\dots,n, \quad t = 1,\dots,T,$$
with $\lambda_i' = (\lambda_{i1}, \dots, \lambda_{ir})$ and $f_t = (f_{1t}, \dots, f_{rt})'$. The common component is $\chi_{it} = \sum_{j=1}^{r} \lambda_{ij} f_{jt}$, and the idiosyncratic component is $\xi_{it} = \mu_i + e_{it}$.

In matrix notation, with $x_t = (x_{1t}, \dots, x_{nt})'$, $f_t = (f_{1t}, \dots, f_{rt})'$ and $e_t = (e_{1t}, \dots, e_{nt})'$, the model can be written as
$$x_t = \mu + \Lambda f_t + e_t, \qquad (2.2)$$
where $\Lambda = (\lambda_1, \dots, \lambda_n)'$ is an $n \times r$ matrix of full column rank (otherwise fewer factors would suffice), and the covariance matrix of $e_t$ is diagonal since the idiosyncratic terms are assumed to be uncorrelated. Observations available for $t = 1, \dots, T$ can be stacked, and Equation (2.2) can be rewritten as
$$X = \iota_T \otimes \mu' + F \Lambda' + E,$$
where $X = (x_1, \dots, x_T)'$ is a $T \times n$ matrix, $F = (f_1, \dots, f_T)'$ a $T \times r$ matrix, and $E = (e_1, \dots, e_T)'$ a $T \times n$ matrix.


While the core assumption behind factor models is that the two processes $f_t$ and $e_t$ are orthogonal to each other, the exact static factor model further assumes that the idiosyncratic components are also orthogonal to each other, so that any correlation between the observable variables is solely due to the common factors. Both orthogonality assumptions are necessary to ensure the identifiability of the model (see Anderson, 1984; Bartholomew, 1987). Note that $f_t$ is defined only up to a premultiplication by an invertible matrix $Q$, since $f_t$ can be replaced by $Q f_t$ whenever $\Lambda$ is replaced by $\Lambda Q^{-1}$. This means that only the respective spaces spanned by $f_t$ and by $\Lambda$ are uniquely defined. This so-called indeterminacy problem of the factors must be taken into account at the estimation stage.

Factor models have been introduced in economics by Engle and Watson (1981); Geweke (1977); Sargent and Sims (1977). These authors generalized the model above to capture dynamics in the data. Using the same notation as in Equation (2.2), the dynamic exact factor model can be written as
$$x_t = \mu + \Lambda_0 f_t + \Lambda_1 f_{t-1} + \dots + \Lambda_s f_{t-s} + e_t,$$
or more compactly
$$x_t = \mu + \Lambda(L) f_t + e_t, \qquad (2.3)$$
where $L$ is the lag operator.

In this model, $f_t$ and $e_t$ are no longer assumed to be white noises, but are instead allowed to be autocorrelated dynamic processes evolving according to $f_t = \Theta(L) u_t$ and $e_t = \rho(L)\varepsilon_t$, where the $q$- and $n$-dimensional vectors $u_t$ and $\varepsilon_t$, respectively, contain iid errors. The dimension of $f_t$ is also $q$, which is therefore referred to as the number of dynamic factors. In most of the dynamic factor models literature, $f_t$ and $e_t$ are generally assumed to be stationary processes, so if necessary, the observable variables are pre-processed to be stationary. (In this chapter, we only consider stationary variables; non-stationary models will be discussed in Chapter 17.)

The model admits a static representation
$$x_t = \mu + \Lambda F_t + e_t, \qquad (2.4)$$
with $F_t = (f_t', f_{t-1}', \dots, f_{t-s}')'$, an $r = q(1+s)$ dimensional vector of static factors, and $\Lambda = (\Lambda_0, \Lambda_1, \dots, \Lambda_s)$, an $n \times r$ matrix of loading coefficients. The dynamic representation in (2.3) captures the dependence of the observed variables on the lags of the factors explicitly, while the static representation in (2.4) embeds those dynamics implicitly. The two forms lead to different estimation methods to be discussed below.

Exact dynamic factor models assume that the cross-sectional dimension of the data set, $n < T$, is finite, and they are usually used with small, $n \ll T$, samples. There are two reasons for these models to be passed over in the $n \to \infty$ case. First, maximum likelihood estimation, the typical method of choice, requires specifying a full parametric model and imposes a practical limitation on the number of parameters that can be estimated as $n \to \infty$. Second, as $n \to \infty$, some of the unrealistic


assumptions imposed on exact factor models can be relaxed, and the approximate factor model framework, discussed in the next section, can be used instead.

2.2.2 Approximate factor models

As noted above, exact factor models rely on a very strict assumption of no cross-correlation between the idiosyncratic components. In two seminal papers, Chamberlain (1983) and Chamberlain and Rothschild (1983) introduced approximate factor models by relaxing this assumption. They allowed the idiosyncratic components to be mildly cross-correlated and provided a set of conditions ensuring that approximate factor models were asymptotically identified as $n \to \infty$.

Let $x_t^n = (x_{1t}, \dots, x_{nt})'$ denote the vector containing the $t$th observation of the first $n$ variables as $n \to \infty$, and let $\Sigma_n = \mathrm{cov}(x_t^n)$ be the covariance matrix of $x_t^n$. Denoting by $\lambda_1(A) \ge \lambda_2(A) \ge \dots \ge \lambda_n(A)$ the ordered eigenvalues of any symmetric matrix $A$ of size $n \times n$, the assumptions underlying approximate factor models are the following:

(CR1) $\sup_n \lambda_r(\Sigma_n) = \infty$ (the $r$ largest eigenvalues of $\Sigma_n$ are diverging)

(CR2) $\sup_n \lambda_{r+1}(\Sigma_n) < \infty$ (the remaining eigenvalues of $\Sigma_n$ are bounded)

(CR3) $\inf_n \lambda_n(\Sigma_n) > 0$ ($\Sigma_n$ does not approach singularity)

The authors show that under assumptions (CR1)-(CR3) there exists a unique decomposition $\Sigma_n = \Lambda_n \Lambda_n' + \Psi_n$, where $\Lambda_n$ is a sequence of nested $n \times r$ matrices with rank $r$ and $\lambda_i(\Lambda_n \Lambda_n') \to \infty$, $\forall i = 1,\dots,r$, and $\lambda_1(\Psi_n) < \infty$. Alternatively, $x_t^n$ can be decomposed using a pair of mutually orthogonal random vector processes $f_t$ and $e_t^n$:
$$x_t^n = \mu_n + \Lambda_n f_t + e_t^n,$$
with $\mathrm{cov}(f_t) = I_r$ and $\mathrm{cov}(e_t^n) = \Psi_n$, where the $r$ common factors are pervasive, in the sense that the number of variables affected by each factor grows with $n$, and the idiosyncratic terms may be mildly correlated with bounded covariances.

Although originally developed in the finance literature (see also Connor and Korajczyk, 1986, 1988, 1993), the approximate static factor model made its way into macroeconometrics in the early 2000’s (see for example Bai, 2003; Bai and Ng, 2002; Stock and Watson, 2002a,b). These papers use assumptions that are analogous to (CR1)-(CR3) for the covariance matrices of the factor loadings and the idiosyncratic terms, but they add complementary assumptions (which vary across authors for technical reasons) to accommodate the fact that the data under study are autocorrelated time series. The models are mainly used with data that is stationary or preprocessed to be stationary, but they have also been used in a non-stationary framework (see Bai, 2004; Bai and Ng, 2004). The analysis of non-stationary data will be addressed in detail in Chapter 17.

Similarly to exact dynamic factor models, approximate dynamic factor models also rely on an equation linking the observable series to the factors and their lags, but here the idiosyncratic terms can be mildly cross-correlated, and the number of series is assumed to tend to infinity. The model has a dynamic representation,
$$x_t^n = \mu_n + \Lambda_n(L) f_t + e_t^n,$$
and a static representation,
$$x_t^n = \mu_n + \Lambda_n F_t + e_t^n,$$
equivalent to (2.3) and (2.4), respectively.

By centering the observed series, $\mu_n$ can be set equal to zero. For the sake of simplicity, in the rest of the chapter we will assume that the variables are centered, and we also drop the sub- and superscript $n$ in $\Lambda_n$, $x_t^n$, and $e_t^n$. We will always assume that in exact factor models the number of series under study is finite, while in approximate factor models $n \to \infty$, and a set of assumptions that is analogous to (CR1)-(CR3), but suited to the general framework of stationary autocorrelated processes, is satisfied.

2.3 Estimation in the Time Domain

2.3.1 Maximum likelihood estimation of small factor models

The static exact factor model has generally been estimated by maximum likelihood under the assumption that $(f_t)_{t \in \mathbb{Z}}$ and $(e_t)_{t \in \mathbb{Z}}$ are two orthogonal iid gaussian processes. Unique identification of the model requires that we impose some restrictions on the model. One of these originates from the definition of the exact factor model: the idiosyncratic components are set to be mutually orthogonal processes with a diagonal variance matrix. A second restriction sets the variance of the factors to be the identity matrix, $\mathrm{Var}(f_t) = I_r$. While the estimator does not have a closed form analytical solution, for small $n$ the number of parameters is small, and estimates can be obtained through any numerical optimization procedure. Two specific methods have been proposed for this problem: the so-called zig-zag routine, an algorithm which solves the first order conditions (see for instance Anderson, 1984; Lawley and Maxwell, 1971; Magnus and Neudecker, 2019), and the Jöreskog (1967) approach, which relies on the maximization of the concentrated likelihood using a Fletcher-Powell algorithm. Both approaches impose an additional identifying restriction on $\Lambda$. The maximum likelihood estimators $\hat\Lambda$, $\hat\Psi$ of the model parameters $\Lambda$, $\Psi$ are $\sqrt{T}$-consistent and asymptotically gaussian (see Anderson and Rubin, 1956). Under standard stationarity assumptions, these asymptotic results are still valid even if the true distribution of $f_t$ or $e_t$ is not gaussian: in this case $\hat\Lambda$ and $\hat\Psi$ are QML estimators of $\Lambda$ and $\Psi$.

Various formulas have been proposed to estimate the factors, given the parameter estimates $\hat\Lambda$, $\hat\Psi$. Commonly used ones include


• $\hat f_t = \hat\Lambda'(\hat\Lambda\hat\Lambda' + \hat\Psi)^{-1} x_t$, the conditional expectation of $f_t$ given $x_t$ and the estimated values of the parameters, which, after elementary calculations, can also be written as $\hat f_t = (I_r + \hat\Lambda'\hat\Psi^{-1}\hat\Lambda)^{-1}\hat\Lambda'\hat\Psi^{-1} x_t$, and

• $\hat f_t = (\hat\Lambda'\hat\Psi^{-1}\hat\Lambda)^{-1}\hat\Lambda'\hat\Psi^{-1} x_t$, which is the FGLS estimator of $f_t$, given the estimated loadings.

Additional details about these two estimators are provided by Anderson (1984). Since the formulas are equivalent up to an invertible matrix, the spaces spanned by the estimated factors are identical across these methods.
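To make the formulas concrete, the following minimal NumPy sketch computes both estimators for a whole sample, assuming estimates of the loadings and of the diagonal idiosyncratic covariance are already available (function and variable names are ours, not part of the original literature).

```python
import numpy as np

def factor_estimates(X, Lambda_hat, Psi_hat):
    """Compute the two factor estimators discussed above.

    X          : (T, n) centered data matrix
    Lambda_hat : (n, r) estimated loading matrix
    Psi_hat    : (n, n) estimated (diagonal) idiosyncratic covariance
    Returns (f_reg, f_gls), each a (T, r) array of estimated factors.
    """
    r = Lambda_hat.shape[1]
    Psi_inv = np.linalg.inv(Psi_hat)
    LtPinv = Lambda_hat.T @ Psi_inv                       # r x n
    # Conditional expectation of f_t given x_t:
    # (I_r + Lambda' Psi^-1 Lambda)^-1 Lambda' Psi^-1 x_t
    f_reg = X @ (np.linalg.inv(np.eye(r) + LtPinv @ Lambda_hat) @ LtPinv).T
    # FGLS estimator: (Lambda' Psi^-1 Lambda)^-1 Lambda' Psi^-1 x_t
    f_gls = X @ (np.linalg.inv(LtPinv @ Lambda_hat) @ LtPinv).T
    return f_reg, f_gls
```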

A dynamic exact factor model can also be estimated by maximum likelihood under the assumption of gaussian $(f_t', e_t')'_{t \in \mathbb{Z}}$. In this case, the factors are assumed to follow vector autoregressive processes, and the model can be cast in state-space form. To make things more precise, let us consider a factor model where the vector of factors follows a VAR($p$) process and enters the equation for $x_t$ with $s$ lags:
$$x_t = \Lambda_0 f_t + \Lambda_1 f_{t-1} + \dots + \Lambda_s f_{t-s} + e_t, \qquad (2.5)$$
$$f_t = \Phi_1 f_{t-1} + \dots + \Phi_p f_{t-p} + u_t, \qquad (2.6)$$
where the VAR($p$) coefficient matrices, $\Phi_j$, capture the dynamics of the factors. A commonly used identification restriction sets the variance of the innovations to the identity matrix, $\mathrm{cov}(u_t) = I_r$, and additional identifying restrictions are imposed on the factor loadings. The state-space representation is very flexible and can accommodate different cases, as shown below:

• If $s \ge p - 1$ and if $e_t$ is a white noise, the measurement equation is $x_t = \Lambda F_t + e_t$ with $\Lambda = (\Lambda_0\ \Lambda_1 \dots \Lambda_s)$ and $F_t = (f_t', f_{t-1}', \dots, f_{t-s}')'$ (the static representation of the dynamic model), and the state equation is
$$\begin{bmatrix} f_t \\ f_{t-1} \\ \vdots \\ f_{t-p+1} \\ \vdots \\ f_{t-s} \end{bmatrix} =
\begin{bmatrix} \Phi_1 & \dots & \Phi_p & O & \dots & O \\ I_q & O & \dots & & & O \\ O & I_q & O & \dots & & O \\ \vdots & & \ddots & \ddots & & \vdots \\ O & \dots & & O & I_q & O \end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p} \\ \vdots \\ f_{t-s-1} \end{bmatrix} +
\begin{bmatrix} I_q \\ O \\ \vdots \\ O \end{bmatrix} u_t$$

• If $s < p - 1$ and if $e_t$ is a white noise, the measurement equation is $x_t = \Lambda F_t + e_t$ with $\Lambda = (\Lambda_0\ \Lambda_1 \dots \Lambda_s\ O \dots O)$ and $F_t = (f_t', f_{t-1}', \dots, f_{t-p+1}')'$, and the state equation is
$$\begin{bmatrix} f_t \\ f_{t-1} \\ \vdots \\ f_{t-p+1} \end{bmatrix} =
\begin{bmatrix} \Phi_1 & \Phi_2 & \dots & \Phi_p \\ I_q & O & \dots & O \\ \vdots & \ddots & \ddots & \vdots \\ O & \dots & I_q & O \end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ \vdots \\ f_{t-p} \end{bmatrix} +
\begin{bmatrix} I_q \\ O \\ \vdots \\ O \end{bmatrix} u_t \qquad (2.7)$$


• The state-space representation can also accommodate the case where each of the idiosyncratic components is itself an autoregressive process. For instance, if $s = p = 2$ and $e_{it}$ follows a second order autoregressive process with $e_{it} = d_{i1} e_{it-1} + d_{i2} e_{it-2} + \varepsilon_{it}$ for $i = 1,\dots,n$, then the measurement equation can be written as $x_t = \Lambda \alpha_t$ with $\Lambda = (\Lambda_0\ \Lambda_1\ \Lambda_2\ I_n\ O)$ and $\alpha_t = (f_t', f_{t-1}', f_{t-2}', e_t', e_{t-1}')'$, and the state equation is
$$\begin{bmatrix} f_t \\ f_{t-1} \\ f_{t-2} \\ e_t \\ e_{t-1} \end{bmatrix} =
\begin{bmatrix} \Phi_1 & \Phi_2 & O & O & O \\ I_q & O & O & O & O \\ O & I_q & O & O & O \\ O & O & O & D_1 & D_2 \\ O & O & O & I_n & O \end{bmatrix}
\begin{bmatrix} f_{t-1} \\ f_{t-2} \\ f_{t-3} \\ e_{t-1} \\ e_{t-2} \end{bmatrix} +
\begin{bmatrix} I_q & O \\ O & O \\ O & O \\ O & I_n \\ O & O \end{bmatrix}
\begin{bmatrix} u_t \\ \varepsilon_t \end{bmatrix}, \qquad (2.8)$$
with $D_j = \mathrm{diag}(d_{1j}, \dots, d_{nj})$ for $j = 1, 2$ and $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{nt})'$. This model, with $n = 4$ and $r = 1$, was used by Stock and Watson (1991) to build a coincident index for the US economy.

One computational challenge is keeping the dimension of the state vector small when $n$ is large. The inclusion of the idiosyncratic component in $\alpha_t$ implies that the dimension of the system matrices in the state equation (2.8) grows with $n$. Indeed, this formulation requires an element for each lag of each idiosyncratic component. To limit the computational cost, an alternative approach applies the filter $I_n - D_1 L - D_2 L^2$ to the measurement equation. This pre-whitening is intended to control for serial correlation in the idiosyncratic terms, so that they don't need to be included in the state equation. The transformed measurement equation takes the form $x_t = c_t + \tilde\Lambda \alpha_t + \varepsilon_t$, for $t > 2$, with $c_t = D_1 x_{t-1} + D_2 x_{t-2}$, $\tilde\Lambda = (\tilde\Lambda_0 \dots \tilde\Lambda_4)$ and $\alpha_t = (f_t', f_{t-1}', \dots, f_{t-4}')'$, since
$$(I_n - D_1 L - D_2 L^2) x_t = \Lambda_0 f_t + (\Lambda_1 - D_1\Lambda_0) f_{t-1} + (\Lambda_2 - D_1\Lambda_1 - D_2\Lambda_0) f_{t-2} - (D_1\Lambda_2 + D_2\Lambda_1) f_{t-3} - D_2\Lambda_2 f_{t-4} + \varepsilon_t. \qquad (2.9)$$
The introduction of lags of $x_t$ in the measurement equation does not cause further complications; they can be incorporated in the Kalman filter since they are known at time $t$. The associated state equation is straightforward, and the dimension of $\alpha_t$ is smaller than in (2.8).

Once the model is written in state-space form, the gaussian likelihood can be computed using the Kalman filter for any value of the parameters (see for instance Harvey, 1989), and the likelihood can be maximized by any numerical optimization procedure over the parameter space. Watson and Engle (1983) proposed to use a score algorithm, or the EM algorithm, or a combination of both, but any other numerical procedure can be used when $n$ is small. With the parameter estimates, $\hat\theta$, in hand, the Kalman smoother provides an approximation of $f_t$ using information from all the observations. The asymptotic properties of the parameter estimators and the factors follow from general results concerning maximum likelihood estimation with the Kalman filter.
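As an illustration of how the gaussian likelihood is evaluated in practice, the sketch below runs a Kalman filter for a generic state-space form $x_t = \Lambda \alpha_t + \varepsilon_t$, $\alpha_t = A \alpha_{t-1} + \eta_t$ and accumulates the prediction-error decomposition of the log-likelihood. It is a simplified stand-in for the representations above (matrix names and the crude initialization are our assumptions), which an optimizer would call repeatedly during estimation.

```python
import numpy as np

def kalman_loglik(X, Lam, A, R, Q):
    """Gaussian log-likelihood of x_t = Lam a_t + eps_t, a_t = A a_{t-1} + eta_t,
    with cov(eps_t) = R and cov(eta_t) = Q, via the prediction-error decomposition.

    X : (T, n) data, Lam : (n, k), A : (k, k), R : (n, n), Q : (k, k)
    """
    T, n = X.shape
    k = A.shape[0]
    a = np.zeros(k)                      # filtered state E(a_t | x_1..x_t)
    P = np.eye(k)                        # its covariance (simple initialization)
    loglik = 0.0
    for t in range(T):
        # prediction step
        a_pred = A @ a
        P_pred = A @ P @ A.T + Q
        # prediction error and its covariance
        v = X[t] - Lam @ a_pred
        F = Lam @ P_pred @ Lam.T + R
        F_inv = np.linalg.inv(F)
        loglik += -0.5 * (n * np.log(2 * np.pi)
                          + np.linalg.slogdet(F)[1] + v @ F_inv @ v)
        # update step
        K = P_pred @ Lam.T @ F_inv       # Kalman gain
        a = a_pred + K @ v
        P = P_pred - K @ Lam @ P_pred
    return loglik
```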

To further improve the computational efficiency of estimation, Jungbacker and Koopman (2014) reduced a high-dimensional dynamic factor model to a low-dimensional state space model. Using a suitable transformation of the measurement equation, they partitioned the observation vector into two mutually uncorrelated subvectors, with only one of them depending on the unobserved state. The transformation can be summarized by
$$x_t^* = A x_t \quad \text{with} \quad A = \begin{bmatrix} A^L \\ A^H \end{bmatrix} \quad \text{and} \quad x_t^* = \begin{bmatrix} x_t^L \\ x_t^H \end{bmatrix},$$
where the model for $x_t^*$ can be written as
$$x_t^L = A^L \Lambda F_t + e_t^L \quad \text{and} \quad x_t^H = e_t^H,$$
with $e_t^L = A^L e_t$ and $e_t^H = A^H e_t$. Consequently, the $x_t^H$ subvector is not required for signal extraction, and the Kalman filter can be applied to a lower dimensional collapsed model, leading to substantial computational savings. This approach can be combined with controlling for idiosyncratic dynamics in the measurement equation, as described above in (2.9).

2.3.2 Principal component analysis of large approximate factor models

Chamberlain and Rothschild (1983) suggested using principal component analysis (PCA) to estimate the approximate static factor model, and Stock and Watson (2002a,b) and Bai and Ng (2002) popularized this approach in macroeconometrics. PCA will be explored in greater detail in Chapter 8, but we state the main results here.

Considering centered data and assuming that the number of factors, $r$, is known, PCA allows one to simultaneously estimate the factors and their loadings by solving the least squares problem
$$\min_{\Lambda, F} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (x_{it} - \lambda_i' f_t)^2 = \min_{\Lambda, F} \frac{1}{nT} \sum_{t=1}^{T} (x_t - \Lambda f_t)'(x_t - \Lambda f_t). \qquad (2.10)$$
Due to the aforementioned rotational indeterminacy of the factors and their loadings, the parameter estimates have to be constrained to obtain a unique solution. Generally, one of the following two normalization conditions is imposed:
$$\frac{1}{T} \sum_{t=1}^{T} \hat f_t \hat f_t' = I_r \quad \text{or} \quad \frac{\hat\Lambda'\hat\Lambda}{n} = I_r.$$


Using the first normalization and concentrating out $\Lambda$ gives an estimated factor matrix, $\hat F$, which is $\sqrt{T}$ times the eigenvectors corresponding to the $r$ largest eigenvalues of the $T \times T$ matrix $XX'$. Given $\hat F$, $\hat\Lambda' = (\hat F'\hat F)^{-1}\hat F'X = \hat F'X/T$ is the corresponding matrix of factor loadings. The solution to the minimization problem above is not unique, even though the sum of squared residuals is unique. Another solution is given by $\tilde\Lambda$, constructed as $\sqrt{n}$ times the eigenvectors corresponding to the $r$ largest eigenvalues of the $n \times n$ matrix $X'X$. Using the second normalization here implies $\tilde F = X\tilde\Lambda/n$. Bai and Ng (2002) indicated that the latter approach is computationally less costly when $T > n$, while the former is less demanding when $T < n$. In both cases, the idiosyncratic components are estimated by $\hat e_t = x_t - \hat\Lambda\hat f_t$, and their covariance is estimated by the empirical covariance matrix of $\hat e_t$. Since PCA is not scale invariant, many authors (for example Stock and Watson, 2002a,b) center and standardize the series, which are generally measured in different units; as a result, PCA is then applied to the sample correlation matrix.
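The following short NumPy sketch implements the principal components estimator under the first normalization (names are ours); for $T > n$ one would instead diagonalize $X'X$, as noted above.

```python
import numpy as np

def pca_factors(X, r):
    """Principal components estimator of an approximate factor model.

    X : (T, n) array of centered (and typically standardized) data
    r : number of static factors
    Returns F_hat (T, r), Lambda_hat (n, r), and e_hat (T, n),
    using the normalization F_hat' F_hat / T = I_r.
    """
    T, n = X.shape
    # Eigen-decomposition of the T x T matrix X X'
    eigval, eigvec = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigval)[::-1][:r]        # r largest eigenvalues
    F_hat = np.sqrt(T) * eigvec[:, order]       # estimated factors
    Lambda_hat = X.T @ F_hat / T                # loadings: Lambda' = F'X/T
    e_hat = X - F_hat @ Lambda_hat.T            # idiosyncratic components
    return F_hat, Lambda_hat, e_hat
```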

Stock and Watson (2002a,b) and Bai and Ng (2002) proved the consistency of these estimators, and Bai (2003) obtained their asymptotic distribution under a stronger set of assumptions. We refer the reader to those papers for the details, but let us note that these authors replace assumption (CR1) by the following stronger assumption²
$$\lim_{n \to \infty} \frac{\Lambda'\Lambda}{n} = \Sigma_\Lambda,$$
and replace assumptions (CR2) and (CR3), which were designed for white noise data, by analogous assumptions taking autocorrelation in the factors and idiosyncratic terms into account. Under these assumptions, the factors and the loadings are proved to be consistent, up to an invertible matrix $H$, converging at rate $\delta_{nT} = 1/\min(\sqrt{n}, \sqrt{T})$, so that
$$\hat f_t - H f_t = O_P(\delta_{nT}) \quad \text{and} \quad \hat\lambda_i - H^{-1}\lambda_i = O_P(\delta_{nT}), \quad \forall\, i = 1,\dots,n.$$

Under a more stringent set of assumptions, Bai (2003) also obtains the following asymptotic distribution results:

• If $\sqrt{n}/T \to 0$ then, for each $t$: $\sqrt{n}(\hat f_t - H' f_t) \xrightarrow{d} N(0, \Omega_t)$, where $\Omega_t$ is known.

• If $\sqrt{T}/n \to 0$ then, for each $i$: $\sqrt{T}(\hat\lambda_i - H^{-1}\lambda_i) \xrightarrow{d} N(0, W_i)$, where $W_i$ is known.

2.3.3 Generalized principal component analysis of large approximate factor models

Generalized principal components estimation mimics generalized least squares to deal with a nonspherical variance matrix of the idiosyncratic components. Although

² A weaker assumption can also be used: $0 < \underline{c} \le \liminf_{n\to\infty} \lambda_r\!\left(\frac{\Lambda'\Lambda}{n}\right) < \limsup_{n\to\infty} \lambda_1\!\left(\frac{\Lambda'\Lambda}{n}\right) \le \bar{c} < \infty$.


the method had been used earlier, Choi (2012) was the one who proved that efficiency gains can be achieved by including a weighting matrix in the minimization
$$\min_{\Lambda, F} \frac{1}{nT} \sum_{t=1}^{T} (x_t - \Lambda f_t)'\,\Psi^{-1}(x_t - \Lambda f_t).$$
The feasible generalized principal component estimator replaces the unknown $\Psi$ by an estimator $\hat\Psi$. The diagonal elements of $\Psi$ can be estimated by the variances of the individual idiosyncratic terms (see Boivin and Ng, 2006; Jones, 2001). Bai and Liao (2013) derive the conditions under which the estimated $\hat\Psi$ can be treated as known. The weighted minimization problem above relies on the assumption of independent idiosyncratic shocks, which may be too restrictive in practice. Stock and Watson (2005) applied a diagonal filter $D(L)$ to the idiosyncratic terms to deal with serial correlation, so the problem becomes
$$\min_{D(L), \Lambda, \tilde F} \frac{1}{T} \sum_{t=1}^{T} (D(L)x_t - \Lambda \tilde f_t)'\,\tilde\Psi^{-1}(D(L)x_t - \Lambda \tilde f_t),$$
where $\tilde F = (\tilde f_1, \dots, \tilde f_T)'$ with $\tilde f_t = D(L) f_t$, and $\tilde\Psi = E[\tilde e_t \tilde e_t']$ with $\tilde e_t = D(L) e_t$. Estimation of $D(L)$ and $\tilde F$ can be done sequentially, iterating to convergence. Breitung and Tenhofen (2011) propose a similar two-step estimation procedure that allows for heteroskedastic and serially correlated idiosyncratic terms.
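A minimal sketch of the idea behind feasible generalized principal components, assuming a diagonal idiosyncratic covariance matrix: the data are rescaled by the current estimates of the idiosyncratic standard deviations, PCA is applied, and the variances are re-estimated from the residuals. This illustrates the weighting logic rather than the exact algorithms of Choi (2012) or Stock and Watson (2005); names are ours.

```python
import numpy as np

def generalized_pca(X, r, n_iter=10):
    """Feasible generalized principal components under a diagonal
    idiosyncratic covariance matrix (iterative sketch).

    X : (T, n) centered data;  r : number of factors
    Alternates (i) PCA on data rescaled by the estimated idiosyncratic
    standard deviations and (ii) re-estimation of those variances.
    """
    T, n = X.shape
    psi = np.ones(n)                              # initial idiosyncratic variances
    for _ in range(n_iter):
        Xw = X / np.sqrt(psi)                     # weight each series
        eigval, eigvec = np.linalg.eigh(Xw @ Xw.T)
        F = np.sqrt(T) * eigvec[:, np.argsort(eigval)[::-1][:r]]
        Lam_w = Xw.T @ F / T                      # loadings of the weighted data
        Lam = Lam_w * np.sqrt(psi)[:, None]       # undo the weighting
        resid = X - F @ Lam.T
        psi = resid.var(axis=0)                   # update idiosyncratic variances
    return F, Lam, psi
```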

Even though PCA was first used to estimate approximate factor models where the factors and the idiosyncratic terms were iid white noise, most asymptotic results carried through to the case when the factors and the idiosyncratic terms were stationary autocorrelated processes. Consequently, the method has been widely used as a building block in approximate dynamic factor model estimation, but it required extensions since PCA on its own does not capture dynamics. Next we review several approaches that attempt to incorporate dynamic behavior in large scale approximate factor models.

2.3.4 Two-step and quasi-maximum likelihood estimation of large approximate factor models

Doz et al. (2011) proposed a two-step estimator that takes into account the dynamics of the factors. They assume that the factors in the approximate dynamic factor model $x_t = \Lambda f_t + e_t$ follow a vector autoregressive process, $f_t = \Phi_1 f_{t-1} + \dots + \Phi_p f_{t-p} + u_t$, and they allow the idiosyncratic terms to be autocorrelated but do not specify their dynamics. As illustrated in Section 2.3.1, this model can easily be cast in state-space form. The estimation procedure is the following:

Step 1 Preliminary estimators of the loadings, $\hat\Lambda$, and factors, $\hat f_t$, are obtained by principal component analysis. The idiosyncratic components are estimated by $\hat e_{it} = x_{it} - \hat\lambda_i'\hat f_t$, and their variance is estimated by the associated empirical variance $\hat\psi_{ii}$. The estimated factors, $\hat f_t$, are used in a vector autoregressive model to obtain the estimates $\hat\Phi_j$, $j = 1,\dots,p$.

Step 2 The model is cast in state-space form as in (2.7), with the variance of the common shocks set to the identity matrix, $\mathrm{cov}(u_t) = I_r$, and $\mathrm{cov}(e_t)$ defined as $\Psi = \mathrm{diag}(\psi_{11}, \dots, \psi_{nn})$. Using the parameter estimates $\hat\Lambda$, $\hat\Psi$, $\hat\Phi_j$, $j = 1,\dots,p$, obtained in the first step, one run of the Kalman smoother is then applied to the data. It produces a new estimate of the factors, $\hat f_{t|T} = E(f_t \mid x_1, \dots, x_T, \hat\theta)$, where $\hat\theta$ is a vector containing the first-step estimates of all parameters. It is important to notice that, in this second step, the idiosyncratic terms are misspecified, since they are taken as mutually orthogonal white noises.

Doz et al. (2011) prove that, under their assumptions, the parameter estimates, $\hat\theta$, are consistent, converging at rate $\left(\min(\sqrt{n}, \sqrt{T})\right)^{-1}$. They also prove that, when the Kalman smoother is run with those parameter estimates instead of the true parameters, the resulting two-step estimator of $f_t$ is also $\min(\sqrt{n}, \sqrt{T})$-consistent.
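The sketch below illustrates the two-step logic for the special case of a VAR(1) factor process: Step 1 delivers PCA estimates of the loadings, the idiosyncratic variances, and the VAR coefficients; Step 2 runs one pass of the Kalman filter and smoother with those estimates held fixed. It is a simplified illustration with our own variable names, not the authors' implementation.

```python
import numpy as np

def two_step_dfm(X, r):
    """Two-step estimator in the spirit of Doz et al. (2011), simplified to a
    VAR(1) factor process; returns the smoothed factors f_{t|T}.

    X : (T, n) centered data;  r : number of factors
    """
    T, n = X.shape
    # --- Step 1: PCA estimates of loadings, factors, idiosyncratic variances
    val, vec = np.linalg.eigh(X @ X.T)
    F = np.sqrt(T) * vec[:, np.argsort(val)[::-1][:r]]
    Lam = X.T @ F / T
    psi = (X - F @ Lam.T).var(axis=0)                        # diag of Psi
    Phi = np.linalg.lstsq(F[:-1], F[1:], rcond=None)[0].T    # VAR(1) by OLS
    Q = np.atleast_2d(np.cov((F[1:] - F[:-1] @ Phi.T).T))    # innovation covariance
    # --- Step 2: one pass of the Kalman filter and smoother
    a_p, P_p, a_f, P_f = [], [], [], []                      # predicted / filtered moments
    a, P, R = np.zeros(r), np.eye(r), np.diag(psi)
    for t in range(T):
        a_pred, P_pred = Phi @ a, Phi @ P @ Phi.T + Q
        Finv = np.linalg.inv(Lam @ P_pred @ Lam.T + R)
        K = P_pred @ Lam.T @ Finv
        a = a_pred + K @ (X[t] - Lam @ a_pred)
        P = P_pred - K @ Lam @ P_pred
        a_p.append(a_pred); P_p.append(P_pred); a_f.append(a); P_f.append(P)
    # backward (Rauch-Tung-Striebel) smoothing recursion
    f_smooth = np.zeros((T, r))
    f_smooth[-1], P_s = a_f[-1], P_f[-1]
    for t in range(T - 2, -1, -1):
        J = P_f[t] @ Phi.T @ np.linalg.inv(P_p[t + 1])
        f_smooth[t] = a_f[t] + J @ (f_smooth[t + 1] - a_p[t + 1])
        P_s = P_f[t] + J @ (P_s - P_p[t + 1]) @ J.T
    return f_smooth
```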

Doz et al. (2012) proposed to estimate a large scale approximate dynamic factor model by quasi-maximum likelihood (QML). In line with Doz et al. (2011), the quasi-likelihood is based on the assumption of mutually orthogonal iid gaussian idiosyncratic terms (so that the model is treated as if it were an exact factor model, even though it is not), and a gaussian VAR model for the factors. The corresponding log-likelihood can be obtained from the Kalman filter for given values of the parameters, and they use an EM algorithm to compute the maximum likelihood estimator. The EM algorithm, proposed by Dempster et al. (1977) and first implemented for dynamic factor models by Watson and Engle (1983), alternates an expectation step relying on a pass of the Kalman smoother for the current parameter values and a maximization step relying on multivariate regressions. The application of the algorithm is tantamount to successive applications of the two-step approach. The calculations are feasible even when $n$ is large, since in each iteration of the algorithm, the current estimate of the $i$th loading, $\lambda_i$, is obtained by ordinary least squares regression of $x_{it}$ on the current estimate of the factors. The authors prove that, under a standard set of assumptions, the estimated factors are mean square consistent. Their results remain valid even if the processes are not gaussian, or if the idiosyncratic terms are not iid, or not mutually orthogonal, as long as they are only weakly cross-correlated. Reis and Watson (2010) apply this approach to a model with serially correlated idiosyncratic terms. Jungbacker and Koopman (2014) also use the EM algorithm to estimate a dynamic factor model with autocorrelated idiosyncratic terms, but instead of extending the state vector as in Equation (2.8), they transform the measurement equation as described in Section 2.3.5.

Bai and Li (2012) study QML estimation in the more restricted case where the quasi-likelihood is associated with the static exact factor model. They obtain consistency and asymptotic normality of the estimated loadings and factors under a set of appropriate assumptions. Bai and Li (2016) incorporate these estimators into the two-step approach put forward by Doz et al. (2011) and obtain similar asymptotic results. They also follow Jungbacker and Koopman (2014) to handle the case where the idiosyncratic terms are autoregressive processes. Bai and Liao (2016) extend the approach of Bai and Li (2012) to the case where the idiosyncratic covariance matrix is sparse, instead of being diagonal, and propose a penalized maximum likelihood estimator.

2.3.5 Estimation of large approximate factor models with missing data

Observations may be missing from the analyzed data set for several reasons. At the beginning of the sample, certain time series might be available from an earlier start date than others. At the end of the sample, the dates of final observations may differ depending on the release lag of each data series. Finally, observations may be missing within the sample since different series in the data set may be sampled at different frequencies, for example monthly and quarterly. DFM estimation techniques assume that the observations are missing at random, so there is no endogenous sample selection. Missing data are handled differently in principal components and state space applications.

The least squares estimator of principal components in a balanced panel given in Equation (2.10) needs to be modified when some of the $nT$ observations are missing. Stock and Watson (2002b) showed that estimates of $F$ and $\Lambda$ can be obtained numerically by
$$\min_{\Lambda, F} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} S_{it}\,(x_{it} - \lambda_i' f_t)^2,$$
where $S_{it} = 1$ if an observation on $x_{it}$ is available and $S_{it} = 0$ otherwise. The objective function can be minimized by iterations alternating the optimization with respect to $\Lambda$ given $F$ and then $F$ given $\Lambda$; each step in the minimization has a closed form expression. Starting values can be obtained, for example, by principal component estimation using a subset of the series for which there are no missing observations. Alternatively, Stock and Watson (2002b) provide an EM algorithm for handling missing observations.

Step 1 Fill in initial guesses for the missing values to obtain a balanced dataset. Estimate the factors and loadings in this balanced dataset by principal components analysis.

Step 2 Update the values in place of the missing observations for each variable by the expectation of $x_{it}$ conditional on the observations and on the factors and loadings from the previous iteration.

Step 3 With the updated balanced dataset in hand, reestimate the factors and loadings by principal component analysis. Iterate steps 2 and 3 until convergence. The algorithm provides both estimates of the factors and estimates of the missing values in the time series.
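A compact sketch of these EM-type iterations, with NaN entries marking the missing observations (the initialization by zeros, i.e., by the series means of centered data, is our simplifying choice).

```python
import numpy as np

def em_missing_pca(X, r, n_iter=50):
    """EM-style iterations for PCA with missing data, in the spirit of
    Stock and Watson (2002b).

    X : (T, n) centered data with np.nan for missing observations
    r : number of factors
    Returns the estimated factors, loadings, and the completed data matrix.
    """
    T, n = X.shape
    miss = np.isnan(X)
    X_fill = np.where(miss, 0.0, X)          # Step 1: initial guesses for missing values
    for _ in range(n_iter):
        # Steps 1 and 3: PCA on the current balanced dataset
        val, vec = np.linalg.eigh(X_fill @ X_fill.T)
        F = np.sqrt(T) * vec[:, np.argsort(val)[::-1][:r]]
        Lam = X_fill.T @ F / T
        # Step 2: replace missing entries by their common-component fit
        common = F @ Lam.T
        X_fill = np.where(miss, common, X)
    return F, Lam, X_fill
```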


The state space framework has been adapted to missing data by either allowing the measurement equation to vary depending on what data are available at a given time (see Harvey, 1989, section 6.3.7) or by including a proxy value for the missing observation while adjusting the model so that the Kalman filter places no weight on the missing observation (see Giannone et al., 2008).

When the dataset contains missing values, the formulation in Equation (2.9) is not feasible since the lagged values on the right hand side of the measurement equation are not available in some periods. Jungbacker et al. (2011) addressed the problem by keeping track of periods with missing observations and augmenting the state vector with the idiosyncratic shocks in those periods. This implies that the system matrices and the dimension of the state vector are time-varying. Yet, the model can still be collapsed by transforming the measurement equation and partitioning the observation vector as described above and removing from the model the subvector that does not depend on the state. Under several simplifying assumptions, Pinheiro et al. (2013) developed an analytically and computationally less demanding algorithm for the special case of jagged edges, or observations missing at the end of the sample due to varying publication lags across series.

Since the Kalman filter and smoother can easily accommodate missing data, the two-step method of Doz et al. (2011) is also well-suited to handle unbalanced panel datasets. In particular, it also makes it possible to overcome the jagged edge data problem. This feature of the two-step method has been exploited in predicting low frequency macroeconomic releases for the current period, also known as nowcasting (see Section 2.6). For instance, Giannone et al. (2008) used this method to incorporate the real-time informational content of monthly macroeconomic data releases into current-quarter GDP forecasts. Similarly, Bańbura and Rünstler (2011) used the two-step framework to compute the impact of monthly predictors on quarterly GDP forecasts. They extended the method by first computing the weights associated with individual monthly observations in the estimates of the state vector using an algorithm by Koopman and Harvey (2003), which then allowed them to compute the contribution of each variable to the GDP forecast.

In the maximization step of the EM algorithm, the calculation of moments involving data was not feasible when some observations were missing, and therefore the original algorithm required a modification to handle an incomplete data set. Shumway and Stoffer (1982) allowed for missing data but assumed that the factor loading coefficients were known. More recently, Bańbura and Modugno (2014) adapted the EM algorithm to a general pattern of missing data by using a selection matrix to carry out the maximization step with available data points. The basic idea behind their approach is to write the likelihood as if the data were complete and to adapt the Kalman filter and smoother to the pattern of missing data in the E-step of the EM algorithm, where the missing data are replaced by their best linear predictions given the information set. They also extend their approach to the case where the idiosyncratic terms are univariate AR processes. Finally, they provide a statistical decomposition, which allows one to inspect how the arrival of new data affects the forecast of the variable of interest.


Modeling mixed frequencies via the state space approach makes it possible to associate the missing observations with particular dates and to differentiate between stock variables and flow variables. The state space model typically contains an aggregator that averages the high-frequency observations over one low-frequency period for stocks, and sums them for flows (see Aruoba et al., 2009). However, summing flows is only appropriate when the variables are in levels; for growth rates it is a mere approximation, weakening the forecasting performance of the model (see Fuleky and Bonham, 2015). As pointed out by Mariano and Murasawa (2003) and Proietti and Moauro (2006), appropriate aggregation of flow variables that enter the model in log-differences requires a non-linear state space model.

2.4 Estimation in the Frequency Domain

Classical principal component analysis, described in Section 2.3.2, estimates the space spanned by the factors non-parametrically only from the cross-sectional variation in the data. The two-step approach, discussed in Section 2.3.4, augments principal components estimation with a parametric state space model to capture the dynamics of the factors. Frequency-domain estimation combines some features of the previous two approaches: it relies on non-parametric methods that exploit variation both over time and over the cross-section of variables.

Traditional static principal component analysis focuses on contemporaneous cross-sectional correlation and overlooks serial dependence. It approximates the (contemporaneous) covariance matrix of $x_t$ by a reduced rank covariance matrix. While the correlation of two processes may be negligible contemporaneously, it could be high at leads or lags. Discarding this information could result in loss of predictive capacity. Dynamic principal component analysis overcomes this shortcoming by relying on spectral densities. The $n \times n$ spectral density matrix of a second order stationary process, $x_t$, for frequency $\omega \in [-\pi, \pi]$ is defined as
$$\Sigma_x(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} e^{-ik\omega}\, \Gamma_x(k),$$

where $\Gamma_x(k) = E[x_t x_{t-k}']$. In analogy to their static counterpart, dynamic principal components approximate the spectrum of $x_t$ by a reduced rank spectral density matrix. Static principal component analysis was generalized to the frequency domain by Brillinger (1981). His algorithm relied on a consistent estimate of $x_t$'s spectral density, $\hat\Sigma_x(\omega)$, at frequency $\omega$. The eigenvectors corresponding to the $r$ largest eigenvalues of $\hat\Sigma_x(\omega)$, a Hermitian matrix, are then transformed by the inverse Fourier transform to obtain the dynamic principal components.

The method was popularized in econometrics by Forni et al. (2000). Their generalized dynamic factor model encompasses as a special case the approximate factor model of Chamberlain (1983) and Chamberlain and Rothschild (1983), which allows for correlated idiosyncratic components but is static. And it generalizes the factor model of Sargent and Sims (1977) and Geweke (1977), which is dynamic but assumes orthogonal idiosyncratic components. The method relies on the assumption of an infinite cross-section to identify and consistently estimate the common and idiosyncratic components. The common component is a projection of the data on the space spanned by all leads and lags of the first $r$ dynamic principal components, and the orthogonal residuals from this projection are the idiosyncratic components. Forni et al. (2000) and Favero et al. (2005) propose the following procedure for estimating the dynamic principal components and common components.

Step 1 For a sample $x_1, \dots, x_T$ of size $T$, estimate the spectral density matrix of $x_t$ by
$$\hat\Sigma_x(\omega_h) = \frac{1}{2\pi} \sum_{k=-M}^{M} w_k\, e^{-ik\omega_h}\, \hat\Gamma_x(k), \qquad \omega_h = 2\pi h/(2M+1), \quad h = -M, \dots, M,$$
where $w_k = 1 - |k|/(M+1)$ are the weights of the Bartlett window of width $M$ and $\hat\Gamma_x(k) = (T-k)^{-1}\sum_{t=k+1}^{T}(x_t - \bar x)(x_{t-k} - \bar x)'$ is the sample covariance matrix of $x_t$ for lag $k$. For consistent estimation of $\Sigma_x(\omega)$ the window width has to be chosen such that $M \to \infty$ and $M/T \to 0$ as $T \to \infty$. Forni et al. (2000) note that a choice of $M = \frac{2}{3}T^{1/3}$ worked well in simulations.

Step 2 For $h = -M, \dots, M$, compute the eigenvectors $p_1(\omega_h), \dots, p_r(\omega_h)$ corresponding to the $r$ largest eigenvalues of $\hat\Sigma_x(\omega_h)$. The choice of $r$ is guided by a heuristic inspection of the eigenvalues. Note that, for $M = 0$, $p_j(\omega_h)$ is simply the $j$th eigenvector of the (estimated) variance-covariance matrix of $x_t$: the dynamic principal components then reduce to the static principal components.

Step 3 Define $p_j(L)$ as the two-sided filter
$$p_j(L) = \sum_{k=-M}^{M} p_{jk} L^k, \qquad \text{where} \qquad p_{jk} = \frac{1}{2M+1} \sum_{h=-M}^{M} p_j(\omega_h)\, e^{ik\omega_h}.$$
The first $r$ dynamic principal components of $x_t$ are $\hat f_{jt} = p_j(L)' x_t$, $j = 1,\dots,r$, which can be collected in the vector $\hat f_t = (\hat f_{1t}, \dots, \hat f_{rt})'$.

Step 4 Run an OLS regression of $x_t$ on present, past, and future dynamic principal components,
$$x_t = \Lambda_{-q}\hat f_{t+q} + \dots + \Lambda_{p}\hat f_{t-p},$$
and estimate the common component as the fitted value
$$\hat\chi_t = \hat\Lambda_{-q}\hat f_{t+q} + \dots + \hat\Lambda_{p}\hat f_{t-p},$$
where $\hat\Lambda_l$, $l = -q, \dots, p$, are the OLS estimators, and the leads $q$ and lags $p$ used in the regression can be chosen by model selection criteria. The idiosyncratic component is the residual, $\hat\xi_{it} = x_{it} - \hat\chi_{it}$.
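The sketch below implements Steps 1-3 for a small panel: it estimates the Bartlett-weighted spectral density on the frequency grid, extracts the leading eigenvectors at each frequency, and applies the resulting two-sided filter. The wrap-around treatment of the sample ends and the handling of complex phases are our simplifications, so values near the sample boundaries should be used with caution.

```python
import numpy as np

def dynamic_pca(X, r, M):
    """Dynamic principal components in the spirit of Forni et al. (2000),
    Steps 1-3 only (no common-component regression).

    X : (T, n) centered data;  r : number of dynamic factors;  M : window width
    Returns a (T, r) array of dynamic principal components.
    """
    T, n = X.shape
    Xc = X - X.mean(axis=0)
    # Step 1 ingredients: sample autocovariances Gamma(k), k = 0..M
    gam = {k: Xc[k:].T @ Xc[:T - k] / (T - k) for k in range(M + 1)}
    cov = lambda k: gam[k] if k >= 0 else gam[-k].T
    freqs = 2 * np.pi * np.arange(-M, M + 1) / (2 * M + 1)
    filt = np.zeros((2 * M + 1, r, n), dtype=complex)    # filter coefficients
    for w in freqs:
        # Bartlett-weighted spectral density estimate at frequency w
        S = sum((1 - abs(k) / (M + 1)) * cov(k) * np.exp(-1j * k * w)
                for k in range(-M, M + 1)) / (2 * np.pi)
        # Step 2: r leading eigenvectors of the Hermitian matrix S
        val, vec = np.linalg.eigh(S)
        P = vec[:, np.argsort(val)[::-1][:r]].conj().T
        # Step 3: accumulate the inverse Fourier transform of the eigenvectors
        for idx, k in enumerate(range(-M, M + 1)):
            filt[idx] += P * np.exp(1j * k * w) / (2 * M + 1)
    # apply the two-sided filter: f_t = sum_k filt_k x_{t-k} (ends wrap around)
    F = sum(np.roll(Xc, k, axis=0) @ filt[idx].T
            for idx, k in enumerate(range(-M, M + 1)))
    return F.real
```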

Although this method efficiently estimates the common component, its reliance on two-sided filtering rendered it unsuitable for forecasting. Forni et al. (2005) extended the frequency domain approach developed in their earlier paper to forecasting in two steps.

Step 1 In the first step, they estimated the covariances of the common and idiosyncratic components using the inverse Fourier transforms
$$\hat\Gamma_\chi(k) = \frac{1}{2M+1} \sum_{h=-M}^{M} e^{ik\omega_h}\,\hat\Sigma_\chi(\omega_h) \quad \text{and} \quad \hat\Gamma_\xi(k) = \frac{1}{2M+1} \sum_{h=-M}^{M} e^{ik\omega_h}\,\hat\Sigma_\xi(\omega_h),$$
where $\hat\Sigma_\chi(\omega_h)$ and $\hat\Sigma_\xi(\omega_h) = \hat\Sigma_x(\omega_h) - \hat\Sigma_\chi(\omega_h)$ are the spectral density matrices corresponding to the common and idiosyncratic components, respectively.

Step 2 In the second step, they estimated the factor space using linear combinations of the $x$'s, $\lambda'x_t = \lambda'\chi_t + \lambda'\xi_t$. Specifically, they compute $r$ independent linear combinations $\hat f_{jt} = \hat\lambda_j' x_t$, where the weights $\hat\lambda_j$ maximize the variance of $\lambda'\chi_t$ and are defined recursively as
$$\hat\lambda_j = \arg\max_{\lambda \in \mathbb{R}^n} \lambda'\hat\Gamma_\chi(0)\lambda \quad \text{s.t.} \quad \lambda'\hat\Gamma_\xi(0)\lambda = 1 \quad \text{and} \quad \lambda'\hat\Gamma_\xi(0)\hat\lambda_m = 0,$$
for $j = 1,\dots,r$ and $1 \le m \le j-1$ (only the first constraint applies for $j = 1$). The solutions of this problem, $\hat\lambda_j$, are the generalized eigenvectors associated with the generalized eigenvalues $\hat\nu_j$ of the matrices $\hat\Gamma_\chi(0)$ and $\hat\Gamma_\xi(0)$,
$$\hat\lambda_j'\hat\Gamma_\chi(0) = \hat\nu_j\, \hat\lambda_j'\hat\Gamma_\xi(0), \qquad j = 1,\dots,n,$$
with the normalization constraints $\hat\lambda_j'\hat\Gamma_\xi(0)\hat\lambda_j = 1$ and $\hat\lambda_i'\hat\Gamma_\xi(0)\hat\lambda_j = 0$ for $i \neq j$. The linear combinations $\hat f_{jt} = \hat\lambda_j' x_t$, $j = 1,\dots,n$, are the generalized principal components of $x_t$ relative to the couple $(\hat\Gamma_\chi(0), \hat\Gamma_\xi(0))$. Defining $\hat\Lambda = (\hat\lambda_1, \dots, \hat\lambda_r)$, the space spanned by the common factors is estimated by the first $r$ generalized principal components of the $x$'s: $\hat\Lambda'x_t = (\hat\lambda_1'x_t, \dots, \hat\lambda_r'x_t)'$. The forecast of the common component depends on the covariance between $\chi_{iT+h}$ and $\hat\Lambda'x_T$. Observing that this covariance equals the covariance between $\chi_{T+h}$ and $\hat\Lambda'\chi_T$, the forecast of the common component can be obtained from the projection
$$\hat\chi_{T+h|T} = \hat\Gamma_\chi(h)\,\hat\Lambda\,(\hat\Lambda'\hat\Gamma_\chi(0)\hat\Lambda)^{-1}\hat\Lambda'x_T.$$
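Given estimates of $\hat\Gamma_\chi(0)$, $\hat\Gamma_\xi(0)$, and $\hat\Gamma_\chi(h)$ from the frequency-domain step, the generalized principal components and the projection above reduce to a symmetric-definite generalized eigenvalue problem, as in the following sketch (the inputs are assumed to be supplied, and the names are ours).

```python
import numpy as np
from scipy.linalg import eigh

def generalized_pc_forecast(x_T, Gamma_chi, Gamma_xi, Gamma_chi_h, r):
    """One-sided forecast of the common component via generalized principal
    components, a sketch of the projection formula above.

    x_T         : (n,) last observation
    Gamma_chi   : (n, n) lag-0 covariance of the common component
    Gamma_xi    : (n, n) lag-0 covariance of the idiosyncratic component
    Gamma_chi_h : (n, n) lag-h covariance of the common component
    r           : number of factors
    """
    # generalized eigenproblem Gamma_chi v = nu * Gamma_xi v
    nu, V = eigh(Gamma_chi, Gamma_xi)
    Z = V[:, np.argsort(nu)[::-1][:r]]          # weights of the r leading gen. PCs
    # projection: chi_{T+h|T} = Gamma_chi(h) Z (Z' Gamma_chi(0) Z)^{-1} Z' x_T
    return Gamma_chi_h @ Z @ np.linalg.solve(Z.T @ Gamma_chi @ Z, Z.T @ x_T)
```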


Since this two-step forecasting method relied on lags but not leads, it avoided the end-of-sample problems caused by two-sided filtering in the authors’ earlier study. A simulation study by the authors suggested that this procedure improves upon the forecasting performance of Stock and Watson’s (2002) static principal components method for data generating processes with heterogeneous dynamics and heterogeneous variance ratio of the common and idiosyncratic components. In line with most of the earlier literature, Forni et al. (2009) continued to assume that the space spanned by the common components at any time t has a finite-dimension r as n tends to infinity, allowing a static representation of the model. By identifying and estimating cross-sectionally pervasive shocks and their dynamic effect on macroeconomic variables, they showed that dynamic factor models are suitable for structural modeling.

The finite-dimension assumption for the common component rules out certain factor-loading patterns. In the model $x_{it} = a_i(1 - b_i L)^{-1} u_t + \xi_{it}$, where $u_t$ is a scalar white noise and the coefficients $b_i$ are drawn from a uniform distribution on $(-0.9, 0.9)$, the space spanned by the common components $\chi_{it}$, $i \in \mathbb{N}$, is infinite dimensional.

Forni and Lippi (2011) relaxed the finite-dimensional assumption and proposed a one-sided estimator for the general dynamic factor model of Forni et al. (2000). Forni et al. (2015, 2017) continued to allow the common components, driven by a finite number of common shocks, to span an infinite-dimensional space and investigated the model's one-sided representations and asymptotic properties. Forni et al. (2018) evaluated the model's pseudo real-time forecasting performance for US macroeconomic variables and found that it compares favorably to finite dimensional methods during the Great Moderation. The dynamic relationship between the variables and the factors in this model is more general than in models assuming a finite common component space, but, as pointed out by the authors, its estimation is rather complex. Hallin and Lippi (2013) give a general presentation of the methodological foundations of dynamic factor models. Fiorentini et al. (2018) introduced a frequency domain version of the EM algorithm for dynamic factor models with latent autoregressive and moving average processes. In this paper the authors focused on an exact factor model with a single common factor, and left approximate factor models with multiple factors for future research. But they extended the basic EM algorithm with an iterated indirect inference procedure based on a sequence of simple auxiliary OLS regressions to speed up computation for models with moving average components.

Although carried out in the time domain, Peña and Yohai's (2016) generalized dynamic principal components model mimics two important features of Forni et al.'s (2000) generalized dynamic factor model: it allows for both a dynamic representation of the common component and nonorthogonal idiosyncratic components. Their procedure chooses the number of common components to achieve a desired degree of accuracy in a mean squared error sense in the reconstruction of the original series. The estimation iterates two steps: the first is a least squares estimator of the loading coefficients assuming the factors are known, and the second is updating the factor estimate based on the estimated coefficients. Since the authors do not place restrictions on the principal components, their method can be applied to potentially nonstationary time series data. Although the proposed method does well for data reconstruction, it is not suited for forecasting, because, as in Forni et al. (2000), it uses both leads and lags to reconstruct the series.

2.5 Estimating the Number of Factors

Full specification of the DFM requires selecting the number of common factors. The number of static factors can be determined by information criteria that use penalized objective functions or by analyzing the distribution of eigenvalues. Bai and Ng (2002) used information criteria of the form $IC(r) = \ln V_r(\hat\Lambda, \hat F) + r\,g(n,T)$, where $V_r(\hat\Lambda, \hat F)$ is the least squares objective function (2.10) evaluated with the principal components estimators when $r$ factors are considered, and $g(n,T)$ is a penalty function that satisfies two conditions: $g(n,T) \to 0$ and $\min[n^{1/2}, T^{1/2}] \cdot g(n,T) \to \infty$ as $n, T \to \infty$. The estimator for the number of factors is $\hat r_{IC} = \arg\min_{0 \le r \le r_{\max}} IC(r)$, where $r_{\max}$ is an upper bound on the true number of factors. The authors show that the estimator is consistent without restrictions between $n$ and $T$, and the results hold under heteroskedasticity in both the time and cross-sectional dimensions, as well as under weak serial and cross-sectional correlation. Li et al. (2017) develop a method to estimate the number of factors when the number of factors is allowed to increase as the cross-section and time dimensions increase. This is useful since new factors may emerge as changes in the economic environment trigger structural breaks.

Ahn and Horenstein (2013) and Onatski (2009, 2010) take a different approach by comparing adjacent eigenvalues of the spectral density matrix at a given frequency or of the covariance matrix of the data. The basic idea behind this approach is that the first r eigenvalues will be unbounded, while the remaining values will be bounded. Therefore the ratio of subsequent eigenvalues is maximized at the location of the largest relative cliff in a scree plot (a plot of the ordered eigenvalues against the rank of those eigenvalues). These authors also present alternative statistics using the difference, the ratio of changes, and the growth rates of subsequent eigenvalues.
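The two ideas can be illustrated with a few lines of NumPy: the sketch below computes the Bai and Ng (2002) criterion with their $IC_{p2}$ penalty and the Ahn and Horenstein (2013) eigenvalue-ratio estimator from the eigenvalues of the sample covariance matrix (a simplified illustration, not the authors' code).

```python
import numpy as np

def number_of_factors(X, r_max):
    """Two common estimators of the number of static factors.

    X : (T, n) centered (standardized) data;  r_max : largest rank considered
    Returns (r_ic, r_er): the IC_p2 and eigenvalue-ratio estimates.
    """
    T, n = X.shape
    eigval = np.sort(np.linalg.eigvalsh(X @ X.T / (n * T)))[::-1]
    # IC(r) = ln V_r + r * g(n, T), with V_r the residual variance after r factors
    penalty = (n + T) / (n * T) * np.log(min(n, T))
    V = np.array([eigval[r:].sum() for r in range(r_max + 1)])
    r_ic = int(np.argmin(np.log(V) + np.arange(r_max + 1) * penalty))
    # eigenvalue ratio: maximize lambda_r / lambda_{r+1} over r = 1..r_max
    ratios = eigval[:r_max] / eigval[1:r_max + 1]
    r_er = int(np.argmax(ratios)) + 1
    return r_ic, r_er
```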

Estimation of the number of dynamic factors usually requires several steps. As illustrated in Section 2.2.1, the number of dynamic factors, $q$, will in general be lower than the number of static factors, $r = q(1+s)$, and therefore the spectrum of the common component will have a reduced rank with only $q$ nonzero eigenvalues. Based on this result, Hallin and Liska (2007) propose a frequency-domain procedure which uses an information criterion to estimate the rank of the spectral density matrix of the data. Bai and Ng (2007) take a different approach by first estimating the number of static factors and then applying a VAR($p$) model to the estimated factors to obtain the residuals. They use the eigenvalues of the residual covariance matrix to estimate the rank $q$ of the covariance of the dynamic (or primitive) shocks. In a similar spirit, Amengual and Watson (2007) first project the observed variables $x_t$ onto $p$ lags of the consistently estimated $r$ static principal components to obtain serially uncorrelated residuals $\hat u_t = x_t - \sum_{i=1}^{p} \hat\Pi_i \hat f_{t-i}$. These residuals then have a static factor structure, and applying the Bai and Ng (2002) information criterion to the sample variance matrix of these residuals yields a consistent estimate of the number of dynamic factors, $q$.

2.6 Forecasting with Large Dynamic Factor Models

One of the most important uses of dynamic factor models is forecasting. Both small scale (i.e. $n$ small) and large scale dynamic factor models have been used to this end. Very often, the forecasted variable is published with a delay: for instance, euro-area GDP is published six weeks after the end of the corresponding quarter. The long delay in data release implies that different types of forecasts can be considered. GDP predictions for quarter $Q$, made prior to that quarter, in $Q-1, Q-2, \dots$ for instance, are considered to be “true” out of sample forecasts. But estimates for quarter $Q$ can also be made during that same quarter using high frequency data released within quarter $Q$. These are called “nowcasts”. Finally, since GDP for quarter $Q$ is not known until a few weeks into quarter $Q+1$, forecasters keep estimating it during this time interval. These estimates are considered to be “backcasts”. As we have seen, dynamic factor models are very flexible and can easily handle all these predictions. Forecasts using large dimensional factor models were first introduced in the literature by Stock and Watson (2002a). The method, also denoted diffusion index forecasts, consists of estimating the factors $f_t$ by principal component analysis, and

then using those estimates in a regression estimated by ordinary least squares,
$$y_{t+h} = \beta_f' \hat f_t + \beta_w' w_t + \varepsilon_{t+h},$$
where $y_t$ is the variable of interest, $\hat f_t$ is the vector of the estimated factors, and $w_t$ is a vector of observable predictors (typically lags of $y_t$). The direct forecast for time $T+h$ is then computed as
$$\hat y_{T+h|T} = \hat\beta_f' \hat f_T + \hat\beta_w' w_T.$$
The authors prove that, under the assumptions they used to ensure the consistency of the principal component estimates,
$$\underset{n \to \infty}{\mathrm{plim}} \left[ (\hat\beta_f' \hat f_T + \hat\beta_w' w_T) - (\beta_f' f_T + \beta_w' w_T) \right] = 0,$$
so that the forecast is asymptotically equivalent to what it would have been if the factors had been observed. Furthermore, under a stronger set of assumptions, Bai and Ng (2006) show that the forecast error is asymptotically gaussian, with known variance, so that forecast intervals can be computed. Stock and Watson (2002b) considered a more general model to capture the dynamic relationship between the variables and the factors,


with forecast equation
$$\hat y_{T+h|T} = \hat\alpha_h + \hat\beta_h(L)\, \hat f_T + \hat\gamma_h(L)\, y_T. \qquad (2.11)$$
They find that this model produces better forecasts for some variables and forecast horizons, but the improvements are not systematic.
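A minimal sketch of the diffusion index forecast: principal components are extracted from the predictor panel, and the target is regressed on the factors and its own lags by OLS (the lag handling and names are our simplifications).

```python
import numpy as np

def diffusion_index_forecast(X, y, h, r, n_lags_y=1):
    """Diffusion index forecast in the spirit of Stock and Watson (2002a).

    X : (T, n) centered, standardized predictor panel
    y : (T,) target series;  h : forecast horizon;  r : number of factors
    Returns the point forecast of y_{T+h}.
    """
    T, n = X.shape
    val, vec = np.linalg.eigh(X @ X.T)
    F = np.sqrt(T) * vec[:, np.argsort(val)[::-1][:r]]       # PCA factors
    # regressors at time t: constant, factors, and lags of y; target: y_{t+h}
    W = np.column_stack([np.ones(T), F] +
                        [np.r_[np.full(j, np.nan), y[:T - j]] for j in range(n_lags_y)])
    rows = np.arange(n_lags_y - 1, T - h)                    # usable observations
    beta = np.linalg.lstsq(W[rows], y[rows + h], rcond=None)[0]
    return W[-1] @ beta                                      # forecast for T + h
```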

While the diffusion index methodology relied on a distributed lag model, the approach taken by Doz et al. (2011) and Doz et al. (2012) captured factor dynamics explicitly. The estimates of a vector autoregressive model for $\hat f_t$ can be used to recursively forecast $\hat f_{T+h|T}$ at time $T$. A mean square consistent forecast for period $T+h$ can then be obtained by $\hat y_{T+h|T} = \hat\Lambda \hat f_{T+h|T}$. This approach has been used by Giannone et al. (2008) and by many others. However, Stock and Watson (2002b) point out that the diffusion index and two-step approaches are equivalent, since the recursive forecast of the factor in the latter implies that $\hat f_{T+h|T}$ is a function of $\hat f_T$ and its lags, as in (2.11).
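A sketch of the corresponding two-step forecast with a VAR(1) for the factors, assuming factor and loading estimates are already available (names are ours).

```python
import numpy as np

def factor_var_forecast(F, Lam, h):
    """Recursive factor forecast from an estimated VAR(1).

    F   : (T, r) estimated factors (e.g. from PCA or a Kalman smoother)
    Lam : (n, r) estimated loadings
    h   : forecast horizon
    Returns the h-step-ahead forecast of the observable vector, Lam @ f_{T+h|T}.
    """
    Phi = np.linalg.lstsq(F[:-1], F[1:], rcond=None)[0].T    # VAR(1) by OLS
    f_fc = F[-1]
    for _ in range(h):
        f_fc = Phi @ f_fc                                     # iterate the VAR forward
    return Lam @ f_fc
```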

2.6.1 Targeting predictors and other forecasting refinements

In the diffusion index and two-step forecasting methodology, the factors are first estimated from a large number of predictors, $(x_{1t}, \dots, x_{nt})$, by the method of principal components, and then used in a linear forecasting equation for $y_{t+h}$. Although the method can parsimoniously summarize information from a large number of predictors and incorporate it into the forecast, the estimated factors do not take into account the predictive power of $x_{it}$ for $y_{t+h}$. Boivin and Ng (2006) pointed out that expanding the sample size simply by adding data without regard to its quality or usefulness does not necessarily improve forecasts. Bai and Ng (2008a) suggested targeting predictors based on their information content about $y$. They used hard and soft thresholding to determine which variables the factors are to be extracted from and thereby reduce the influence of uninformative predictors. Under hard thresholding, a pretest procedure is used to decide whether a predictor should be kept or not. Under soft thresholding, the predictor ordering and selection is carried out using the least angle regression (LARS) algorithm developed by Efron et al. (2004).
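As a simple illustration of hard thresholding, the sketch below keeps the predictors whose bivariate predictive regression for $y_{t+h}$ has an absolute t-statistic above a cutoff; Bai and Ng (2008a) include additional controls in the pretest regression, which this simplified version omits.

```python
import numpy as np

def hard_threshold_predictors(X, y, h, t_crit=1.65):
    """Hard-thresholding pretest for targeted predictors (simplified).

    X : (T, n) standardized predictors;  y : (T,) target;  h : horizon
    Returns the column indices of the retained predictors.
    """
    T, n = X.shape
    keep = []
    for i in range(n):
        Z = np.column_stack([np.ones(T - h), X[:T - h, i]])   # constant + predictor
        b = np.linalg.lstsq(Z, y[h:], rcond=None)[0]
        resid = y[h:] - Z @ b
        s2 = resid @ resid / (T - h - 2)
        se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])       # std. error of the slope
        if abs(b[1] / se) > t_crit:
            keep.append(i)
    return keep
```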

Kelly and Pruitt (2015) proposed a three-pass regression filter with the ability to identify the subset of factors useful for forecasting a given target variable while discarding those that are target irrelevant but may be pervasive among predictors. The proposed procedure uses the covariances between the variables in the dataset and the proxies for the relevant latent factors, the proxies being observable variables either theoretically motivated or automatically generated.

Pass 1 The first pass captures the relationship between the predictors, $X$, and $m \ll \min(n,T)$ factor proxies, $Z$, by running a separate time series regression of each predictor, $x_i$, on the proxies, $x_{it} = \alpha_i + z_t'\phi_i + \varepsilon_{it}$, for $i = 1,\dots,n$, and retaining the estimated coefficients $\hat\phi_i$.

(25)

Pass 2 The second pass consists of T separate cross section regressions of the

predictors, xt, on the coefficients estimated in the first pass, xit = ↵t+ bi0ft+ "it,

for t = 1 . . . T, and retaining bft. First-stage coefficient estimates map the

cross-sectional distribution of predictors to the latent factors. Second-stage cross section regressions use this map to back out estimates of the factors, bft, at each point in

time.

Pass 3 The third pass is a single time series forecasting regression of the target

variable yt+1 on the predictive factors estimated in the second pass, yt+1 =

↵ + bft0 + "t+1. The third-pass fitted value, byt+1, is the forecast for period t + 1.

The automatic proxy selection algorithm is initialized with the target variable itself, $z_1 = y$, and additional proxies based on the prediction error are added iteratively, $z_{k+1} = y - \hat{y}^k$ for $k = 1, \dots, m-1$, where $k$ is the number of proxies in the model at the given iteration. The authors point out that partial least squares, further analyzed by Groen and Kapetanios (2016), is a special case of the three-pass regression filter.
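The three passes translate almost line by line into least squares computations. The sketch below is a simplified rendering (plain OLS in every pass, no standardization or bias adjustments), with hypothetical function names; it is not the authors' implementation.

```python
import numpy as np

def three_pass_filter(X, y, Z):
    """Three-pass regression filter for given proxies Z (T x m); a sketch."""
    T, n = X.shape
    Zc = np.column_stack([np.ones(T), Z])
    # Pass 1: time series regression of each predictor on the proxies
    phi = np.linalg.lstsq(Zc, X, rcond=None)[0][1:, :]            # m x n slope estimates
    # Pass 2: cross-section regression of x_t on the first-pass slopes, each t
    Pc = np.column_stack([np.ones(n), phi.T])
    F = np.linalg.lstsq(Pc, X.T, rcond=None)[0][1:, :].T          # T x m factor estimates
    # Pass 3: forecasting regression of y_{t+1} on the estimated factors
    Fc = np.column_stack([np.ones(T - 1), F[:-1]])
    beta = np.linalg.lstsq(Fc, y[1:], rcond=None)[0]
    y_hat = np.concatenate([[y[0]], Fc @ beta])                   # in-sample predictions
    y_next = np.concatenate([[1.0], F[-1]]) @ beta                # forecast of y_{T+1}
    return F, y_hat, y_next

def auto_proxy_tpr(X, y, m=3):
    """Automatic proxy selection: start with z_1 = y, then add prediction errors."""
    Z = y.reshape(-1, 1)
    for _ in range(m - 1):
        _, y_hat, _ = three_pass_filter(X, y, Z)
        Z = np.column_stack([Z, y - y_hat])                       # next proxy
    return three_pass_filter(X, y, Z)
```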

Bräuning and Koopman (2014) proposed a collapsed dynamic factor model in which the factor estimates are established jointly by the predictors $x_t$ and the target variable $y_t$. The procedure is a two-step process.

Step 1 The first step uses principal component analysis to reduce the dimension of a large panel of macroeconomic predictors, as in Stock and Watson (2002a,b).

Step 2 In the second step, the authors use a state space model with a small number of parameters to model the principal components jointly with the target variable $y_t$. The principal components, $f_{PC,t}$, are treated as dependent variables that are associated exclusively with the factors $f_t$, but the factors $f_t$ enter the equation for the target variable $y_t$. The unknown parameters are estimated by maximum likelihood, and the Kalman filter is used for signal extraction.

In contrast to the two-step method of Doz et al. (2011), this approach allows for a specific dynamic model for the target variable that may already produce good forecasts for $y_t$.

Several additional methods originating in the machine learning literature have been used to improve forecasting performance of dynamic factor models, including bagging (Inoue and Kilian, 2008) and boosting (Bai and Ng, 2009), which will be discussed in Chapters 14 and 16, respectively.

2.7 Hierarchical Dynamic Factor Models

If a panel of data can be organized into blocks using a priori information, then between- and within-block variation in the data can be captured by the hierarchical dynamic factor model framework formalized by Moench et al. (2013). The block structure helps to model covariations that are not sufficiently pervasive to be treated as common factors. For example, in a three-level model, the series $i$, in a given block $b$, at each time $t$ can exhibit idiosyncratic, block-specific, and common variation,

$$x_{ibt} = \Lambda_{g,ib}(L)\, g_{bt} + e_{x,ibt}$$
$$g_{bt} = \Lambda_{f,b}(L)\, f_t + e_{g,bt}$$
$$\Psi_f(L)\, f_t = u_t,$$
where variables $x_{ibt}$ and $x_{jbt}$ within a block $b$ are correlated because of the common factors $f_t$ or the block-specific shocks $e_{g,bt}$, but correlations between blocks are possible only through $f_t$. Some of the $x_{it}$ may not belong to a block and could be affected by the common factors directly, as in the two-level model (2.5). The idiosyncratic components can be allowed to follow stationary autoregressive processes, $\Psi_{x,ib}(L)\, e_{x,ibt} = \varepsilon_{x,ibt}$ and $\Psi_{g,b}(L)\, e_{g,bt} = \varepsilon_{g,bt}$.

If we were given data for production, employment, and consumption, then $x_{i1t}$ could be one of the $n_1$ production series, $x_{i2t}$ could be one of the $n_2$ employment series, and $x_{i3t}$ could be one of the $n_3$ consumption series. The correlation between the production, employment, and consumption factors, $g_{1t}$, $g_{2t}$, $g_{3t}$, due to economy-wide fluctuations, would be captured by $f_t$. In a multicountry setting, a four-level hierarchical model could account for series-specific, country (subblock), region (block), and global (common) fluctuations, as in Kose et al. (2003). If the country and regional variations were not properly modeled, they would appear as either weak common factors or idiosyncratic errors that would be cross-correlated among series in the same region. Instead of assuming weak cross-sectional correlation, as in approximate factor models, the hierarchical model explicitly specifies the block structure, which helps with the interpretation of the factors.
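To fix ideas, the following sketch simulates a small panel from this three-level structure, with one common factor, one factor per block, AR(1) dynamics, and illustrative parameter values; all numbers are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_b, n_blocks = 200, 20, 3          # e.g., production, employment, consumption blocks

# common factor: AR(1)
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.8 * f[t - 1] + rng.normal()

# block factors: load on the common factor plus an AR(1) block-specific component
g = np.zeros((T, n_blocks))
for b in range(n_blocks):
    lam_f = rng.uniform(0.5, 1.5)
    e_g = np.zeros(T)
    for t in range(1, T):
        e_g[t] = 0.5 * e_g[t - 1] + rng.normal()
    g[:, b] = lam_f * f + e_g

# observables: load on their own block factor plus an AR(1) idiosyncratic term,
# so that series in different blocks are correlated only through f
X = np.zeros((T, n_blocks * n_b))
for b in range(n_blocks):
    for i in range(n_b):
        lam_g = rng.uniform(0.5, 1.5)
        e_x = np.zeros(T)
        for t in range(1, T):
            e_x[t] = 0.3 * e_x[t - 1] + rng.normal()
        X[:, b * n_b + i] = lam_g * g[:, b] + e_x
```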

To estimate the model, Moench et al. (2013) extend the Markov Chain Monte Carlo method that Otrok and Whiteman (1998) originally applied to a single factor model. Let $\Lambda = (\Lambda_g, \Lambda_f)$, $\Psi = (\Psi_f, \Psi_g, \Psi_x)$, and $\Sigma = (\Sigma_f, \Sigma_g, \Sigma_x)$ denote the matrices containing the loadings, the coefficients of the lag polynomials, and the variances of the innovations, respectively. Organize the data into blocks, $x_{bt}$, and obtain initial values for $g_t$ and $f_t$ using principal components; use these to produce initial values for $\Lambda$, $\Psi$, and $\Sigma$.

Step 1 Conditional on $\Lambda$, $\Psi$, $\Sigma$, $f_t$, and the data $x_{bt}$, draw $g_{bt}$ for all $b$.

Step 2 Conditional on $\Lambda$, $\Psi$, $\Sigma$, and $g_{bt}$, draw $f_t$.

Step 3 Conditional on $f_t$ and $g_{bt}$, draw $\Lambda$, $\Psi$, $\Sigma$, and return to Step 1.
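The initialization step can be sketched as follows: a first principal component per block provides starting values for the $g_{bt}$, and a principal component of those block factors provides a starting value for $f_t$. The helper names are hypothetical, and the sketch covers only the initialization, not the Gibbs iterations.

```python
import numpy as np

def first_pc(Y):
    """First principal component of a standardized panel Y (T x k)."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    _, eigvec = np.linalg.eigh(Z.T @ Z / Z.shape[0])
    return Z @ eigvec[:, -1]

def initialize_hierarchical(blocks):
    """blocks: list of T x n_b arrays. Returns initial block factors g and a
    common factor f obtained from principal components, before the Gibbs steps."""
    g = np.column_stack([first_pc(Xb) for Xb in blocks])   # one factor per block
    f = first_pc(g)                                        # common factor from block factors
    return g, f
```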

The sampling of $g_{bt}$ needs to take into account the correlation across blocks due to $f_t$. As in previously discussed dynamic factor models, the factors and the loadings are not separately identified. To achieve identification, the authors suggest using lower triangular loading matrices, fixed variances of the innovations to the factors, $\Sigma_f$, $\Sigma_g$, and imposing additional restrictions on the structure of the lag polynomials.

Jackson et al. (2016) survey additional Bayesian estimation methods. Under the assumption that common factors have a direct impact on $x$, but are uncorrelated with block-specific ones, so that $x_{ibt} = \Lambda_{f,ib}(L)\, f_t + \Lambda_{g,ib}(L)\, g_{bt} + e_{x,ibt}$, Breitung and Eickmeier (2016) propose a sequential least squares estimation approach and compare it to other frequentist estimation methods.

Kose et al. (2008) use a hierarchical model to study international business cycle comovements by decomposing fluctuations in macroeconomic aggregates of G-7 countries into a common factor across all countries, country factors that are common across variables within a country, and idiosyncratic fluctuations. Moench and Ng (2011) apply this model to national and regional housing data to estimate the effects of housing shocks on consumption, while Fu (2007) decomposes house prices in 62 U.S. metropolitan areas into national, regional, and metro-specific idiosyncratic factors. Del Negro and Otrok (2008) and Stock and Watson (2008) use this approach to add stochastic volatility to their models.

2.8 Structural Breaks in Dynamic Factor Models

As evident from the discussion so far, the literature on dynamic factor models has grown tremendously, evolving in many directions. In the remainder of this chapter we concentrate on two strands of research: dynamic factor models 1) with Markov-switching behavior and 2) with time-varying loadings. In both cases, the aim is to take into account the evolution of macroeconomic conditions over time, either through 1) non-linearities in the dynamics of the factors or 2) variation in the loadings, which measure the intensity of the links between each observable variable and the underlying common factors. Addressing such instability appeared particularly important after the 2008 global financial crisis and the subsequent slow recovery. Both strands of literature have produced a number of interesting papers in recent years. In what follows, we briefly describe some of them, without attempting an exhaustive survey of the corresponding literature.

2.8.1 Markov-switching dynamic factor models

One of the first uses of dynamic factor models was the construction of coincident indexes. The literature soon sought to allow the dynamics of the index to vary according to the phases of the business cycle. Incorporating Markov switching into dynamic factor models (MS-DFM) was first suggested by Kim (1994) and Diebold and Rudebusch (1996)3. Diebold and Rudebusch (1996) considered a single factor that played the role of a coincident composite index capturing the latent state of the economy. They suggested that the parameters describing the dynamics of the factor may themselves depend on an unobservable two-state Markov-switching latent variable. More precisely, they modeled the factor the same way as Hamilton (1989) modeled US real GNP to obtain a statistical characterization of business cycle phases. In practice, Diebold and Rudebusch (1996) used a two-step estimation method, since they applied Hamilton’s model to a previously estimated composite index.
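In the two-step spirit of Diebold and Rudebusch (1996), one can apply a Markov-switching model to a previously estimated index. The sketch below implements the Hamilton filter for a two-state switching-mean model with Gaussian noise; it deliberately ignores the autoregressive dynamics of the factor, and the parameter values in the usage comment are arbitrary assumptions.

```python
import numpy as np

def hamilton_filter(f, mu, sigma, p00, p11):
    """Filtered probabilities of state 1 (e.g., recession) for a two-state
    switching-mean model f_t = mu[S_t] + eta_t, eta_t ~ N(0, sigma^2).
    Simplified sketch: autoregressive dynamics of the factor are ignored."""
    P = np.array([[p00, 1 - p00],
                  [1 - p11, p11]])                 # transition matrix, rows sum to 1
    prob = np.array([0.5, 0.5])                    # initial state probabilities
    filtered = np.zeros(len(f))
    for t, ft in enumerate(f):
        pred = P.T @ prob                          # one-step-ahead state probabilities
        lik = np.exp(-0.5 * ((ft - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        prob = pred * lik / (pred @ lik)           # Bayes update with state densities
        filtered[t] = prob[1]
    return filtered

# usage on an estimated index f_hat, with arbitrary illustrative parameters:
# rec_prob = hamilton_filter(f_hat, mu=np.array([0.5, -1.0]), sigma=1.0, p00=0.95, p11=0.80)
```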

Kim (1994) introduced a very general model in which the dynamics of the factors and the loadings may depend on a state variable. Using the notation adopted so far in the current chapter, his model was
$$y_t = \Lambda_{S_t} f_t + B_{S_t} x_t + e_t$$
$$f_t = \Phi_{S_t} f_{t-1} + \Gamma_{S_t} x_t + H_{S_t} \varepsilon_t,$$

where $x_t$ is a vector of exogenous or lagged dependent variables, and where the $\begin{pmatrix} e_t \\ \varepsilon_t \end{pmatrix}$'s are i.i.d. Gaussian with covariance matrix $\begin{pmatrix} R & O \\ O & Q \end{pmatrix}$. The underlying state variable $S_t$ can take $M$ possible values and is Markovian of order one, so that $P(S_t = j \mid S_{t-1} = i, S_{t-2} = k, \dots) = P(S_t = j \mid S_{t-1} = i) = p_{ij}$. He proposed a very powerful approximation in the computation of the Kalman filter and the likelihood, and estimated the model in one step by maximum likelihood4. Using this approximation, the likelihood can be computed at any point of the parameter space and maximized using a numerical procedure. It must be emphasized, however, that such a procedure is applicable in practice only when the number of parameters is small, which means that the dimension of $x_t$ must be small. Once the parameters have been estimated, it is possible to obtain the best approximations of $f_t$ and $S_t$ for any $t$ using the Kalman filter and smoother, given the observations $x_1, \dots, x_t$ and $x_1, \dots, x_T$, respectively.

Kim (1994) and Chauvet (1998) estimated a one-factor Markov-switching model using this methodology. In both papers, the model is formulated like a classical dynamic factor model, with the factor following an autoregressive process whose constant term depends on a two-state Markov variable $S_t$:
$$x_t = \Lambda f_t + e_t \quad \text{and} \quad \Phi(L) f_t = \mu_{S_t} + \eta_t,$$
where $\eta_t \sim \text{i.i.d. } N(0, 1)$ and $S_t$ can take two values, denoted 0 and 1, which basically correspond to expansions and recessions: the conditional expectation of the factor is higher during expansions than during recessions. Kim and Yoo (1995) used the same four observable variables as the US Department of Commerce and Stock and Watson (1989, 1991, 1993), whereas Chauvet (1998) considered several sets of observable series over various time periods, taking into account different specifications for the dynamics of the common factor. Both papers obtained posterior recession probabilities and turning points that were very close to official NBER dates. Kim and Nelson (1998) proposed a Gibbs sampling methodology that helped to avoid Kim’s (1994) approximation of the likelihood. This Bayesian approach also provided results that were very close to the NBER dates, and allowed tests of business cycle duration dependence. Chauvet and Piger (2008) compared the nonparametric dating algorithm given in Harding and Pagan (2003) with the dating obtained using Markov-switching models similar to those of Chauvet (1998). They showed that both approaches identify the NBER turning point dates in real time with reasonable accuracy, and identify the troughs with more timeliness than the NBER. But they

4 Without this approximation, the Kalman filter would be intractable, since it would be necessary to take into account the $M^T$ possible trajectories of $S_1, \dots, S_T$. For further details, see Kim (1994) and the
