Realized Factor Models for Vast Dimensional Covariance Estimation

(1)


CMAP, École Polytechnique, Paris, 23 March 2009

Roel Oomen

(joint with Karim Bannouh, Martin Martens, and Dick van Dijk)

(2)

Motivation

• High-frequency data has spurred the development of "realized" measures
  - realized variance, defined as the sum of squared intra-period returns (Andersen and Bollerslev, 1998; Barndorff-Nielsen and Shephard, 2002)
  - a consistent and efficient measure of quadratic variation

• In a univariate setting, there are some complications:
  1. microstructure noise: sub-sampling, kernels, two/multi-scale estimators (Zhang, Mykland, and Aït-Sahalia, 2005; Barndorff-Nielsen, Hansen, Lunde, and Shephard, 2008; Zhou, 1996; Jacod, Li, Mykland, Podolskij, and Vetter, 2007; Podolskij and Vetter, 2008)
  2. jumps: bipower variation, MedRV, threshold RV, quantile RV (Barndorff-Nielsen and Shephard, 2004; Jacod, 2008; Mancini, 2004, 2006; Andersen, Dobrev, and Schaumburg, 2008; Christensen, Oomen, and Podolskij, 2008)
  3. outliers: quantile RV, threshold RV (Christensen et al.; Mancini; Jacod)
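To make the first two measures concrete, here is a minimal sketch of realized variance and of bipower variation, one of the jump-robust alternatives listed above, using the standard scaling BV = (π/2) Σ |r_i||r_{i−1}|. The function names and the simulated inputs (10,000 Gaussian returns with daily integrated variance 1, plus one hypothetical jump of size 0.5) are illustrative assumptions, not part of the original study.

```python
import numpy as np

def realized_variance(returns):
    """RV: sum of squared intra-period returns."""
    r = np.asarray(returns, dtype=float)
    return float(np.sum(r**2))

def bipower_variation(returns):
    """BV = (pi/2) * sum_{i>=2} |r_i| |r_{i-1}|; robust to a finite number of jumps."""
    r = np.abs(np.asarray(returns, dtype=float))
    return float(np.pi / 2 * np.sum(r[1:] * r[:-1]))

rng = np.random.default_rng(42)
M, iv = 10_000, 1.0
r = rng.normal(0.0, np.sqrt(iv / M), size=M)   # diffusive returns with daily IV = 1 (assumed)
r_jump = r.copy()
r_jump[M // 2] += 0.5                          # one hypothetical jump, adds about 0.25 to RV

print(realized_variance(r), bipower_variation(r))            # both should be close to IV
print(realized_variance(r_jump), bipower_variation(r_jump))  # RV picks up the jump, BV should not
```

The gap between RV and BV on the jump path is what jump tests built on bipower variation exploit.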

(3)

Motivation

• In a multivariate setting there are additional considerations:
  - non-synchronous trading and its resulting bias, i.e. the Epps effect (Epps, 1979; Lo and MacKinlay, 1990)
  - dimensionality and stability of the covariance matrix (Chan, Karceski, and Lakonishok, 1999; Fan, Fan, and Lv, 2008; Jagannathan and Ma, 2003)

• This paper makes the following contributions:
  1. based on a linear factor model, we propose a simple way to estimate vast-dimensional covariance matrices using high- and low-frequency data simultaneously
  2. we illustrate its use on a very large asset universe, rarely studied in the academic literature
  3. theoretical and empirical results highlight the favorable properties of our approach

(4)

Covariance estimation

• Let r_i denote the N×1 vector of period-i returns. The standard covariance estimator is "realized covariance":

      \mathrm{RC} = \sum_{i=1}^{M} r_i r_i'

• In practice, trading is non-synchronous because trades and quotes arrive at random times

• RC requires returns to be sampled on a regular grid

• This sampling induces cross-asset serial correlation and makes RC biased
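The Epps effect described above can be reproduced with a small simulation. The sketch below is illustrative only and uses assumed settings rather than the authors' setup: two Brownian log prices with correlation 0.5, trades arriving on average every 30 and 60 seconds (Bernoulli draws per second as a discrete stand-in for Poisson arrivals), previous-tick sampling onto a regular grid, and RC pooled over 50 simulated days. The names `realized_covariance`, `previous_tick` and `epps_correlations` are hypothetical.

```python
import numpy as np

def realized_covariance(returns):
    """RC = sum_i r_i r_i' for an (M x N) array of returns sampled on a common grid."""
    r = np.asarray(returns, dtype=float)
    return r.T @ r

def previous_tick(times, prices, grid):
    """Last observed price at or before each grid time (previous-tick interpolation)."""
    idx = np.searchsorted(times, grid, side="right") - 1
    return prices[np.clip(idx, 0, None)]

def epps_correlations(deltas, n_days=50, rho=0.5, mean_gaps=(30.0, 60.0), seed=0):
    """Realized correlation of two non-synchronously traded assets, pooled over days, per grid step."""
    rng = np.random.default_rng(seed)
    day = 23_400                                   # seconds in a 6.5-hour trading day
    totals = {d: np.zeros((2, 2)) for d in deltas}
    for _ in range(n_days):
        z = rng.standard_normal((day, 2))
        z[:, 1] = rho * z[:, 0] + np.sqrt(1.0 - rho**2) * z[:, 1]
        latent = np.vstack([np.zeros((1, 2)), np.cumsum(z, axis=0)])  # efficient log prices, t = 0..day
        # Trade times per asset: each second has a trade with probability 1/gap; time 0 is always observed
        trades = [np.union1d([0], np.flatnonzero(rng.random(day + 1) < 1.0 / gap)) for gap in mean_gaps]
        for d in deltas:
            grid = np.arange(0, day + 1, d)
            sampled = np.column_stack(
                [previous_tick(t, latent[t, k], grid) for k, t in enumerate(trades)]
            )
            totals[d] += realized_covariance(np.diff(sampled, axis=0))
    return {d: c[0, 1] / np.sqrt(c[0, 0] * c[1, 1]) for d, c in totals.items()}

if __name__ == "__main__":
    for d, corr in epps_correlations([1, 60, 300, 1800]).items():
        print(f"grid step {d:>5d} s: realized correlation {corr:.3f}")
```

Under these assumed settings the realized correlation at 1-second sampling should be close to zero, while at 30-minute sampling it should be close to the true value of 0.5, which is the bias pattern discussed on this slide.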

(5)

Non-synchronous trading

[Figure: GE/IBM (all exchanges); x-axis: sampling interval, from 1 sec to 4 min]

(6)

Covariance estimation

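• To counter the Epps effect, we can use the irregularly spaced data and compute a Hayashi-Yoshida-type estimator,

      \sum_{i=1}^{M_1} \sum_{j \,:\, (t_{j-1}, t_j) \cap (t_{i-1}, t_i) \neq \emptyset} \left(p^{(1)}_{t_i} - p^{(1)}_{t_{i-1}}\right)\left(p^{(2)}_{t_j} - p^{(2)}_{t_{j-1}}\right)

  where t_i and t_j denote the observation times of assets 1 and 2, respectively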

  - HY does not guarantee a psd covariance matrix
  - Griffin and Oomen (2006) find that HY is downward biased due to "sluggish" price adjustments
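A direct, unoptimized sketch of the Hayashi-Yoshida sum for one pair of assets, assuming log prices `p1`, `p2` observed at sorted times `t1`, `t2` (hypothetical names). It loops over the asset-1 intervals and adds the product with every overlapping asset-2 interval, so no common grid or synchronization step is needed. The cost is O(M_1 M_2), which is fine for illustration but not for production use.

```python
import numpy as np

def hayashi_yoshida(t1, p1, t2, p2):
    """Hayashi-Yoshida covariance between two irregularly observed log-price series."""
    t1, p1, t2, p2 = (np.asarray(a, dtype=float) for a in (t1, p1, t2, p2))
    r1, r2 = np.diff(p1), np.diff(p2)
    total = 0.0
    for i, r in enumerate(r1):
        start, end = t1[i], t1[i + 1]
        # asset-2 intervals (t2[j], t2[j+1]) that intersect the open interval (start, end)
        overlap = (t2[:-1] < end) & (t2[1:] > start)
        total += r * r2[overlap].sum()
    return float(total)

# With identical observation times, HY reduces to the off-diagonal term of RC
t = [0, 1, 2, 3]
print(hayashi_yoshida(t, [0, 1, 3, 2], t, [0, 2, 1, 4]))  # -3.0
```

Note that the estimator is computed pairwise, which is one reason an N×N matrix assembled from HY entries need not be positive semi-definite.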

• Alternatively, we can include a lead-lag adjustment

      \sum_{i=1}^{M} r_i r_i' + \sum_{i=2}^{M-1} \left(r_{i-1} r_i' + r_{i+1} r_i'\right) + \ldots

  Barndorff-Nielsen et al. (2008) study this estimator with kernel weighting
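The lead-lag adjustment can be sketched as Γ_0 + Σ_{h=1}^{H} w_h (Γ_h + Γ_h') with Γ_h = Σ_i r_i r_{i−h}'. This sketch uses full-sample autocovariances, so its summation limits differ from the slide's formula only at the end points. The names are hypothetical; flat weights (w_h = 1) give the unweighted adjustment, which is not guaranteed to be positive semi-definite, while smooth kernel weights such as the Parzen kernel w_h = k(h/(H+1)) are a common remedy.

```python
import numpy as np

def parzen(x):
    """Parzen kernel weight for |x| <= 1 (zero beyond)."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def lead_lag_covariance(returns, H, kernel=None):
    """Gamma_0 + sum_{h=1..H} w_h (Gamma_h + Gamma_h'); kernel=None means flat unit weights."""
    r = np.asarray(returns, dtype=float)
    if r.ndim == 1:
        r = r[:, None]
    est = r.T @ r                               # Gamma_0: realized covariance
    for h in range(1, H + 1):
        w = 1.0 if kernel is None else kernel(h / (H + 1))
        gamma_h = r[h:].T @ r[:-h]              # sum_i r_i r_{i-h}'
        est += w * (gamma_h + gamma_h.T)
    return est

rng = np.random.default_rng(3)
x = rng.normal(size=(500, 4))
k = lead_lag_covariance(x, H=10, kernel=parzen)
print(np.linalg.eigvalsh(k).min())              # non-negative up to rounding with Parzen weights
```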

(7)

Non-synchronous trading & refresh time sampling

• BNHLS use a refresh-time sampling scheme to limit the dependence induced by non-synchronicity, and then apply realised kernels (see also Harris, McInish, Shoesmith, and Wood, 1995; Martens, 2003)

[Figure: refresh-time sampling; x-axis: observation (5 to 20)]

Drawbacks of this method:

1. inefficient use of data, particularly when mixing liquid and illiquid assets
2. universe-dependent covariance estimates
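A minimal sketch of refresh-time sampling, assuming each asset's trade times are given as a list (the name `refresh_times` is hypothetical). The first refresh time is the moment every asset has traded at least once; each subsequent one is the first moment every asset has traded again since the previous refresh time.

```python
import numpy as np

def refresh_times(time_lists):
    """Refresh times: each one is the first moment every asset has traded since the previous one."""
    times = [np.sort(np.asarray(t, dtype=float)) for t in time_lists]
    tau = max(t[0] for t in times)               # first time all assets have traded at least once
    out = [tau]
    while True:
        nxt = []
        for t in times:
            k = np.searchsorted(t, tau, side="right")   # first trade strictly after tau
            if k == len(t):
                return np.array(out)
            nxt.append(t[k])
        tau = max(nxt)
        out.append(tau)

print(refresh_times([[0, 1, 2, 3, 4], [0.5, 2.5, 3.5]]))  # [0.5 2.5 3.5]
```

Because every refresh time requires a new trade in each asset, the number of refresh times can never exceed the number of trades of the least liquid asset in the universe, which is the data inefficiency listed as drawback 1.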

(8)

Outline

1. Introduction / Motivation

2. The Mixed-Frequency Factor Model

3. Simulation Study

4. Empirical Study

(9)

The Mixed-Frequency Factor Model

• Our aim here is to estimate a covariance matrix that
  - uses the information contained in high-frequency data (HFD)
  - avoids Epps-effect-like biases due to non-synchronous trading
  - "works" in a very high-dimensional setting

• The main idea we develop in this paper is as follows:
  1. specify a linear factor structure for asset returns
  2. use "liquid factors" to estimate the factor covariance matrix
  3. use daily returns to estimate the factor loadings

(10)

The Linear Model

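• Formally, consider a linear factor structure for the cross-section of asset returns:

      \underset{(N \times 1)}{r} = \underset{(N \times K)}{\beta} \, \underset{(K \times 1)}{f} + \underset{(N \times 1)}{\varepsilon}

  - cov(ε) assumed to be diagonal
  - typically K ≪ N
  - factors may include sector, industry, style, and country factors

• The covariance is then equal to:

      \mathrm{cov}(r) = \beta \Lambda \beta' + \Sigma, \quad \text{where } \Lambda = E(f f') \text{ and } \Sigma = E(\varepsilon \varepsilon')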
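A small numerical sketch of the covariance formula above, with hypothetical dimensions and randomly drawn parameters. Because βΛβ' is positive semi-definite and Σ is diagonal with positive entries, the smallest eigenvalue of βΛβ' + Σ is at least the smallest specific variance. A plain sample covariance from T < N observations of the same returns is, by contrast, rank deficient.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 200, 5, 50                            # many assets, few factors, short sample (hypothetical)
beta = rng.normal(1.0, 0.5, size=(N, K))
G = rng.normal(size=(K, K))
lam = G @ G.T / K + 0.1 * np.eye(K)             # a positive definite factor covariance
sigma = np.diag(rng.uniform(0.5, 1.5, size=N))  # diagonal specific variances

cov_factor = beta @ lam @ beta.T + sigma        # cov(r) = beta Lambda beta' + Sigma

# Simulate T observations from the same model and form the plain sample covariance
f = rng.multivariate_normal(np.zeros(K), lam, size=T)
eps = rng.normal(size=(T, N)) * np.sqrt(np.diag(sigma))
r = f @ beta.T + eps
cov_sample = np.cov(r, rowvar=False)

eig_factor = np.linalg.eigvalsh(cov_factor)
print("factor model: smallest eigenvalue", eig_factor.min())
print("sample covariance: rank", np.linalg.matrix_rank(cov_sample), "of", N)
```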

(11)

Theorem

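Let \hat{\gamma}_{ij} = \hat{\beta}_i' \hat{\Lambda} \hat{\beta}_j denote the (i,j) entry of the estimated covariance matrix from the linear factor model, where β^ε and Λ^ε denote the estimation errors in \hat{\beta} and \hat{\Lambda}. Assuming (i) E(σ_ij) = 0, (ii) E(β^ε) = 0, (iii) E(Λ^ε) = 0, and (iv) β^ε ⊥ Λ^ε element by element, then

      E(\hat{\gamma}_{ij}) = \gamma_{ij}

and

      V(\hat{\gamma}_{ij}) = \beta_i' \Lambda \Sigma_{\beta,j} \Lambda' \beta_i + \beta_j' \Lambda' \Sigma_{\beta,i} \Lambda \beta_j + \mathrm{tr}(\Sigma_{\beta,i} \Lambda \Sigma_{\beta,j} \Lambda') + g(\beta_i \beta_i', \beta_j \beta_j', \Phi) + g(\beta_i \beta_i', \Sigma_{\beta,j}, \Phi) + g(\beta_j \beta_j', \Sigma_{\beta,i}, \Phi) + g(\Sigma_{\beta,i}, \Sigma_{\beta,j}, \Phi)

where \Sigma_{\beta,i} = V(\hat{\beta}_i), \Phi = E(\mathrm{vech}(\Lambda^{\varepsilon}) \, \mathrm{vech}(\Lambda^{\varepsilon})'), and

      g(A, B, \Phi) = \sum_{m,n,p,q=1}^{K} A_{mp} B_{nq} \Phi_{f(p,n), f(q,m)}

with f(p,q) = K(\min\{p,q\} - 1) + \tfrac{1}{2}(\min\{p,q\} - \min\{p,q\}^2) + \max\{p,q\}.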
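The index function f(p,q) locates element (p,q) of a symmetric matrix in its half-vectorization vech (lower triangle stacked column by column); since Λ^ε is K×K, the sketch below uses the dimension K. It checks f against an explicit vech and evaluates g(A, B, Φ) by brute force. The names `vech`, `vech_index` and `g` are hypothetical helpers written for this illustration.

```python
import numpy as np
from itertools import product

def vech(a):
    """Half-vectorization: stack the lower-triangular part of a square matrix column by column."""
    a = np.asarray(a)
    return np.concatenate([a[j:, j] for j in range(a.shape[0])])

def vech_index(p, q, k):
    """1-based position f(p, q) of element (p, q), or (q, p), of a symmetric k x k matrix in vech."""
    lo, hi = min(p, q), max(p, q)
    return k * (lo - 1) + (lo - lo**2) // 2 + hi   # lo - lo^2 is always even

def g(A, B, Phi):
    """g(A, B, Phi) = sum_{m,n,p,q} A_mp B_nq Phi_{f(p,n), f(q,m)} (1-based indices in the formula)."""
    K = A.shape[0]
    total = 0.0
    for m, n, p, q in product(range(1, K + 1), repeat=4):
        total += A[m - 1, p - 1] * B[n - 1, q - 1] * Phi[vech_index(p, n, K) - 1, vech_index(q, m, K) - 1]
    return total

S = np.arange(9.0).reshape(3, 3)
S = S + S.T                                      # a symmetric 3 x 3 matrix
print(vech(S)[vech_index(3, 2, 3) - 1] == S[2, 1])
```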

(12)

Mixed-Frequency Factor Model

• We now specialize this result to a specific setting of interest:
  - T "low-frequency" observations available on both factors and individual assets
  - M "high-frequency" observations available only on factors
  - the factor model is correctly specified and parameters are constant over time
  - the factor covariance is estimated using "realized covariance"
  - the factor loadings are estimated using OLS
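A minimal sketch of an estimator matching this setting, under stated assumptions: loadings come from an OLS regression without intercept of T low-frequency asset returns on the matching factor returns, Λ̂ is the realized covariance of M high-frequency factor returns within the target period, and, as a hypothetical choice not specified on this slide, the specific variances are the mean squared residuals of the low-frequency regression. Both Λ̂ (from one day of intraday returns) and the daily residual variances then refer to a daily horizon. The name `mffm_covariance` is hypothetical.

```python
import numpy as np

def mffm_covariance(asset_lf, factor_lf, factor_hf):
    """
    Sketch of a mixed-frequency factor-model covariance estimate.
    asset_lf : (T, N) low-frequency (e.g. daily) asset returns
    factor_lf: (T, K) factor returns over the same low-frequency periods
    factor_hf: (M, K) high-frequency factor returns within the target period
    """
    Y = np.asarray(asset_lf, dtype=float)
    X = np.asarray(factor_lf, dtype=float)
    F = np.asarray(factor_hf, dtype=float)
    # OLS loadings without intercept (r = beta f + eps): solve X B = Y, beta = B'
    beta = np.linalg.lstsq(X, Y, rcond=None)[0].T      # (N, K)
    # Factor covariance: realized covariance of the high-frequency factor returns
    lam = F.T @ F                                      # (K, K)
    # Specific risk (assumed choice): mean squared residual of the low-frequency regression
    resid = Y - X @ beta.T
    sigma = np.diag(np.mean(resid**2, axis=0))         # (N, N), diagonal
    return beta @ lam @ beta.T + sigma, beta, lam, sigma
```

Only the K factors need to be observed at high frequency, so the N assets never have to be synchronized with each other.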

(13)

Mixed-Frequency Factor Model

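• With the above assumptions, the covariance estimates are unbiased, with

      V(\hat{\gamma}_{ij}) = \frac{A}{T} + \frac{B + C}{M},

  where

      A = \sigma_j^2 \beta_i' \Lambda \beta_i + \sigma_i^2 \beta_j' \Lambda \beta_j + \sigma_i^2 \sigma_j^2 \frac{K}{T},

      B = \sum_{m,n,p,q}^{K} \beta_i(m) \beta_i(p) \beta_j(n) \beta_j(q) \left(\Lambda_{pq} \Lambda_{nm} + \Lambda_{pm} \Lambda_{nq}\right),

      C = \sum_{m,n,p,q}^{K} \left(\beta_i(m) \beta_i(p) \Sigma_{\beta,j}(n,q) + \beta_j(m) \beta_j(p) \Sigma_{\beta,i}(n,q) + \Sigma_{\beta,i}(m,p) \Sigma_{\beta,j}(n,q)\right) \left(\Lambda_{pq} \Lambda_{nm} + \Lambda_{pm} \Lambda_{nq}\right)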

• We can now distinguish among three different scenarios:
  1. factor loadings are known ("T → ∞")
  2. factor covariance matrix is known ("M → ∞")
  3. both are estimated ⇒ the mixed-frequency factor model
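The variance formula can be evaluated numerically. This sketch assumes Σ_{β,i} = σ_i² Λ^{-1}/T, the large-sample OLS variance of the loadings when the factors have mean zero; under that assumption the first three terms of the theorem reduce exactly to A/T. The inputs (Λ, β_i, β_j, σ_i², σ_j²) are hypothetical values chosen for illustration, and `mffm_variance` is a hypothetical name. The B term also simplifies to (β_i'Λβ_j)² + (β_i'Λβ_i)(β_j'Λβ_j), which the test checks.

```python
import numpy as np

def mffm_variance(beta_i, beta_j, lam, sig2_i, sig2_j, T, M):
    """V(gamma_hat_ij) = A / T + (B + C) / M, assuming Sigma_beta = sigma^2 Lambda^{-1} / T."""
    bi, bj = np.asarray(beta_i, dtype=float), np.asarray(beta_j, dtype=float)
    L = np.asarray(lam, dtype=float)
    K = L.shape[0]
    S_i = sig2_i * np.linalg.inv(L) / T            # assumed V(beta_hat_i)
    S_j = sig2_j * np.linalg.inv(L) / T            # assumed V(beta_hat_j)
    # W[m, n, p, q] = Lambda_pq Lambda_nm + Lambda_pm Lambda_nq
    W = np.einsum("pq,nm->mnpq", L, L) + np.einsum("pm,nq->mnpq", L, L)

    def h(X, Y):
        # sum_{m,n,p,q} X_mp Y_nq W_mnpq
        return np.einsum("mp,nq,mnpq->", X, Y, W)

    A = sig2_j * bi @ L @ bi + sig2_i * bj @ L @ bj + sig2_i * sig2_j * K / T
    B = h(np.outer(bi, bi), np.outer(bj, bj))
    C = h(np.outer(bi, bi), S_j) + h(np.outer(bj, bj), S_i) + h(S_i, S_j)
    return float(A / T + (B + C) / M), float(A), float(B), float(C)

lam = np.array([[1.0, 0.3, 0.1], [0.3, 0.8, 0.2], [0.1, 0.2, 0.5]])
bi = np.array([1.0, 0.5, -0.2])
bj = np.array([0.9, -0.3, 0.4])
for T, M in [(250, 78), (500, 78), (250, 156)]:
    print(T, M, mffm_variance(bi, bj, lam, 0.5, 0.7, T, M)[0])
```

As T grows, Σ_β vanishes and the variance tends to B/M (scenario 1); as M grows, it tends to A/T (scenario 2).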

(14)

Numerical illustration: MFFM vs HY

• Approach 1: the mixed-frequency factor model
  - 5 factors (K = 5)
  - specific risk accounts for about 50%
  - correlation between assets about 40%
  - estimate factor loadings using T = {250, 500} days of data
  - estimate factor covariance using M = {78, 156} intra-day observations

• Approach 2: the pairwise Hayashi and Yoshida (2005) estimator
  - using N intra-day non-synchronous observations for both assets

• Compare in terms of MSE

(15)

Numerical illustration: MFFM vs HY

[Figure: ln MSE against the number of intra-day asset return observations N (200 to 1,400) for MFFM (T = 250, M = 78), MFFM (T = 500, M = 78), MFFM (T = 250, M = 156), HY (no noise), HY (intermediate noise), and HY (high noise)]

(16)

Outline

1. Introduction / Motivation

2. The Mixed-Frequency Factor Model

3. Simulation Study

4. Empirical Study

(17)

Simulation setup

• Study the performance of MFFM in a realistic high-dimensional setting with non-synchronous trading and microstructure noise
  1. liquid factors f
     - using the (estimated) covariance Λ of the three Fama-French factors
     - Poisson sampling with arrival rate such that E(M) = 25,000
  2. factor loadings β
     - use point estimates for the S&P 500 universe
     - add measurement error to obtain \hat{\beta}
  3. stock-specific innovations ε
     - use point estimates of "beta-regression" residual variances diag(Σ)

(18)

Simulation setup

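• On a fine grid, we then simulate returns

      r = \beta f + \varepsilon + \text{"MA(1) noise"}

  and use Poisson sampling with rates calibrated to the S&P 500

• MFFM is computed as

      \hat{\beta} \hat{\Lambda} \hat{\beta}' + \hat{\Sigma}, \quad \text{where } \hat{\Lambda} = \widehat{f f'} \text{ and } \hat{\Sigma} = \widehat{(r - \beta f)(r - \beta f)'}

• To set the level of noise, we calibrate the "noise ratio" of Oomen (2006)

      \gamma^2 = \frac{\omega^2}{IV / M}

  where ω² is the noise variance and IV the integrated variance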
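Inverting the noise ratio gives ω² = γ² IV/M. With i.i.d. noise added to observed log prices, returns pick up an MA(1) component and E[RV] = IV + 2Mω² = IV(1 + 2γ²), so the ratio directly measures the noise-induced bias of RV. The sketch below checks this by simulation under assumed conditions (Gaussian efficient returns with constant volatility, Gaussian i.i.d. noise, IV = 1), using the S&P 500 median values M = 4,174 and γ = 0.37 reported in the table from Christensen, Oomen, and Podolskij (2009). The function names are hypothetical.

```python
import numpy as np

def noise_variance(gamma, iv, m):
    """Invert the noise ratio gamma^2 = omega^2 / (IV / M) to get the noise variance omega^2."""
    return gamma**2 * iv / m

def simulate_noisy_rv(gamma, iv=1.0, m=4_174, n_days=200, seed=0):
    """Daily RV from M noisy returns: Gaussian efficient returns plus i.i.d. Gaussian price noise."""
    rng = np.random.default_rng(seed)
    omega2 = noise_variance(gamma, iv, m)
    efficient = rng.normal(0.0, np.sqrt(iv / m), size=(n_days, m))   # constant volatility, IV per day
    noise = rng.normal(0.0, np.sqrt(omega2), size=(n_days, m + 1))   # noise on the M + 1 observed prices
    observed = efficient + np.diff(noise, axis=1)                    # returns inherit an MA(1) component
    return np.sum(observed**2, axis=1)

gamma = 0.37
rv = simulate_noisy_rv(gamma)
print(rv.mean(), 1 + 2 * gamma**2)   # simulated mean RV vs IV * (1 + 2 gamma^2), with IV = 1
```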

(19)

Simulation setup

(Table from Christensen, Oomen, Podolskij, 2009)

                        # of observations M     noise ratio γ
universe                Q5      Q50     Q95     Q5    Q50   Q95

Panel A: US
S&P600                  157     751     2,417   0.10  0.34  0.73
S&P400                  604     1,749   4,710   0.12  0.36  0.76
S&P500                  1,477   4,174   12,355  0.14  0.37  0.93
S&P100                  2,945   7,338   20,707  0.17  0.40  1.06
DJ30                    4,701   9,562   23,686  0.22  0.45  0.97

Panel B: Europe
DJ Stoxx Small 200      158     772     2,225   0.25  0.59  1.15
DJ Stoxx Mid 200        352     1,419   3,689   0.30  0.63  1.16
DJ Stoxx Large 200      999     3,634   11,169  0.34  0.66  1.28
DJ Stoxx50              3,161   6,975   15,860  0.40  0.71  1.40

Panel C: Asia-Pacific
S&P ASX200              199     744     2,957   0.30  0.74  1.75
S&P Topix 150           370     1,070   2,639   0.34  1.03  3.59
Hang Seng               465     1,260   4,090   0.39  0.88  2.26

Panel D: Emerging markets (BRIC)
Ibovespa (Brazil)       261     1,130   5,617   0.32  0.66  1.21
DJ Titans 10 (Russia)   543     6,066   22,230  0.58  1.03  1.29
DJ BRIC 50 (India)      726     2,098   4,987   0.16  0.49  0.90
DJ BRIC 50 (China)      1,177   2,328   5,197   0.60  1.17  2.96

(20)

Simulation setup

1. compute RC and MFFM, and compare them to the true covariance matrix βΛβ' + Σ

2. Frobenius norm, for diagonal and off-diagonal elements separately:

      \| VCV - \widehat{VCV} \|_2

3. bias measure, for diagonal and off-diagonal elements separately:

      \frac{\iota' \widehat{VCV} \iota}{\iota' VCV \iota}
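A minimal sketch of the two evaluation metrics, computed separately on the diagonal and off-diagonal elements by masking. Restricting ι'(·)ι to a subset of elements amounts to summing those elements, so the bias measure is the ratio of the summed estimated entries to the summed true entries (1 means unbiased in aggregate). The function name `split_metrics` is hypothetical.

```python
import numpy as np

def split_metrics(vcv_true, vcv_hat):
    """Frobenius-norm error and bias ratio, separately for diagonal and off-diagonal elements."""
    V = np.asarray(vcv_true, dtype=float)
    Vh = np.asarray(vcv_hat, dtype=float)
    diag = np.eye(V.shape[0], dtype=bool)
    out = {}
    for name, mask in (("diagonal", diag), ("off_diagonal", ~diag)):
        err = (V - Vh)[mask]
        out[name] = {
            "frobenius": float(np.sqrt(np.sum(err**2))),
            # iota' VCV_hat iota / iota' VCV iota, restricted to the selected elements
            "bias": float(Vh[mask].sum() / V[mask].sum()),
        }
    return out

print(split_metrics([[2.0, 1.0], [1.0, 2.0]], [[2.0, 0.5], [0.5, 3.0]]))
```

In this toy example the estimate overstates the variances and halves the covariance, so the off-diagonal bias ratio is 0.5, the kind of downward bias the Epps effect produces in RC.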

(21)

Simulation results

High Liquidity & No Noise

[Figure, two panels: covariance bias and covariance Frobenius norm against sampling frequency (M), from 1 sec to 30 min, for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]

• Strong Epps effect for RC; MFFM largely unbiased across frequencies

• MFFM superior and insensitive to the choice of sampling frequency

(22)

Simulation results

Medium Liquidity & No Noise

[Figure, two panels: covariance bias and covariance Frobenius norm]

• With less liquid assets, RC performance deteriorates quickly

• MFFM largely unaffected (affected only through the diagonal elements)

(23)

Simulation results

Low Liquidity & No Noise

[Figure, two panels: covariance bias and covariance Frobenius norm]

• Epps effect in RC strongest for illiquid assets

(24)

Simulation results

High Liquidity & Medium Noise

[Figure, two panels: covariance bias and covariance Frobenius norm]

• Noise has little impact, as it is (assumed to be) cross-sectionally independent

(25)

Simulation results

High Liquidity & No Noise

[Figure, two panels: variance bias and variance Frobenius norm]

• Variance estimates of RC (i.e. RV) are superior to those of MFFM

(26)

Simulation results

High Liquidity & Medium Noise

[Figure, two panels: variance bias and variance Frobenius norm]

• With noise, RC variance estimates deteriorate; MFFM is also affected, through the estimation of specific risk

(27)

Outline

1. Introduction / Motivation

2. The Mixed-Frequency Factor Model

3. Simulation Study

4. Empirical Study

(28)

Empirical study

• Study the performance of MFFM compared to RC in a large-dimensional setting
  - large-cap S&P 500 universe
  - mid-cap S&P 400 universe
  - small-cap S&P 600 universe

  To our knowledge, this is the first study to calculate covariance matrices of this size with HFD

• We use liquid ETFs as factors
  - 10 industry factors
  - 1 SMB factor (IWM − SPY)
  - 1 HML factor (IVE − IVW)

(29)

Choice of “liquid factors”

ticker  description                              sector / style classification  # trades per day
XLE.A   Energy Sector SPDR Fund                  Energy                         64,110
XLB.A   Materials Sector SPDR Fund               Materials                      22,423
XLI.A   Industrial Sector SPDR Fund              Industrials                    12,235
XLY.A   Consumer Discretionary Sector SPDR Fund  Consumer Disc.                 11,198
XLP.A   Consumer Staples Sector SPDR Fund        Consumer Staples               5,550
XLV.A   Health Care Sector SPDR Fund             Health Care                    6,353
XLF.A   Financial Sector SPDR Fund               Financials                     146,853
XLK.A   Technology Sector SPDR Fund              Information Tech.              9,245
IYZ.N   iShares Telecommunications Sector Fund   Telecommunications             930
XLU.A   Utilities Sector SPDR Fund               Utilities                      11,544
SPY.A   SPDR Trust Series 1                      Large Cap                      300,104
IWM.A   iShares Russell 2000 Index Fund          Small Cap                      163,148
IVE.N   S&P 500 Value Index Fund                 Value                          3,201
IVW.N   S&P 500 Growth Index Fund                Growth                         4,526

Average across ETFs                                                             54,387
Average across S&P 500 constituents                                             8,272

(30)

Empirical study

• compute RC and MFFM over a range of sampling frequencies between 15 seconds and 15 minutes

• betas for MFFM are computed using daily data over a 2.5-year rolling window

• forecast using a simple EWMA scheme with persistence exp(−α), varying α ∈ (0, 1]

• measure performance by the tracking error relative to the market factor over the period January 2007 to May 2008

• motivated by DeMiguel, Garlappi, and Uppal (2007), we also compute the equally weighted "1/N" portfolio
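A minimal sketch of the forecasting step, assuming the EWMA is applied to a sequence of daily covariance estimates C_t (from RC or MFFM) as H_t = e^{−α} H_{t−1} + (1 − e^{−α}) C_t and initialized at the first estimate; the initialization and the tracking-error definition below (sample standard deviation of portfolio minus benchmark returns) are assumptions, not details taken from the study. Function names are hypothetical.

```python
import numpy as np

def ewma_forecasts(daily_covs, alpha):
    """H_t = lam * H_{t-1} + (1 - lam) * C_t with persistence lam = exp(-alpha), H_1 = C_1."""
    lam = np.exp(-alpha)
    covs = [np.asarray(c, dtype=float) for c in daily_covs]
    h = covs[0].copy()
    out = [h.copy()]
    for c in covs[1:]:
        h = lam * h + (1.0 - lam) * c
        out.append(h.copy())
    return out

def tracking_error(portfolio_returns, benchmark_returns):
    """Realized tracking error: sample standard deviation of active returns."""
    active = np.asarray(portfolio_returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    return float(np.std(active, ddof=1))

# Small alpha means high persistence: the forecast moves slowly toward new daily estimates
forecasts = ewma_forecasts([[[1.0]], [[3.0]], [[3.0]]], alpha=0.1)
print([f[0, 0] for f in forecasts])
```

Larger α puts more weight on the most recent daily estimate, which is why noisier daily inputs (for example RC at 15-minute sampling) benefit from smaller α.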

(31)

Tracking Error S&P 500 (large-cap) stocks

        RC                             MFFM
α       15s    1m     5m     15m       15s    1m     5m     15m
0.010   0.085  0.075  0.072  0.073     0.040  0.040  0.040  0.040
0.025   0.078  0.071  0.071  0.072     0.040  0.040  0.040  0.040
0.050   0.070  0.066  0.067  0.073     0.040  0.040  0.040  0.040
0.075   0.065  0.063  0.065  0.074     0.040  0.040  0.040  0.040
0.100   0.062  0.061  0.064  0.076     0.040  0.040  0.040  0.040
0.150   0.059  0.058  0.064  0.082     0.040  0.040  0.040  0.040
0.250   0.055  0.056  0.064  0.099     0.040  0.040  0.040  0.040
0.400   0.053  0.056  0.067  0.123     0.040  0.040  0.040  0.040
0.750   0.053  0.058  0.079  0.164     0.040  0.040  0.040  0.040
1.000   0.054  0.060  0.089  0.189     0.040  0.040  0.040  0.040

Equally weighted 1/N portfolio: 0.050

(32)

Tracking Error S&P 400 (mid-cap) stocks

        RC                             MFFM
α       15s    1m     5m     15m       15s    1m     5m     15m
0.010   0.091  0.086  0.081  0.083     0.051  0.051  0.051  0.052
0.025   0.090  0.086  0.083  0.088     0.051  0.051  0.051  0.051
0.050   0.086  0.082  0.083  0.094     0.051  0.051  0.051  0.051
0.075   0.082  0.080  0.082  0.099     0.051  0.051  0.051  0.051
0.100   0.079  0.078  0.082  0.102     0.051  0.051  0.051  0.051
0.150   0.074  0.077  0.082  0.108     0.051  0.051  0.051  0.051
0.250   0.071  0.077  0.085  0.118     0.051  0.051  0.051  0.051
0.400   0.070  0.079  0.090  0.134     0.051  0.051  0.050  0.051
0.750   0.072  0.085  0.105  0.176     0.051  0.051  0.051  0.051
1.000   0.073  0.089  0.114  0.204     0.051  0.051  0.051  0.051

Equally weighted 1/N portfolio: 0.059

(33)

Tracking Error S&P 600 (small-cap) stocks

        RC                             MFFM
α       15s    1m     5m     15m       15s    1m     5m     15m
0.010   0.096  0.095  0.095  0.102     0.064  0.064  0.063  0.063
0.025   0.095  0.095  0.096  0.107     0.063  0.063  0.063  0.063
0.050   0.091  0.091  0.093  0.112     0.063  0.063  0.062  0.062
0.075   0.086  0.088  0.092  0.116     0.063  0.063  0.062  0.062
0.100   0.083  0.087  0.093  0.121     0.063  0.063  0.062  0.062
0.150   0.081  0.086  0.097  0.136     0.063  0.063  0.062  0.062
0.250   0.082  0.086  0.108  0.165     0.063  0.063  0.062  0.062
0.400   0.084  0.089  0.123  0.205     0.063  0.063  0.062  0.062
0.750   0.091  0.100  0.163  0.278     0.063  0.063  0.062  0.062
1.000   0.096  0.110  0.188  0.321     0.063  0.063  0.062  0.062

Equally weighted 1/N portfolio: 0.065

(34)

Conclusion

• We introduce and study the MFFM and show that it has attractive properties:
  - it allows for the use of HFD in a large-dimensional setting
  - it delivers a positive definite, well-conditioned covariance matrix (due to the factor structure)
  - it can be superior to the pairwise HY or RC estimators

• The empirical study, to our knowledge the first to consider vast-dimensional matrices with HFD, further illustrates the appeal of the MFFM

Thanks for your attention!

We are always looking for interns with a strong statistics and microstructure background