Realized Factor Models
for Vast Dimensional Covariance Estimation
CMAP, École Polytechnique Paris, 23 March 2009
Roel Oomen
(joint with Karim Bannouh, Martin Martens, and Dick van Dijk)
Motivation
- High-frequency data has spurred the development of “realized” measures
  - realized variance, defined as the sum of squared intra-period returns (Andersen and Bollerslev, 1998; Barndorff-Nielsen and Shephard, 2002)
  - a consistent and efficient measure of quadratic variation
- In the univariate setting, there are some complications
  1. microstructure noise: sub-sampling, kernels, two/multi-scale (Zhang, Mykland, and Aït-Sahalia, 2005; Barndorff-Nielsen, Hansen, Lunde, and Shephard, 2008; Zhou, 1996; Jacod, Li, Mykland, Podolskij, and Vetter, 2007; Podolskij and Vetter, 2008)
  2. jumps: bi-power variation, MedRV, threshold RV, quantile RV (Barndorff-Nielsen and Shephard, 2004; Jacod, 2008; Mancini, 2004, 2006; Andersen, Dobrev, and Schaumburg, 2008; Christensen, Oomen, and Podolskij, 2008)
  3. outliers: quantile RV, threshold RV (Christensen et al.; Mancini; Jacod)
Motivation
- In a multivariate setting there are additional considerations
  - non-synchronous trading and its resulting bias, i.e. the Epps effect (Epps, 1979; Lo and MacKinlay, 1990)
  - dimensionality and stability of the covariance matrix (Chan, Karceski, and Lakonishok, 1999; Fan, Fan, and Lv, 2008; Jagannathan and Ma, 2003)
- The contribution this paper makes is as follows:
  1. based on a linear factor model, we propose a simple way to estimate vast dimensional covariance matrices using high- and low-frequency data simultaneously
  2. we illustrate its use on a very large asset universe, rarely studied in the academic literature
  3. theoretical and empirical results highlight the favorable properties of our approach
Covariance estimation
- Let $r_i$ denote the $N \times 1$ vector of period-$i$ returns. The standard covariance estimator is “realized covariance”
  $$\sum_{i=1}^{M} r_i r_i'$$
- In practice there is non-synchronous trading, induced by the random arrival of trades / quotes
- RC requires returns to be sampled on a regular grid
- Sampling induces cross-asset serial correlation and makes RC biased
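To make the regular-grid requirement concrete, here is a minimal Python sketch of realized covariance computed from irregularly timed prices. It assumes previous-tick sampling to put prices on the grid, which is one common choice and not necessarily the one used in the talk; the function names and toy data are hypothetical.

```python
import numpy as np

def previous_tick(times, prices, grid):
    """Sample the last observed price at or before each grid point."""
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = np.clip(idx, 0, len(prices) - 1)  # before the first tick, use the first price
    return prices[idx]

def realized_covariance(log_prices_on_grid):
    """Sum of outer products r_i r_i'; the input is an (M+1) x N array of log prices."""
    r = np.diff(log_prices_on_grid, axis=0)  # M x N matrix of returns
    return r.T @ r

# Toy data (assumed): two assets observed at non-synchronous times.
t1 = np.array([0.0, 1.0, 2.5, 4.0])
p1 = np.log(np.array([100.0, 101.0, 100.5, 102.0]))
t2 = np.array([0.0, 1.5, 3.0, 4.0])
p2 = np.log(np.array([50.0, 50.5, 50.2, 51.0]))

grid = np.linspace(0.0, 4.0, 5)  # regular grid with M = 4 intervals
P = np.column_stack([previous_tick(t1, p1, grid), previous_tick(t2, p2, grid)])
rc = realized_covariance(P)
print(rc)
```

On a finer grid, more of the sampled returns are zero for the asset that has not yet traded, which is one way to see where the downward bias in the off-diagonal elements comes from.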
Non-synchronous trading
[Figure: GE/IBM (all exchanges); horizontal axis: sampling interval, from 1 sec to 4 min]
Covariance estimation
- To counter the Epps effect, we can use the irregular data and compute a Hayashi-Yoshida type estimator,
  $$\sum_{i=1}^{M_1} \; \sum_{j \,:\, (t_{j-1},\, t_j) \cap (t_{i-1},\, t_i) \neq \emptyset} \left(p_{t_i^{(1)}} - p_{t_{i-1}^{(1)}}\right)\left(p_{t_j^{(2)}} - p_{t_{j-1}^{(2)}}\right)$$
  - HY does not guarantee a psd covariance matrix
  - Griffin and Oomen (2006) find that HY is downward biased due to “sluggish” price adjustments
- Alternatively, we can include a lead-lag adjustment
  $$\sum_{i=1}^{M} r_i r_i' + \sum_{i=2}^{M-1} \left(r_{i-1} r_i' + r_{i+1} r_i'\right) + \ldots$$
  Barndorff-Nielsen et al. (2008) study this estimator with kernel weighting
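The overlap condition in the Hayashi-Yoshida sum can be sketched in Python as follows. This is a simple pairwise illustration with an O(M1 x M2) double loop, written for clarity and not for speed; the function name and the toy data are assumptions.

```python
import numpy as np

def hayashi_yoshida(t1, p1, t2, p2):
    """Sum the products of returns whose observation intervals overlap."""
    r1, r2 = np.diff(p1), np.diff(p2)
    total = 0.0
    for i in range(len(r1)):
        for j in range(len(r2)):
            # open intervals (t1[i], t1[i+1]) and (t2[j], t2[j+1]) intersect
            if max(t1[i], t2[j]) < min(t1[i + 1], t2[j + 1]):
                total += r1[i] * r2[j]
    return total

# Toy data (assumed): two assets with non-synchronous observation times.
t1 = np.array([0.0, 1.0, 2.5, 4.0])
p1 = np.log(np.array([100.0, 101.0, 100.5, 102.0]))
t2 = np.array([0.0, 1.5, 3.0, 4.0])
p2 = np.log(np.array([50.0, 50.5, 50.2, 51.0]))

hy = hayashi_yoshida(t1, p1, t2, p2)
print(hy)
```

When both assets share the same observation times, only matching intervals overlap and the estimator reduces to the usual sum of return cross-products.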
Non-synchronous trading & refresh time sampling
- BNHLS (Barndorff-Nielsen, Hansen, Lunde, and Shephard) use a refresh time sampling scheme to force limited dependence due to non-synchronicity and then apply realised kernels (see also Harris, McInish, Shoesmith, and Wood, 1995; Martens, 2003)

[Figure: refresh time sampling illustration; horizontal axis: observation]

Drawbacks of this method:
  1. inefficient use of data, particularly when mixing liquid and illiquid assets
  2. universe-dependent covariance estimates
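The first drawback can be illustrated with a small Python sketch of refresh time sampling. It follows the usual definition (each refresh time is the first moment at which every asset has traded again since the previous refresh time); the function name and the toy timestamps are hypothetical.

```python
import numpy as np

def refresh_times(times_per_asset):
    """Return refresh times for a list of sorted timestamp arrays, one per asset."""
    taus = []
    tau = max(t[0] for t in times_per_asset)  # every asset has traded at least once
    while True:
        taus.append(tau)
        next_ticks = []
        for t in times_per_asset:
            k = np.searchsorted(t, tau, side="right")  # first tick strictly after tau
            if k == len(t):
                return np.array(taus)  # some asset has no further ticks
            next_ticks.append(t[k])
        tau = max(next_ticks)

# Toy data (assumed): one liquid asset with 21 ticks, one illiquid asset with 4.
liquid = np.arange(0.0, 21.0)
illiquid = np.array([0.0, 7.0, 15.0, 20.0])

taus = refresh_times([liquid, illiquid])
print(taus)  # [ 0.  7. 15. 20.]
```

Here the illiquid asset dictates the sampling: 21 observations of the liquid asset collapse to 4 refresh times, and adding or removing an asset changes the grid for all the others.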
Outline
1. Introduction / Motivation
2. The Mixed-Frequency Factor Model
3. Simulation Study
4. Empirical Study
The Mixed-Frequency Factor Model
- Our aim here is to estimate a covariance matrix that
  - uses the information contained in high-frequency data (HFD)
  - avoids Epps effect-like biases due to non-synchronous trading
  - “works” in a very high dimensional setting
- The main idea we develop in this paper is as follows
  1. specify a linear factor structure for asset returns
  2. use “liquid factors” to estimate the factor covariance matrix
  3. use daily returns to estimate the factor loadings
The Linear Model
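- Formally, consider a linear factor structure for the cross-section of asset returns:
  $$\underset{(N \times 1)}{r} = \underset{(N \times K)}{\beta} \; \underset{(K \times 1)}{f} + \underset{(N \times 1)}{\varepsilon}$$
  - $\mathrm{cov}(\varepsilon)$ assumed to be diagonal
  - typically $K \ll N$
  - factors may include sector, industry, style, country factors
- The covariance is then equal to:
  $$\mathrm{cov}(r) = \beta \Lambda \beta' + \Sigma$$
  where $\Lambda = E(ff')$ and $\Sigma = E(\varepsilon\varepsilon')$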
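As a small illustration of why the factor structure helps in high dimensions, the following Python sketch builds a covariance matrix of this form and compares parameter counts. All dimensions and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3  # hypothetical dimensions, with K << N

beta = rng.normal(1.0, 0.3, size=(N, K))      # factor loadings
A = rng.normal(size=(K, K))
Lam = A @ A.T + np.eye(K)                      # factor covariance, positive definite
Sigma = np.diag(rng.uniform(0.5, 1.5, N))      # diagonal specific variances

cov_r = beta @ Lam @ beta.T + Sigma

# Free parameters: unrestricted covariance versus factor structure.
n_free_full = N * (N + 1) // 2
n_free_factor = N * K + K * (K + 1) // 2 + N
print(n_free_full, n_free_factor)  # 20100 806

# beta Lam beta' is psd and Sigma has a strictly positive diagonal,
# so the sum is positive definite.
print(np.linalg.eigvalsh(cov_r).min() > 0)
```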
Theorem
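Let $\hat\gamma_{ij} = \hat\beta_i' \hat\Lambda \hat\beta_j$ denote the $(i,j)$ entry of the estimated covariance matrix from the linear factor model, with population counterpart $\gamma_{ij} = \beta_i' \Lambda \beta_j$. Assuming (i) $E(\sigma_{ij}) = 0$, (ii) $E(\beta_\varepsilon) = 0$, (iii) $E(\Lambda_\varepsilon) = 0$, and (iv) $\beta_\varepsilon \perp \Lambda_\varepsilon$ element-by-element, then
$$E(\hat\gamma_{ij}) = \gamma_{ij}$$
and
$$V(\hat\gamma_{ij}) = \beta_i' \Lambda \Sigma_{\beta,j} \Lambda' \beta_i + \beta_j' \Lambda' \Sigma_{\beta,i} \Lambda \beta_j + \mathrm{tr}(\Sigma_{\beta,i} \Lambda \Sigma_{\beta,j} \Lambda') + g(\beta_i\beta_i', \beta_j\beta_j', \Phi) + g(\beta_i\beta_i', \Sigma_{\beta,j}, \Phi) + g(\beta_j\beta_j', \Sigma_{\beta,i}, \Phi) + g(\Sigma_{\beta,i}, \Sigma_{\beta,j}, \Phi)$$
where $\Sigma_{\beta,i} = V(\hat\beta_i)$ and $\Phi = E(\mathrm{vech}(\Lambda_\varepsilon)\,\mathrm{vech}(\Lambda_\varepsilon)')$ and
$$g(A, B, \Phi) = \sum_{m,n,p,q}^{N} A_{mp} B_{nq} \Phi_{f(p,n),\, f(q,m)}$$
and $f(p,q) = N(\min\{p,q\} - 1) + \tfrac{1}{2}\left(\min\{p,q\} - \min\{p,q\}^2\right) + \max\{p,q\}$.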
Mixed-Frequency Factor Model
- We now specialize this result to a specific setting of interest
  - T “low frequency” observations available on both factors and individual assets
  - M “high frequency” observations available only on factors
  - factor model is correctly specified and parameters are constant in time
  - factor covariance is estimated using “realized covariance”
  - factor loadings are estimated using OLS
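The setting above can be sketched in Python as follows: OLS loadings from T daily observations, realized covariance of the factors from M intraday observations, and a diagonal specific-risk matrix from the daily regression residuals. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name, the no-intercept regression, the residual-variance scaling, and all simulation parameters are hypothetical.

```python
import numpy as np

def mffm_covariance(asset_daily, factor_daily, factor_intraday):
    """
    asset_daily:     T x N daily asset returns
    factor_daily:    T x K daily factor returns
    factor_intraday: M x K intraday factor returns for the day of interest
    """
    T, K = factor_daily.shape
    # OLS loadings without intercept, as in r = beta f + eps; coef is K x N
    coef, *_ = np.linalg.lstsq(factor_daily, asset_daily, rcond=None)
    beta_hat = coef.T
    # Realized covariance of the factors from high-frequency returns (K x K)
    lam_hat = factor_intraday.T @ factor_intraday
    # Diagonal specific risk from the daily regression residuals
    resid = asset_daily - factor_daily @ coef
    sigma_hat = np.diag((resid ** 2).sum(axis=0) / (T - K))
    return beta_hat @ lam_hat @ beta_hat.T + sigma_hat

# Toy simulation with assumed parameter values.
rng = np.random.default_rng(1)
T, M, N, K = 250, 78, 50, 3
Lam = 1e-4 * (0.5 * np.eye(K) + 0.5)               # daily factor covariance
beta = rng.normal(1.0, 0.3, size=(N, K))
spec_var = rng.uniform(0.5e-4, 1.5e-4, size=N)

chol = np.linalg.cholesky(Lam)
factor_daily = rng.normal(size=(T, K)) @ chol.T
asset_daily = factor_daily @ beta.T + rng.normal(size=(T, N)) * np.sqrt(spec_var)
factor_intraday = rng.normal(size=(M, K)) @ chol.T / np.sqrt(M)  # sums to one day

est = mffm_covariance(asset_daily, factor_daily, factor_intraday)
true = beta @ Lam @ beta.T + np.diag(spec_var)
rel_err = np.linalg.norm(est - true) / np.linalg.norm(true)
print(rel_err)
```

Note that no intraday data on the individual assets enters the estimate, so non-synchronous trading in the assets cannot bias the off-diagonal elements in this sketch.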
Mixed-Frequency Factor Model
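- With the above assumptions, we have unbiased covariance estimates with
  $$V(\hat\gamma_{ij}) = \frac{A}{T} + \frac{B + C}{M},$$
  where
  $$A = \sigma_j^2 \beta_i' \Lambda \beta_i + \sigma_i^2 \beta_j' \Lambda \beta_j + \sigma_i^2 \sigma_j^2 \frac{K}{T},$$
  $$B = \sum_{m,n,p,q}^{K} \beta_i(m)\beta_i(p)\beta_j(n)\beta_j(q) \left(\Lambda_{pq}\Lambda_{nm} + \Lambda_{pm}\Lambda_{nq}\right),$$
  $$C = \sum_{m,n,p,q}^{K} \left(\beta_i(m)\beta_i(p)\Sigma_{\beta,j}(n,q) + \beta_j(m)\beta_j(p)\Sigma_{\beta,i}(n,q) + \Sigma_{\beta,i}(m,p)\Sigma_{\beta,j}(n,q)\right) \left(\Lambda_{pq}\Lambda_{nm} + \Lambda_{pm}\Lambda_{nq}\right)$$
- We can now distinguish among three different scenarios:
  1. factor loadings are known (“T → ∞”)
  2. factor covariance matrix is known (“M → ∞”)
  3. both are estimated ⇒ the mixed-frequency factor model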
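As a worked step, the term B can be written in matrix form, and the first two scenarios read off as limits. The simplification uses only the symmetry of Λ; the statement that the loading variances vanish as T grows is an assumption here (it holds for consistent OLS estimates), and it is what makes C disappear in the first scenario.

```latex
\begin{aligned}
B &= \sum_{m,n,p,q}^{K} \beta_i(m)\beta_i(p)\beta_j(n)\beta_j(q)
     \left(\Lambda_{pq}\Lambda_{nm} + \Lambda_{pm}\Lambda_{nq}\right) \\
  &= \left(\beta_i'\Lambda\beta_j\right)^2
   + \left(\beta_i'\Lambda\beta_i\right)\left(\beta_j'\Lambda\beta_j\right), \\[4pt]
T \to \infty &: \quad \Sigma_{\beta,i},\ \Sigma_{\beta,j} \to 0
   \;\Rightarrow\; C \to 0, \quad V(\hat\gamma_{ij}) \to \frac{B}{M}, \\
M \to \infty &: \quad V(\hat\gamma_{ij}) \to \frac{A}{T}.
\end{aligned}
```

With known loadings, the remaining variance B/M has the familiar form of the variance of a realized covariance computed from M observations.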
Numerical illustration : MFFM vs HY
- Approach 1: the mixed-frequency factor model
  - 5 factors (K = 5)
  - specific risk accounts for about 50%
  - correlation between assets about 40%
  - estimate factor loadings using T = {250; 500} days of data
  - estimate factor covariance using M = {78; 156} intra-day observations
- Approach 2: the pairwise Hayashi and Yoshida (2005) estimator
  - using N intra-day non-synchronous observations for both assets
- Compare in terms of MSE
Numerical illustration : MFFM vs HY
[Figure: ln MSE against the number of intra-day asset return observations N (200 to 1400), for MFFM (T = 250, M = 78), MFFM (T = 500, M = 78), MFFM (T = 250, M = 156), HY (no noise), HY (intermediate noise), and HY (high noise)]
Simulation setup
- study performance of MFFM in a realistic high dimensional setting with non-synchronous trading and microstructure noise
  1. liquid factors $f$
     - using (estimated) covariance $\Lambda$ of three Fama-French factors
     - Poisson sampling with arrival rate such that $E(M) = 25{,}000$
  2. factor loadings $\beta$
     - use point estimates for S&P 500 universe
     - add measurement error to obtain $\hat\beta$
  3. stock-specific innovations $\varepsilon$
     - use point estimates of “beta-regression” residual variances $\mathrm{diag}(\Sigma)$
Simulation setup
- On a fine grid, we then simulate returns
  $$r = \beta f + \varepsilon + \text{“MA(1) noise”}$$
  and use Poisson sampling with rates calibrated to S&P500
- MFFM is computed as $\hat\beta \hat\Lambda \hat\beta' + \hat\Sigma$, where
  $$\hat\Lambda = ff' \quad \text{and} \quad \hat\Sigma = (r - \beta f)(r - \beta f)'$$
- To set the level of noise, we calibrate the “noise ratio” of Oomen (2006)
  $$\gamma^2 = \frac{\omega^2}{IV/M}$$
Simulation setup
(Table from Christensen, Oomen, Podolskij, 2009)

                            # of observations M         noise ratio γ
universe                    Q5       Q50      Q95       Q5     Q50    Q95
Panel A: US
S&P600                      157      751      2,417     0.10   0.34   0.73
S&P400                      604      1,749    4,710     0.12   0.36   0.76
S&P500                      1,477    4,174    12,355    0.14   0.37   0.93
S&P100                      2,945    7,338    20,707    0.17   0.40   1.06
DJ30                        4,701    9,562    23,686    0.22   0.45   0.97
Panel B: Europe
DJ Stoxx Small 200          158      772      2,225     0.25   0.59   1.15
DJ Stoxx Mid 200            352      1,419    3,689     0.30   0.63   1.16
DJ Stoxx Large 200          999      3,634    11,169    0.34   0.66   1.28
DJ Stoxx50                  3,161    6,975    15,860    0.40   0.71   1.40
Panel C: Asia-pacific
S&P ASX200                  199      744      2,957     0.30   0.74   1.75
S&P Topix 150               370      1,070    2,639     0.34   1.03   3.59
Hang Seng                   465      1,260    4,090     0.39   0.88   2.26
Panel D: Emerging markets (BRIC)
Ibovespa (Brazil)           261      1,130    5,617     0.32   0.66   1.21
DJ Titans 10 (Russia)       543      6,066    22,230    0.58   1.03   1.29
DJ BRIC 50 (India)          726      2,098    4,987     0.16   0.49   0.90
DJ BRIC 50 (China)          1,177    2,328    5,197     0.60   1.17   2.96
Simulation setup
1. compute RC, MFFM, and compare to the true covariance matrix $\beta\Lambda\beta' + \Sigma$
2. Frobenius norm, for diagonal and off-diagonal elements separately
   $$\left\| VCV - \widehat{VCV} \right\|^2$$
3. Bias measure, for diagonal and off-diagonal elements separately
   $$\frac{\iota' \widehat{VCV} \iota}{\iota' VCV \iota}$$
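The two evaluation measures can be sketched in Python as below. Two points are assumptions on my part: the Frobenius distance is taken in squared form, and "separately" is implemented by zeroing out the off-diagonal (or diagonal) elements of both matrices before applying each measure. The function names are hypothetical.

```python
import numpy as np

def split_diag_offdiag(mat):
    """Split a square matrix into its diagonal part and its off-diagonal part."""
    diag = np.diag(np.diag(mat))
    return diag, mat - diag

def frobenius_error(true_cov, est_cov):
    """Squared Frobenius distance for the diagonal and off-diagonal parts."""
    d_true, o_true = split_diag_offdiag(true_cov)
    d_est, o_est = split_diag_offdiag(est_cov)
    return (np.linalg.norm(d_true - d_est, "fro") ** 2,
            np.linalg.norm(o_true - o_est, "fro") ** 2)

def bias_ratio(true_cov, est_cov):
    """iota' est iota / iota' true iota for the diagonal and off-diagonal parts."""
    d_true, o_true = split_diag_offdiag(true_cov)
    d_est, o_est = split_diag_offdiag(est_cov)
    return d_est.sum() / d_true.sum(), o_est.sum() / o_true.sum()

# Toy example (assumed): variances overstated by 10%, covariance halved.
true_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
est_cov = np.array([[1.1, 0.25], [0.25, 1.1]])

frob_diag, frob_off = frobenius_error(true_cov, est_cov)
bias_diag, bias_off = bias_ratio(true_cov, est_cov)
print(frob_diag, frob_off, bias_diag, bias_off)
```

A bias ratio below one for the off-diagonal part, as in this toy example, is what an Epps-type downward bias in the covariances looks like under this measure.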
Simulation results
High Liquidity & No Noise
[Figure: two panels, “Co-variance Bias” and “Co-variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- Strong Epps effect for RC; MFFM largely unbiased across frequencies
- MFFM superior and insensitive to choice of sampling frequency
Simulation results
Medium Liquidity & No Noise
[Figure: two panels, “Co-variance Bias” and “Co-variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- With less “liquid” assets, RC performance deteriorates quickly
- MFFM unaffected (only through diagonal elements)
Simulation results
Low Liquidity & No Noise
[Figure: two panels, “Co-variance Bias” and “Co-variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- Epps effect in RC strongest for illiquid assets
Simulation results
High Liquidity & Medium Noise
[Figure: two panels, “Co-variance Bias” and “Co-variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- Noise does not have much impact as it is (assumed) cross-sectionally independent
Simulation results
High Liquidity & No Noise
[Figure: two panels, “Variance Bias” and “Variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- Variance estimates of RC (i.e. RV) are superior to MFFM
Simulation results
High Liquidity & Medium Noise
[Figure: two panels, “Variance Bias” and “Variance Frobenius Norm”, each plotted against sampling frequency (M) from 1 sec to 30 min for RC, MFFM (T = 10 years), and MFFM (T = 1 year)]
- With noise, RC variance estimates deteriorate; MFFM is also affected, through estimation of specific risk
Empirical study
- Study performance of MFFM compared to RC in a large dimensional setting
  - large cap S&P 500 universe
  - mid cap S&P 400 universe
  - small cap S&P 600 universe

  To our knowledge, this is the first study to calculate covariance matrices with HFD of this size
- We use liquid ETFs as factors
  - 10 industry factors
  - 1 SMB factor (IWM - SPY)
  - 1 HML factor (IVE - IVW)
Choice of “liquid factors”
ticker   description                               sector / style classification   # trades per day
XLE.A    Energy Sector SPDR Fund                   Energy                                    64,110
XLB.A    Materials Sector SPDR Fund                Materials                                 22,423
XLI.A    Industrial Sector SPDR Fund               Industrials                               12,235
XLY.A    Consumer Discretionary Sector SPDR Fund   Consumer Disc.                            11,198
XLP.A    Consumer Staples Sector SPDR Fund         Consumer Staples                           5,550
XLV.A    Health Care Sector SPDR Fund              Health Care                                6,353
XLF.A    Financial Sector SPDR Fund                Financials                               146,853
XLK.A    Technology Sector SPDR Fund               Information Tech.                          9,245
IYZ.N    iShares Telecommunications Sector Fund    Telecommunications                           930
XLU.A    Utilities Sector SPDR Fund                Utilities                                 11,544
SPY.A    SPDR Trust Series 1                       Large Cap                                300,104
IWM.A    iShares Russell 2000 Index Fund           Small Cap                                163,148
IVE.N    S&P 500 Value Index Fund                  Value                                      3,201
IVW.N    S&P 500 Growth Index Fund                 Growth                                     4,526

Average across ETFs                                                                          54,387
Average across S&P500 constituents                                                            8,272
Average across S&P400 constituents                                                            2,912
Average across S&P600 constituents                                                            1,411
Empirical study
- compute RC and MFFM over a range of sampling frequencies between 15 seconds and 15 minutes
- betas for MFFM are computed using daily data over a 2.5-year rolling window
- forecast using a simple EWMA scheme with persistence exp(−α), varying α ∈ (0, 1)
- measure performance by tracking error relative to the market factor over the period Jan 07 - May 08
- Motivated by DeMiguel, Garlappi, and Uppal (2007), we also compute the equally weighted “1/N” portfolio
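The EWMA forecasting step can be sketched in Python as follows. The slides give only the persistence exp(−α); the recursion below, with weight 1 − exp(−α) on the newest daily estimate and initialization at the first estimate, is an assumed standard form, and the function name is hypothetical.

```python
import numpy as np

def ewma_forecast(daily_covs, alpha):
    """One-step-ahead forecast from a sequence of daily covariance estimates."""
    lam = np.exp(-alpha)  # persistence
    forecast = daily_covs[0]
    for cov in daily_covs[1:]:
        forecast = lam * forecast + (1.0 - lam) * cov
    return forecast

# Toy example (assumed): two daily estimates and alpha = ln 2, so persistence is 0.5.
c0 = np.array([[1.0, 0.2], [0.2, 1.0]])
c1 = np.array([[2.0, 0.6], [0.6, 3.0]])
fc = ewma_forecast([c0, c1], alpha=np.log(2.0))
print(fc)
```

Small α gives a slowly moving, heavily smoothed forecast; α near 1 puts most of the weight on the latest daily estimate. Because each forecast is a convex combination of the daily estimates, it stays psd whenever they are.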
Tracking Error S&P 500 (large-cap) stocks
         RC                               MFFM
α        15s     1m      5m      15m      15s     1m      5m      15m
0.010    0.085   0.075   0.072   0.073    0.040   0.040   0.040   0.040
0.025    0.078   0.071   0.071   0.072    0.040   0.040   0.040   0.040
0.050    0.070   0.066   0.067   0.073    0.040   0.040   0.040   0.040
0.075    0.065   0.063   0.065   0.074    0.040   0.040   0.040   0.040
0.100    0.062   0.061   0.064   0.076    0.040   0.040   0.040   0.040
0.150    0.059   0.058   0.064   0.082    0.040   0.040   0.040   0.040
0.250    0.055   0.056   0.064   0.099    0.040   0.040   0.040   0.040
0.400    0.053   0.056   0.067   0.123    0.040   0.040   0.040   0.040
0.750    0.053   0.058   0.079   0.164    0.040   0.040   0.040   0.040
1.000    0.054   0.060   0.089   0.189    0.040   0.040   0.040   0.040

Equally weighted 1/N portfolio: 0.050
Tracking Error S&P 400 (mid-cap) stocks
         RC                               MFFM
α        15s     1m      5m      15m      15s     1m      5m      15m
0.010    0.091   0.086   0.081   0.083    0.051   0.051   0.051   0.052
0.025    0.090   0.086   0.083   0.088    0.051   0.051   0.051   0.051
0.050    0.086   0.082   0.083   0.094    0.051   0.051   0.051   0.051
0.075    0.082   0.080   0.082   0.099    0.051   0.051   0.051   0.051
0.100    0.079   0.078   0.082   0.102    0.051   0.051   0.051   0.051
0.150    0.074   0.077   0.082   0.108    0.051   0.051   0.051   0.051
0.250    0.071   0.077   0.085   0.118    0.051   0.051   0.051   0.051
0.400    0.070   0.079   0.090   0.134    0.051   0.051   0.050   0.051
0.750    0.072   0.085   0.105   0.176    0.051   0.051   0.051   0.051
1.000    0.073   0.089   0.114   0.204    0.051   0.051   0.051   0.051

Equally weighted 1/N portfolio: 0.059
Tracking Error S&P 600 (small-cap) stocks
         RC                               MFFM
α        15s     1m      5m      15m      15s     1m      5m      15m
0.010    0.096   0.095   0.095   0.102    0.064   0.064   0.063   0.063
0.025    0.095   0.095   0.096   0.107    0.063   0.063   0.063   0.063
0.050    0.091   0.091   0.093   0.112    0.063   0.063   0.062   0.062
0.075    0.086   0.088   0.092   0.116    0.063   0.063   0.062   0.062
0.100    0.083   0.087   0.093   0.121    0.063   0.063   0.062   0.062
0.150    0.081   0.086   0.097   0.136    0.063   0.063   0.062   0.062
0.250    0.082   0.086   0.108   0.165    0.063   0.063   0.062   0.062
0.400    0.084   0.089   0.123   0.205    0.063   0.063   0.062   0.062
0.750    0.091   0.100   0.163   0.278    0.063   0.063   0.062   0.062
1.000    0.096   0.110   0.188   0.321    0.063   0.063   0.062   0.062

Equally weighted 1/N portfolio: 0.065
Conclusion
- We introduce and study the MFFM and show it has favorable properties
  - allows for use of HFD in a large dimensional setting
  - delivers a positive definite, well-conditioned covariance matrix (due to the factor structure)
  - can be superior to the pairwise HY or RC estimator
- The empirical study, to our knowledge the first to consider vast dimensional matrices with HFD, further illustrates the appeal of the MFFM
Thanks for your attention!
We are always looking for interns with strong statistics &
micro-structure background