in Asset Pricing
Carlo Sala
Submitted for the degree of Ph.D. in Economics Swiss Finance Institute
Università della Svizzera Italiana (USI), Switzerland
Advisor:
Prof. Barone Adesi Giovanni.
Swiss Finance Institute at USI
is that of the author alone; the work has not been submitted previously, in whole or in part, to qualify for any other academic award; and the content of the thesis is the result of work which has been carried out since the official commencement date of the approved research program.
Carlo Sala Lugano, June 1, 2016
These graduate years have been very busy, exciting and rewarding. I would like to thank first and foremost my advisor, Prof. Barone Adesi, for supporting and helping me since 2009. Thank you for all your attention and assistance; I have thoroughly enjoyed and benefited from being a student of yours. Very special thanks also go to Prof. Mira for our inspiring and insightful initial discussions on how to estimate the pricing kernel from a Bayesian perspective and for her support during the past four years, and to Prof. Mancini for helping me with the MATLAB code (this really helped me a lot!) and for his suggestions during the PhD and the job market. I also thank the members of my PhD committee, Professors Mele and Platen, for their helpful career advice and for their great suggestions about my thesis. Last but not least, I want to thank all the professors and colleagues of the Institute of Finance. I’ve learned from you every day.
“Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”
Table of Contents
List of Tables
List of Figures
1 Extended Abstract
2 The Measure’s Problem, the Pricing Kernel and its Puzzles
2.1 The physical and the risk-neutral measures in finance
2.1.1 The pricing kernel
2.2 Introduction to the pricing kernel puzzle
2.2.1 More on the measures’ problem
2.3 Joining the measures
3 Theoretical Set Up And The Literature So Far
3.1 The empirical pricing kernel (EPK)
3.1.1 PK as the conditional Radon-Nikodym derivative
3.1.2 The Riesz representation theorem
3.1.3 Macroeconomic derivation
3.1.4 No-arbitrage derivation
3.2 The investor’s risk aversion
3.3 The literature so far
4 Conditioning for the information in asset pricing
4.1 Information and market efficiency
4.2 The importance of the Information in Asset Pricing
4.2.1 Econometric problems behind the estimation of the real-world measure
4.2.2 Defining the financial pricing kernel as a Radon-Nikodym derivative
4.3 The behaviours of local and strict local martingales in finance
4.3.1 Projecting the Radon-Nikodym derivative onto a smaller filtration set
4.3.2 Filtration shrinkage and information inaccessibility for strict local martingales
4.4 Empirical Application
4.4.1 Extending the findings to the risk-neutral measure
4.5 Fully and partially dynamic models, a comparison
4.5.1 A conditional version of the basic consumption based model
4.6 From consumption to portfolio optimization problems, the conditional case
4.6.1 Presenting the problem
4.6.2 The Lévy-Itô model and the portfolio optimization problem
4.6.3 The theoretical optimal choice
4.6.4 The real-world optimal choice
4.6.5 The power utility case
4.6.6 Connection with the PK
4.6.7 Expressing the information premium as the Kullback-Leibler divergence
4.7 Future ideas
5 The Methodology - part I
5.1 GJR GARCH-FHS and Monte Carlo simulation
5.1.1 The estimation of the physical parameters
5.1.2 The estimation of the risk-neutral parameters
5.2 The main limitation: the incompleteness of the physical measure
6 The Bayesian approach
6.1 Why a Bayesian non-parametric proposal?
6.2 The Dirichlet Process
6.2.1 Posterior distribution
6.2.2 The DP for the physical estimation
6.3 Possible extensions
7 Making the Physical Measure Forward Looking
7.1 The Methodology - part II
7.1.1 The equity risk premium adjustments
7.2 A frequentist justification of the physical measure correction
8 The Datasets
8.1 The S&P 500 Index Options - The SPX
8.2 The S&P 500 Index
8.3 The risk-free rate and the dividend-yield
9 Empirical Results
9.1 The classical and the revised EPK, a comparison
9.1.1 Daily estimates
9.1.2 Linking the precision parameter to the options market
9.1.3 Option moments and the conditional physical measure
9.1.4 Focus on a single time t and time-to-maturity τ
9.2 Robustness checks
9.2.1 GJR GARCH robustness check
A Appendix
A.1 Absolute Continuity and the Radon-Nikodym Derivative
A.2 Approximating to the mean and the variance of a ratio using Taylor
A.3 The Riesz Representation Theorem
A.4 Properties of Lévy processes
A.5 Constraints for stationarity
A.6 Consistency of DP marginals
A.7 Moments of DP
A.8 DP posterior moments
A.9 Derivation of the expected rate of return of investing in contingent claims
Bibliography
List of Tables
A.1 Summary statistics of the S&P 500 options index (SPX)
A.2 Summary statistics of the 1988-2002, 2002-2004 S&P 500 Index prices and returns
A.3 EPK horizontal support: summary statistics so far
A.4 Summary statistics of the precision parameter
A.5 GJR GARCH physical parameters - θ
A.6 GJR GARCH risk-neutral parameters - θ̃ - Simplex method
A.7 GJR GARCH risk-neutral parameters - θ̃ - Quasi-Newton
A.8 Summary statistics of p†_{t,T}, p_{t−∆t,T} and q_{t,T}
A.9 Yearly pointwise difference between q_{t,T} − p†_{t,T} and q_{t,T} − p_{t−∆t,T} with FHS
A.10 GJR GARCH risk-neutral parameters - Robustness Check
List of Figures
A.1 Measurable partitions of Ω
A.3 Intuition behind the model
A.4 1988-2002, 2002-2004 time series of S&P 500 daily closing prices and volumes
A.5 2002-2004 weekly S&P 500 Index closing prices (Wednesday only) and 1988-2002 daily log-returns
A.6 1988-2002 daily S&P 500 squared log returns and FHS
A.7 2002-2004 daily S&P 500 log and squared log returns
A.8 1988-2002, 2002-2004 Autocorrelation (ACF) of S&P 500 log and squared log returns
A.9 1988-2002, 2002-2004 S&P 500 log returns empirical and normal histograms
A.10 1988-2002 and 2002-2004 Normal and QQ plots of S&P 500 log returns
A.11 Single day (t = 61) all range of times-to-maturity probability density functions and Conditional EPKs with decreasing α*_{t,T} and R.P. = 4%
A.12 Single day (t = 3) all range of times-to-maturity PDFs, Conditional EPKs and partially-conditional EPKs with decreasing α*_{t,T} and R.P. = 8%
A.13 Time series of DOTM (OTM) put/call, total amount and relative α*_{t,T}
A.14 Daily ∆t moments between fully and partially-conditional physical measure with respect to the risk-neutral measure for short time-to-maturity (τ < 60)
A.15 Single day (t = 4), short and medium times-to-maturity (τ = 24/57/82) left tails with decreasing α*_{t,T} and R.P. = 4%
A.16 Single day (t = 4), medium and long times-to-maturity (τ = 150/241/332) left tails with decreasing α*_{t,T} and R.P. = 4% and 8%
A.17 Single day (t = 4), short and medium times-to-maturity (τ = 24/57/82) right tails with decreasing α*_{t,T} and R.P. = 4%
A.18 Single day (t = 4), medium and long times-to-maturity (τ = 150/241/332) right tails with decreasing α*_{t,T} and R.P. = 4% and 8%
A.19 Daily log ∆t moments
A.20 2002-2003-2004 yearly estimates of the FHS fully and partially-conditional physical measure and risk-neutral measure with α*_{t,T} = 2 and R.P. = 4%
A.21 Single day (t = 90) two times-to-maturity (τ = 31 and τ = 94) conditional and partially-conditional EPKs, with α*_{t,T} = 1.75 and 0.75 and R.P. = 4%
A.22 Single day (t = 90) all range of times-to-maturity fully and partially-conditional expected return with α*_{t,t+31} = 1.75 and decreasing and R.P. = 4%
A.23 Single day (t = 63) all range of time-to-maturity fully and partially-conditional expected return with α*_{t,t+31} = 2.5 and decreasing and R.P. = 8%
A.24 Single day (t = 41) all range of time-to-maturity PDFs (left column), Conditional EPKs (M†_{t,t+τ}) and partially-conditional EPKs (M_{t−∆t,t+τ}) (right column) with α*_{t,T} = 2.5 and decreasing, 50.000 simulations and R.P. = 8%
A.25 Single day (t = 41) with focus on the single time-to-maturity (τs = 38/73/164/255/346) Conditional EPKs (M†_{t,t+τ}) and partially-conditional EPKs (M_{t−∆t,t+τ}) with α*_{t,T} = 2.5 and decreasing, 50.000 simulations and R.P. = 8%
A.26 Single days (t = 41, 90) single time-to-maturity conditional, M_{t,t+τ}, and unconditional, M̂_{t−∆,t+τ}, EPKs with α*_{t,t+τ} = 1.5
A.27 2002-2003-2004 yearly estimates of the FHS (green) and Gauss. (green) Conditional EPKs (M†_{t,t+τ}) and FHS (black) and Gauss. (blue) partially-conditional EPK (M_{t−∆t,t+τ}) with α*_{t,T} = 1 and R.P. = 4%
A.28 Ljung-Box test for 2000/3500/5000/9818 observations - normal returns
A.29 Ljung-Box test for 2000/3500/5000/9818 observations - squared returns
A.30 Ljung-Box test for 2000/3500/5000/9818 observations - absolute returns
A.31 Ten legs Lagrange Multiplier ARCH test for 2000/3500/5000/9818 observations
A.32 GJR GARCH parameters (θ) robustness check
CHAPTER 1
Extended Abstract
Estimating the market’s subjective distribution of future returns by means of backward-looking historical data leads to uninformative and, at best, partially-conditional measures. What is missing are the investors’ forward-looking beliefs. This long-lasting problem affects a large body of literature and leads to puzzles and suboptimal results.
The goal of this thesis is to propose a new flexible and model-free methodology for the estimation of the conditional physical measure, which is then used to investigate different empirical and theoretical applications in financial economics.
Contrary to what the general theory requires, scarcity of information, not completeness, is the norm for a financial investor in the real world. Recalling that the units of measure of an investor who uses real data are the P-measure and the relative real-world probability, this thesis builds a bridge between what is theoretically required by the fundamental theorems of asset pricing and what is effectively achievable by an econometrician who uses real data. If the latter is not aligned with the theory, it may lead to non-existent puzzles and inefficiencies. This may be particularly true from the viewpoint of the types of information used in estimation, a requirement too often violated in practice. Furthermore, as a second step, this thesis analyzes how the impact of this discrepancy may spread to the risk-neutral valuation theory.
Given the above research questions, it emerges that the main limitation of the approaches in the literature lies in the assumption that all the information about the real-world probabilities of future returns can be fully extrapolated from their historical record. This assumption may lead to large errors, which can be reduced if significant information on the physical distribution of security returns is available from other sources. Arguing that the majority of the econometric models in the literature are not able to exploit fully the abundance of information present in the market, and knowing that all the investor’s relevant and necessary information can be found in the financial markets, I solve this informational issue in a very natural way: letting the data from different sources of information speak as much as possible. To capture as much as possible of the information embedded in different asset prices, I flexibly bridge in a new way two strands of the neoclassical literature: the one related to the risk-neutral distribution, extracted from the cross-section of option data, and the one related to the physical distribution, extracted from the time series of stock returns. The innovation of the model is that the two measures, once estimated independently, are then blended into a new one to obtain a fully homogeneous ratio of the two. Homogeneity refers to the degree of conditionality of the information of the two measures, a characteristic often violated in the literature. A natural approach to exploit multiple data sources simultaneously and flexibly, and to provide statistical inference, is the Dirichlet process (Ferguson (1973)[69]). Using the precision parameter of the Dirichlet process as a proxy for the missing information, I calibrate it with respect to the daily liquidity of the option market. The revised measure, exploiting its high flexibility and being free of any constraints - but the ones required by the Fundamental Theorems of Asset Pricing (henceforth: FTAP) - is the best candidate to test and answer puzzles concerning the consistency of the distribution implicit in option prices with the time series properties of the underlying asset prices (Bates (1996)[19]).
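To make the role of the precision parameter concrete, the following minimal sketch computes the posterior mean of a Dirichlet process, which is a weighted average of a base distribution and the empirical distribution of the observed data. The weighting formula is the standard Ferguson (1973) result; everything else is an assumption made for illustration: the normal base CDF is only a stand-in, the sample is hypothetical, and which of the two estimated measures plays the role of base measure and which supplies the observations is not specified here.

```python
from bisect import bisect_right
from math import erf, sqrt


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of a normal distribution, used here as a stand-in base measure G0."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def dp_posterior_mean_cdf(x, observations, alpha, base_cdf=normal_cdf):
    """Posterior mean of G(x) under a DP(alpha, G0) prior after n observations.

    E[G(x) | data] = alpha / (alpha + n) * G0(x) + n / (alpha + n) * F_n(x),
    where F_n is the empirical CDF of the observations.
    """
    obs = sorted(observations)
    n = len(obs)
    empirical = bisect_right(obs, x) / n  # share of observations <= x
    weight_base = alpha / (alpha + n)     # grows with the precision parameter
    return weight_base * base_cdf(x) + (1.0 - weight_base) * empirical


# Hypothetical sample: four returns, all below zero.
sample = [-0.04, -0.03, -0.02, -0.01]
for alpha in (0.0, 4.0, 400.0):
    print(alpha, dp_posterior_mean_cdf(0.0, sample, alpha))
```

A small precision parameter leaves the result close to the empirical distribution, while a large one pulls it toward the base measure, which is why the parameter can act as a dial for how much weight the second source of information receives.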
The most convenient way to test for the consistency of the proposed measure is to use it for the estimation of the pricing kernel (henceforth: PK). Defined as a ratio of state prices per unit probabilities, the neoclassical theory requires the PK to be a monotonically decreasing function. This follows naturally from the definition of the economic marginal rate of substitution. The PK is in fact the financial counterpart of the marginal rate of substitution, where different financial assets proxy consumption. Contrary to the theory, recent empirical studies found several violations on different parts of the PK: at the extremes as well as in the central part of the functional. Since Jackwerth (2000)[93] and Rosenberg and Engle (2002)[64], this is known as the “pricing kernel puzzle”1.
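As a minimal sketch of the definition above, the PK can be evaluated as the discounted ratio of a risk-neutral density to a physical density and then checked for monotonicity. The lognormal densities, the parameter values and the function names are assumptions chosen for illustration only; they are not the estimators used in this thesis.

```python
from math import exp, log, pi, sqrt


def lognormal_pdf(s, s0, drift, vol, tau):
    """Density of S_T when log(S_T / s0) ~ N((drift - vol^2 / 2) tau, vol^2 tau)."""
    mean = log(s0) + (drift - 0.5 * vol ** 2) * tau
    std = vol * sqrt(tau)
    z = (log(s) - mean) / std
    return exp(-0.5 * z * z) / (s * std * sqrt(2.0 * pi))


def pricing_kernel(s, s0, r, mu, vol, tau):
    """Discounted ratio of the risk-neutral density q to the physical density p."""
    q = lognormal_pdf(s, s0, r, vol, tau)   # drift = risk-free rate under Q
    p = lognormal_pdf(s, s0, mu, vol, tau)  # drift = expected return under P
    return exp(-r * tau) * q / p


# Hypothetical inputs: 2% risk-free rate, 6% expected return, 20% volatility,
# three-month horizon, index level 100 today.
grid = [60.0 + 2.0 * i for i in range(41)]  # terminal index levels 60..140
pk = [pricing_kernel(s, 100.0, 0.02, 0.06, 0.20, 0.25) for s in grid]
is_decreasing = all(a > b for a, b in zip(pk, pk[1:]))
print(is_decreasing)
```

In this textbook setting the two densities share the same volatility and differ only in drift, so the ratio is monotonically decreasing in the terminal index level. The puzzle arises when empirically estimated densities produce a ratio that fails this check.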
While the risk-neutral moments extracted from option surfaces are by construction forward looking, the ones inferred from historical returns are only partially informative, thus suboptimal with respect to investors’ future beliefs. This underestimation of the physical filtration produces a misalignment with respect to the full conditioning of the information, thus violating the neoclassical theory. Empirically it turns out that most of the papers in the literature are then affected by a non-homogeneity bias.
1 Under the neoclassical setting a non-puzzling PK must be non-increasing in wealth. Projecting the PK onto market returns, a non-decreasing PK leads to the puzzling existence of a contingent claim that stochastically dominates the market index (see Dybvig (1988)[58] for the main theory and Beare (2011)[20], Beare and Schmidt (2015)[21] for possible extensions).
The ultimate goal of finance is to determine the risk-return trade-off. To achieve it, all of asset pricing comes down to two basic principles: first, any asset value is nothing but an expected discounted payoff; second, a dollar when money is scarce provides higher utility than the same dollar when there is plenty of money around. Therefore, states that provide one dollar when money is scarce have higher prices today. Estimating how the latter principle - that is, how to determine the PK - enters into the former is the above-cited ultimate goal of finance.
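The two principles can be illustrated with a small discrete example. The three-state economy, its probabilities and its payoffs below are hypothetical numbers chosen for illustration; the only relations used are the standard ones, namely that the price is the expected payoff weighted by the PK under the physical measure, or equivalently the discounted expected payoff under the risk-neutral measure.

```python
from math import exp

# Hypothetical three-state economy: bad, normal, good.
p = [0.2, 0.5, 0.3]            # physical (real-world) probabilities
q = [0.3, 0.5, 0.2]            # risk-neutral probabilities
payoff = [80.0, 100.0, 120.0]  # asset payoff in each state
r, tau = 0.02, 1.0             # risk-free rate and horizon in years

discount = exp(-r * tau)
# Pricing kernel per state: state price per unit of physical probability.
m = [discount * qs / ps for qs, ps in zip(q, p)]

# Principle 1: the price is an expected discounted payoff, under either measure.
price_p = sum(ps * ms * xs for ps, ms, xs in zip(p, m, payoff))
price_q = discount * sum(qs * xs for qs, xs in zip(q, payoff))

# Principle 2: a dollar delivered in state s costs p_s * m_s today, and the
# kernel m is highest in the bad state, where money is scarce.
state_prices = [ps * ms for ps, ms in zip(p, m)]
print(price_p, price_q, m)
```

Both pricing routes give the same number by construction, and the kernel declines from the bad state to the good state, which is the monotonicity property discussed above.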
While the introduction of the risk-neutral probabilities and their use in asset pricing has been a triumph of the modern financial community, theory has left us without a way to recover both the real-world probability distribution and the PK that enter into the fundamental pricing equation. The estimation of the PK and its constituents is then of paramount importance in financial economics.
My line of research is motivated by the novel recovery theorem of Ross (2015)[140], which drew renewed attention to the economic and econometric issues related to the estimation of the market’s subjective assessment of the real-world probabilities, and by the empirical studies of the PK which, starting from Jackwerth (2000)[93], produced conflicting and often puzzling results.
Given the central role of the real-world probabilities for many financial operations, the proposed problem naturally embraces different theoretical and empirical fields of financial economics. Although the problem is wide in principle, I solve it econometrically with a focus on the proper modelling of the information present in the market. Throughout the thesis, I explore and exploit three main interconnected macro areas, namely: the higher informative content provided by the joint use of stock and option data with respect to just using historical stock returns, the econometric difficulties encountered in the estimation of a fully-conditional and time-varying physical measure, and the non-homogeneity bias that affects the two measures and thus, as a consequence, the PK.
First: options are by construction informative and forward-looking financial assets. Since the seminal paper of Arrow it is known that the necessary information about investors’ present and future beliefs is encoded in these assets. Along with their higher availability on the market, the attention of academics and practitioners on the predictive content of option data has increased substantially over the last few years. Options get even more informative if combined with other sources of information, e.g. stock data. Chernov and Ghysels (2000)[42], Pastorello, Renault and Touzi (2000) [128], Polson and Stroud (2003) [133] and Eraker (2004) [65] demonstrate how a unified framework that incorporates the information of option surfaces and stock returns assures a higher degree of statistical precision in the estimation of the parameters than only using historical stock data. Since then, the literature on joint estimation techniques has received considerable attention. Nevertheless, I claim that the overall informative power naturally present in the market has not yet been fully exploited for the estimation of the physical measure. Starting from this point and knowing that a possible reason for the PK non-monotonicity is the lack of investors’ forward-looking information, I propose a new methodology that, updating its values given the observations in the market, joins naturally and flexibly the risk-neutral and physical measures. Tested on a daily basis and for different times-to-maturity, the model allows us to show how option prices mirror the missing investors’ risk preferences, errors and beliefs.
Second: the literature offers different parametric and non-parametric methodologies to estimate the physical distribution. As a common denominator, almost all of them are based upon a stream of backward-looking stock returns, hence of past information, thus ignoring the investors’ forward-looking beliefs. Encoding most of the risk appetite of the investment, the obtained measure is then, at best, only partially-conditional and thus not fully informative2. Enlarging the time length of the estimation leads only apparently to more conditional measures. The last bit of information, even if it were informative about the future, is in fact washed out by the huge amount of information used in estimation. Conditionality is of key importance with respect to today’s volatility and higher moments. I estimate them by means of a GJR GARCH - FHS model.
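The mechanics of a GJR GARCH filter combined with filtered historical simulation (FHS) can be sketched as follows. This is a minimal illustration under stated assumptions: a GJR-GARCH(1,1) specification with zero-mean returns, hypothetical parameter values that are not estimated from any data, and an artificial return history generated only to make the example runnable.

```python
import random
from math import sqrt


def gjr_garch_filter(returns, omega, alpha, gamma, beta):
    """Run the GJR-GARCH(1,1) variance recursion over a zero-mean return series.

    sigma2[t] = omega + (alpha + gamma * 1{eps[t-1] < 0}) * eps[t-1]^2
                + beta * sigma2[t-1]
    Returns the conditional variances, the standardized residuals and the
    one-step-ahead variance forecast.
    """
    # Start from the unconditional variance (needs alpha + gamma / 2 + beta < 1).
    sigma2 = omega / (1.0 - alpha - 0.5 * gamma - beta)
    variances, std_resid = [], []
    for eps in returns:
        variances.append(sigma2)
        std_resid.append(eps / sqrt(sigma2))
        leverage = gamma if eps < 0.0 else 0.0  # extra weight on negative shocks
        sigma2 = omega + (alpha + leverage) * eps * eps + beta * sigma2
    return variances, std_resid, sigma2


def fhs_paths(std_resid, next_sigma2, params, horizon, n_paths, seed=0):
    """Filtered historical simulation: bootstrap past standardized residuals."""
    omega, alpha, gamma, beta = params
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        sigma2, total = next_sigma2, 0.0
        for _ in range(horizon):
            eps = sqrt(sigma2) * rng.choice(std_resid)  # empirical innovation
            total += eps
            leverage = gamma if eps < 0.0 else 0.0
            sigma2 = omega + (alpha + leverage) * eps * eps + beta * sigma2
        totals.append(total)
    return totals  # simulated cumulative log-returns over the horizon


params = (1e-6, 0.03, 0.10, 0.90)  # hypothetical values, not estimated
history_rng = random.Random(1)
history = [history_rng.gauss(0.0, 0.01) for _ in range(500)]  # artificial data
variances, z, next_sigma2 = gjr_garch_filter(history, *params)
simulated = fhs_paths(z, next_sigma2, params, horizon=20, n_paths=1000)
```

The simulated cumulative returns form an empirical distribution over the chosen horizon that is conditional on today’s variance forecast, while the innovations keep the skewness and fat tails of the historical residuals instead of imposing a parametric shape.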
As a direct consequence of these two points, I find that the vast majority of papers that investigate empirically the PK properties compare a fully-conditional numerator extracted from options data with an unconditional or, at best, a partially-conditional denominator extracted from past stock returns. The misalignment between a backward-looking, hence uninformative, historical measure and a naturally forward-looking risk-neutral measure that is able to express the rich admixture of the investors’ beliefs about the present and the future loads heavily onto the relative PK. Market microstructures, possible small sample biases (Leisen (2015)[109]), the natural extra fragility intrinsic in any ratio of measures (Jobson and Korkie (1980)[101]; Sala and Barone-Adesi (2015)[145]) and the frequency of the estimation can further amplify the misestimation. As a direct consequence, the obtained functional is non-homogeneous with respect to the conditionality of the information. The consequent results are then invariably spurious and misleading.
2 Among others, Jackwerth and Brown (2012)[37], Ziegler (2007)[159] and Beare (2011)[20] point out this issue.
I test my model on US index and index options data. The obtained empirical results are robust and outperform other methodologies. Interesting results are achieved in periods of strong market upturns and downturns, when the market movements produce high liquidity in the option market and the impact of many day-by-day operations is amplified. Given my results, I claim that some of the puzzles present in the literature are due to an econometric bias arising from an improper modelling of the information.
My findings are of value to academics, central bankers and other decision makers who wish to infer market beliefs about future distributions from traded asset prices. They also provide a way of testing the rationality of option prices. The implications of my research are likely to be relevant for risk management3, regulation and financial policy. Asset management will also benefit from an improved understanding of the PK (e.g. Sala and Barone-Adesi (2015)[144]).
Moving from the empirical to a more theoretical viewpoint, I show how the suboptimal use of a too small information set may impact strongly on the nature and the properties of the functional governing the empirical PK and its constituents. In turn these also affect the stability of the fundamental theorems of asset pricing. More specifically I show how, projecting the empirical pricing kernel onto a too small filtration set, the missing information is translated into jumps. Under this framework the process in fact passes from being absolutely continuous - as required by the theory - to being absolutely continuous with jumps. Given the strong and unique interconnectivity among the measures and the PK, this change in the nature of the PK impacts strongly on the validity of the risk-neutral measure, which is considered a true probabilistic measure only if the relative measure is a true martingale process. Following Jarrow and Larsson (2012)[98] I propose a model-free way to test market efficiency.
Finally, both for the discrete and the continuous cases I show, by means of a portfolio optimization, how a small and rational investor who wants to maximize her wealth by trading in a simple economy made of a riskless and a risky asset achieves smaller profits if she uses the market’s information suboptimally. What lowers the investor’s profits is the information premium, which (for the continuous case) is nothing but the Kullback-Leibler distance between the optimal and the suboptimal PKs. The proposed models can be of interest for asset managers. The case of a financial modeller who has less than the required amount of information for pricing is in fact much closer to reality than the optimal amount case.
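For reference, the Kullback-Leibler divergence itself is straightforward to compute on a discrete support. The sketch below is a simplified illustration: the two probability vectors are hypothetical, and treating them as the beliefs of a fully informed and a partially informed investor is an assumption made here, not the continuous-time construction used for the information premium in this thesis.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total = 0.0
    for ps, qs in zip(p, q):
        if ps > 0.0:
            if qs <= 0.0:
                return float("inf")  # p is not absolutely continuous w.r.t. q
            total += ps * log(ps / qs)
    return total


# Hypothetical state probabilities implied by a fully informed investor and by
# one who conditions on a smaller information set.
informed = [0.2, 0.5, 0.3]
uninformed = [0.3, 0.4, 0.3]
premium_proxy = kl_divergence(informed, uninformed)
print(premium_proxy)
```

The divergence is zero only when the two distributions coincide and is not symmetric in its arguments, so the order in which the optimal and the suboptimal quantities enter matters.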
The thesis is organized as follows: after a brief introduction of the above-cited problems, chapters two and three present a theoretical review of the estimation of the real-world measure, of the pricing kernel and of the empirical and theoretical violations that follow from its misestimation, i.e. the pricing kernel puzzle. The remaining chapters, which are my main contributions to the literature, propose the new estimation technique and its theoretical and empirical applications.
CHAPTER 2
The Measure’s Problem, the Pricing Kernel and its Puzzles
In this chapter I introduce the physical measure’s problems, the financial pricing kernel (henceforth: PK) and its main violations, also known in literature as PK puzzles.
The chapter starts by presenting the physical and the risk-neutral measures, their use in financial economics and how they are estimated. Focusing on the importance of the use of information for the estimation of these quantities, I present some of the main theoretical and empirical pitfalls present in the literature. Most of these problems will then be recalled in the subsequent chapters, where I will show how these issues can possibly be overcome.
Although the PK is a byproduct of the two densities and a discount factor, I focus most of the attention on the issues that affect the physical measure4, with a particular emphasis on the main theoretical and econometric problems present in the literature. As will become clear in this chapter and throughout the thesis, the real-world measure is in fact the one more prone to informational biases. Accordingly I focus on how a suboptimal physical measure may impact strongly on the estimation of the PK.
Finally I introduce how the joint use of different asset prices may provide a solution for these information issues. Along the chapter it will emerge that the main problem behind most of the puzzles present in the literature is not the econometric model used, but the inputs fed into it. As long as the input used cannot be informative (by structure), there cannot in fact exist any model that could fix the issue in a natural and valid way.
4 The abundant financial literature describes the above-mentioned “physical” measure with many different names, e.g. “real-world”, “risk-adjusted”, “historical”, “statistical”, “actual”, “objective” or “subjective”, depending on the framework.
2.1 The physical and the risk-neutral measures in finance
Although I will focus the attention on the physical measure, its definition is somewhat blurred and it will be understood more easily by defining the risk-neutral measure first. Justified by the three fundamental theorems of asset pricing and defined under some technical requirements, the risk-neutral measure is a fully theoretical probability measure built on the assumption that today’s value of a financial asset is equal to its expected future payoff discounted at today’s risk-free rate. Given this framework, as a main advantage, all investors are neutral with respect to all exogenous and endogenous risk. This strong but economically accepted assumption makes many pricing tasks considerably simpler.
Unlike the extremely powerful but somewhat fictitious risk-neutral measure, the real-world probability of future states5 also takes into consideration the investor’s subjective beliefs, thus allowing for the existence of a subjective risk premium. Dealing with the often intangible past, present and future subjective beliefs of the investors, the estimation of the real-world measure is by construction not a trivial task. This econometric issue is what made and still makes the risk-neutral measure appealing, both in academia and in the industry, thus leaving the theoretical and the empirical literature mostly silent on how to estimate the less straightforward but more correct real-world one.
In estimating the real-world measure, the main limitation of the different approaches present in the literature lies in the assumption that all the information about the real-world probabilities of future returns can be fully extrapolated from the historical record. This assumption may lead to large errors, which can be reduced if significant information on the physical distribution of security returns is available from other sources.
Although the literature is almost silent on its theoretical and empirical estimation, the use of the real-world measure in finance is widespread. Given the goal of the thesis I will only focus on those papers that use the real-world measure as a “tool” for the estimation of the PK.
Jackwerth (2000)[93] and Jackwerth and Brown (2013)[37] propose a risk-adjusted non-parametric kernel density estimation over 48 non-overlapping monthly returns. Aït-Sahalia and Lo (1998)[3] use the same technique but with 1,008 overlapping daily returns. Even if markets were strongly efficient and fully transparent, so that the last value of the time series were fully representative and informative with respect to the investors’ beliefs, the huge amount of data used by these techniques washes out its impact, thus making the final result totally uninformative with respect to the future views of the market. It follows that both methods are essentially unconditional with respect to today’s observations, thus providing the least conditional approach to estimate the real-world future probabilities. A more sophisticated technology is present in Christoffersen, Heston and Jacobs (2013)[45], where the physical parameters are extracted from 240 monthly returns by optimizing the Heston and Nandi GARCH model with filtered historical innovations. The methodology is in line with Rosenberg and Engle (2002)[64] and Barone-Adesi, Engle and Mancini (2008)[15], which extract the real-world probabilities through an asymmetric GJR-GARCH (1,1) with empirical innovations over a past stream of daily observations. Although these GARCH techniques improve the degree of conditionality of the estimation, given the structure of the underlying used - stock returns - a full inference as well as a full recovery of the investors’ future beliefs is impossible without using a more informative dataset. Bliss and Panigirtzoglou (2004)[28] rely on parametric PKs to extract the real-world densities from the risk-neutral ones. Unfortunately, modelling the PK parametrically may lead to substantial errors in estimation. An antithetic approach is Ross (2015)[140], which uses only options data to infer the real-world probabilities. Among other limitations, the Ross recovery theorem is inconsistent with some evidence from the stochastic volatility modelling literature - e.g. Bollerslev, Chou, and Kroner (1992)[30] - indicating that future state probabilities depend more on recent events than on long-ago events, but that long-ago events still have some predictive power.
5 The future state can be linked to any financial risky action whose unknown outcome can be computed by means
Christoffersen et al. (2012)[47] and Bollerslev and Todorov (2011)[31] confirm that most of the risk and risk pricing information of the underlying asset can be extracted from derivative products. Once adjusted for a risk premium6, the risk-neutral measure extracted from the option surface may reflect all the information publicly available to investors, thus providing most of the missing information. Given this powerful feature of the option surface, my approach is to solve the presented problem empirically and in a very natural way: letting the market data speak as much as possible. As a consequence I go for a novel intermediate approach that uses both historical and forward-looking data to provide a time-varying and fully-conditional real-world measure. As a starting point and benchmark I use the already advanced econometric model of Barone-Adesi, Engle and Mancini (2008) and I improve it toward its full conditionality. This already high starting point makes my exercise more complex as well as more valuable for the literature.
6 The risk premium adjustment is needed for consistency with respect to the objective density. The adjustment however implies that state prices are functions of index returns only. In the presence of other priced risks, such as the variance risk premium discussed by Heston and Nandi (2000)[87], or higher-order premia like the skewness or kurtosis risk premium, it is interesting to investigate whether better objective estimates may be obtained by rescaling the risk-neutral distribution to remove the volatility premium. However, Chorro et al. (2012)[43] suggest that the improvement induced by the variance risk premium should be marginal. That should hold even more if parameters are re-calibrated.
Econometrically, the main tools needed in estimation are the GJR-GARCH-FHS model, to estimate the daily moments, and the Dirichlet process, for mixing the measures.
I do not plan to modify the numerator, that is the risk-neutral density. In fact that would imply inefficiencies in the option market that, although rare given my dataset, are of course possible but beyond the scope of this thesis.
While the use of the risk-neutral measure to extract the entire real-world probability is gaining attention (Ross (2015)[140]), the much more natural approach of using the risk-neutral density to improve the estimation of the physical density is not common in the finance literature. As Ross (2015)[140] shows, under particular assumptions it is possible to recover the entire physical density from the risk-neutral one. Given its structure, the model is not affected by the homogeneity bias, but the price to pay for a complete recovery of the physical density from the risk-neutral one is high in terms of the assumptions needed. A full recovery is in fact possible only in a special case, namely when the state transition matrix has full rank, the underlying state vector is bounded and the utility function is state-independent. In practice, option prices and investors are unlikely to satisfy these requirements, as confirmed by Borovicka, Hansen and Scheinkman (2015)[33]. Carr and Yu (2012)[40] and Audrino, Huitema and Ludwig (2015)[9] propose possible extensions that relax some of the initial assumptions of the Ross recovery theorem. Unlike these papers, I do not extract the entire physical measure from the risk-neutral one; I only use the latter to complete the objective measure. As a primary advantage, I do not need to impose any structure on either the functional form of the PK or the preferences of the investors. It is reasonable to conjecture that limited departures from the conditions that ensure full recovery may still allow the option surface to be used to improve significantly the assessment of the physical distribution. My conjecture is supported by the widespread use in the business community of physical probabilities based on option prices. These probabilities are generally based on the Black-Scholes model, in spite of its inability to fit empirical option prices well.
It appears therefore that the usefulness of option prices to predict physical probabilities is widely recognized and it is quite a common practice in the business community.
2.1.1 The pricing kernel
Defined as a discounted ratio of state prices per unit of objective probability, the PK is required by neoclassical theory to be a monotonically decreasing function of wealth. Acting between them, the PK is what links the risk-neutral probability to the real-world one. How the information flows between the two worlds is thus determined by the PK, and vice versa if one uses the measures themselves to determine the PK. Mathematically, the PK is nothing but a discounted operator that allows us to move between the two measures. It follows that a correct PK must convey all the present and future expectations, beliefs and errors of the investor. In a continuous world with no arbitrage^7, today’s PK, $M_{t,T}$, is defined as the Radon-Nikodym derivative of the risk-neutral measure with respect to the physical measure of security returns. The time window, represented by the subscript $t,T$, underlines the forward-looking nature of the variables and applies to all inputs of the PK, and thus to the PK itself. Given a finite economy, $0 < t < T < \infty$, if the two measures satisfy mild regularity conditions, the PK is defined as the present value of the ratio of the risk-neutral density of returns, $q_{t,T}(R)$, divided by the physical density, $p_{t,T}(R)$:
$$M_{t,T} = PV_t\left(\frac{q_{t,T}(R)}{p_{t,T}(R)}\right) \qquad \forall\, t \in T \tag{2.1}$$
From the FTAP, equation (2.1) must be fully conditional with respect to the time-$t$ expectation, and the relative PK must be a strictly positive martingale process: $M_{t,T} > 0$. By no-arbitrage constraints, the ratio (2.1) is related to the expected gross return, $R_{t,T}$, from investing in simple state-contingent claims:
$$R_{t,T} = \frac{1}{M_{t,T} \cdot r_t^{T}} \tag{2.2}$$
where $r_t$ is the gross risk-free rate and $T$ is the chosen time interval. If the time interval is allowed to change, the PK becomes a stochastic process that can be used to price options of different maturities. Despite its high component of randomness, most of the parameters of the PK are chosen a priori in much of the existing literature. The consequences of these arbitrary choices are quite dramatic, especially for pricing states far away from the current state of the market. A deterministic PK may lead to severe mispricing and ineffective hedging strategies. It follows that a proper modelling of the PK can only be achieved by means of a time-varying stochastic model.
7 By requiring no arbitrage, I am implicitly assuming the existence and validity of free portfolio formation (FPF) and the law of one price (LOP) or, equivalently, the existence of a linear subset of the whole sample space and the linearity of the function.
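As a minimal numerical sketch of definition (2.1), the block below builds a PK on a grid of gross returns from two densities. The lognormal shapes, the parameter values and the function names are illustrative assumptions, not the estimation method of this thesis, and the present-value operator is taken to be a division by the one-period gross risk-free rate.

```python
import numpy as np

def lognormal_pdf(gross_return, mean_log, std_log):
    """Density of a gross return whose logarithm is N(mean_log, std_log^2)."""
    z = (np.log(gross_return) - mean_log) / std_log
    return np.exp(-0.5 * z**2) / (gross_return * std_log * np.sqrt(2.0 * np.pi))

def pricing_kernel(q_density, p_density, gross_riskfree):
    """Discounted ratio q/p on a common grid of gross returns, as in (2.1)."""
    return (q_density / p_density) / gross_riskfree

# Hypothetical one-period inputs (illustrative values only).
sigma = 0.20                 # volatility of log returns under both measures
mu = 0.08                    # physical expected growth rate
r = 0.02                     # continuously compounded risk-free rate
gross_riskfree = np.exp(r)

grid = np.linspace(0.5, 1.8, 400)
p = lognormal_pdf(grid, mu - 0.5 * sigma**2, sigma)   # assumed physical density
q = lognormal_pdf(grid, r - 0.5 * sigma**2, sigma)    # assumed risk-neutral density
m = pricing_kernel(q, p, gross_riskfree)
```

With these assumed inputs the kernel is positive and decreasing in the gross return, which is the textbook benchmark against which the puzzles discussed later are measured.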
2.2 Introduction to the pricing kernel puzzle
Despite its key role in asset pricing, there is still no clear-cut agreement among financial researchers and practitioners on the best procedure to estimate the PK.
Broadly speaking, I refer to the PK puzzle when the estimated PK is not general enough to properly explain the whole cross section of option data^8. Graphically, the puzzle shows up as an estimated PK that is not monotonically decreasing, in contrast with what neoclassical theory requires.
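The graphical criterion above can be checked mechanically on any kernel estimated on a grid. The sketch below is only an illustration: the helper `increasing_regions` and both kernels are hypothetical, with the second one built by hand to have a hump around zero net returns.

```python
import numpy as np

def increasing_regions(grid, kernel, tol=0.0):
    """Return (start, end) intervals of the grid on which the kernel increases."""
    rising = np.diff(kernel) > tol
    regions = []
    start = None
    for i, flag in enumerate(rising):
        if flag and start is None:
            start = i                              # kernel starts rising at grid[i]
        elif not flag and start is not None:
            regions.append((grid[start], grid[i]))  # it stops rising at grid[i]
            start = None
    if start is not None:
        regions.append((grid[start], grid[-1]))
    return regions

grid = np.linspace(0.8, 1.2, 81)                    # gross returns
monotone_pk = np.exp(-3.0 * (grid - 1.0))           # textbook decreasing kernel
# Hypothetical "puzzling" kernel: a local hump around zero net returns.
puzzling_pk = monotone_pk + 0.15 * np.exp(-((grid - 1.0) / 0.03) ** 2)

violations_monotone = increasing_regions(grid, monotone_pk)
violations_puzzling = increasing_regions(grid, puzzling_pk)
```

An empty list means the estimated kernel is consistent with monotonicity on the grid; any returned interval marks a region where the puzzle appears.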
Depending on the completeness of the market, the non-monotonicity of the PK has different implications. In a complete market framework, it implies the existence of a trading strategy in contingent claims that a.s. first-order stochastically dominates the underlying returns (Dybvig (1988)[58]). In the presence of market incompleteness, the monotonicity of the PK is the key ingredient for the literature that deals with incomplete-market stochastic dominance option pricing bounds (Perrakis and Ryan (1984)[130], Levy (1985)[111] and Constantinides, Jackwerth and Perrakis (2002)[50]). Moreover, a misestimation of the PK is often directly related to the apparent over/underpricing of out-of-the-money (OTM) and deep out-of-the-money (DOTM) options. Empirically, the main problems usually arise at the extremes of the PK or in the area of zero returns where, through an inflection, the obtained results violate no-arbitrage arguments. The former issue is primarily caused by two problems of inadequacy: one theoretical and one empirical. The first is mainly due to poor modelling of extreme events, so that it is not possible to fully capture the rare but existing episodes that lie in the deepest tails of the distribution. As a consequence, the model produces bad estimates at the extremes of the PK that may lead to apparent mispricing. The second is possibly due to the low liquidity of deep out-of-the-money put and (above all) call^9 options, which poses an important empirical challenge.
In between there is the U-shaped PK puzzle, which may be caused by the missing information from the call options or by the lack of options themselves.
The theoretical problems could be overcome by better modelling, while the empirical ones could only be reduced by applying numerical artifacts or by setting strong a priori model assumptions^10.
8 In this thesis I refer primarily to index option prices but, with no loss of generality, the same conclusion can be extended to other financial asset classes.
9 It is by now an empirically known fact that, mainly for hedging reasons, put options are more traded than call options.
10 Being an empirical issue linked to the low tractability of the model inputs, due to missing or illiquid data, it might be possible to reduce the bias of the final estimates by, e.g., imposing non-increasing/decreasing bounds in the tails of the final function, smoothing the extreme outcomes of the tails synthetically, producing a larger amount of data by simulation, fitting extreme-value distributions for the deepest values, and so on. Some of these corrections have already been applied in the literature (see 3.3).
I will show that better modelling can be achieved as long as the distribution of the innovations is properly chosen. The intermediate problem is instead solvable by properly exploiting the information in market options.
The puzzling results in the central area, which among other things may be due to an incorrect theoretical model or to a misestimation of the risk premium, can also be alleviated through the presented model or by better calibrating the risk premium. It turns out that the central violation is also strongly connected to the apparent underpricing of call options. Thanks to its high flexibility and its ability to adapt to different market environments, the proposed model is able to address both problems.
The literature dealing with the estimation of the PK is huge. Concerning its violations in estimation, it is possible to find in the literature more than a dozen different explanations regarding the existence or non-existence of the PK puzzle. The different approaches of Heston (1993)[85], Jackwerth (2000)[93], Heston and Nandi (2000)[87], Rosenberg and Engle (2002)[64], Bliss and Panigirtzoglou (2004)[28], Barone-Adesi, Engle and Mancini (2008, henceforth BEM)[15], Chabi-Yo (2008)[157], Chabi-Yo, Garcia and Renault (2012)[158], Christoffersen, Heston and Jacobs (2012)[44], Song and Xiou (2012)[149], to mention only some of the methods present in the literature, lead to very different conclusions on security pricing and the related empirical puzzles. Lately, the debate has also been enriched by behavioral finance models, such as the ones proposed by Ziegler (2007)[159], Shefrin (2008)[148], Hens and Reichlin (2011)[84], and Barone-Adesi, Mancini and Shefrin (2013)[18]. Whatever the story behind each paper, they all share a strong drawback: the denominator of the PK, the physical measure, is unconditional and thus improperly estimated. This theoretical bias leads to dramatic empirical problems. It turns out that the main reason behind the misestimation lies in the type of assets used to estimate the measure.
As noted by Bollerslev, Chou, and Kroner (1992)[30]: “future state probabilities depend more on the recent events than long-ago events, but long-ago events still have some predictive power. Misspecification of state probabilities induces error in the estimation of the PK since the denominator of the state-price-per-unit probability is incorrectly measured”; hence, the quality of the final results, which depend on the past history, today’s scenario and the forward-looking data used in estimation, is heavily linked to the information content embedded in the measures used. Bollerslev et al. (1992)[32] propose an asymmetric GARCH model with empirical innovation densities to get around the problem. The proposed denominator, although improved, is still poor from an informational point of view and thus produces poor PK estimates. Following their insight and partially in line with Ross (2015)[140], in this thesis I propose a new non-parametric methodology that I apply to the denominator to overcome the issue. By empirically mixing the two measures, I aim to improve the physical measure, and hence the overall PK.
One of the main goals of this thesis is to focus on the joint informational content of option prices and stock prices, as opposed to stock prices alone. I do not attempt to answer the question, “Is the model I assume correct?” Instead, I test whether, under the model I present, inference based on option and stock prices can lead to better small-sample properties for the PK.
2.2.1 More on the measures’ problem
Since it is the central concept of the thesis, in this section I go deeper into the measures’ problem in the estimation of the empirical PK. The PK is the discounted ratio of two measures. It follows that the single measures and the PK are uniquely and tightly interconnected. As an advantage, knowledge of two of them automatically implies knowledge of the third. As a disadvantage, the misestimation of one of the two naturally loads onto the third. The literature abounds with different methodologies, both parametric and nonparametric, to estimate the two measures and the PK^11, either jointly or separately.
While theoretically the PK is a decreasing function of aggregate resources, many empirical papers have found violations in different areas of the function (Aït-Sahalia and Lo (1998)[3], Jackwerth (2000)[93], Brown and Jackwerth (2001)[37], Rosenberg and Engle (2002)[64], Yatchew and Härdle[156], Ziegler (2007)[159]). The persistence and robustness of these violations drew interest to the investigation of the so-called pricing kernel puzzle. Since then, researchers have taken great interest in proposing different econometric techniques to estimate the PK and its measures in an attempt to resolve the puzzle.
The Real-World Measure
From the literature it emerges that, among other things, the most problematic econometric task is to ensure full conditionality of the time-varying estimate of the market’s subjective distribution of future returns.
Empirically, proposing a day-by-day estimation methodology for the measures that compose the PK is as important as it is econometrically non-trivial. The importance of a correct estimation of these measures lies in the wide use of the PK for many daily operations (e.g. asset pricing and risk management). The econometric issues, as also partially discussed in Bliss and Panigirtzoglou (2004)[28], lie in the nature of the underlying. Using historical data, many estimation methodologies impose unrealistic and theoretically unnecessary stationarity assumptions on the estimation of the measures or on the PK itself. Therefore, it is no coincidence that many works only propose monthly or yearly estimates and remain silent on daily ones. It turns out that, from this viewpoint, the main reason behind a biased estimation is not the technique used but rather the data used as input.
11 The nonparametric ones have to be preferred. From Fama (1965)[68] and Mandelbrot (1966)[118] it is well-known
Needless to say, a model’s inputs are of key importance in determining the type and the quality of the final outputs. Market option prices provide a naturally forward-looking measure; in fact, by contract, the owner of an option has the right, but not the obligation, to exercise it at expiration (or before it, if it is an American or exotic option). This feature is reflected in the option value, which is a non-decreasing function of volatility. Therefore, the market prices of options, through the implied volatilities, encode important forward-looking information about the future distribution of prices of the underlying asset. The higher moments of the distribution also embed important information. This is particularly true in the tails, where most of the sentiment lies. The same richness cannot be achieved by stock and index prices. By their contractual nature these assets are option-free and unique, hence poorer from an informative viewpoint.
Estimating the PK and extracting the risk aversion from stock prices is a well-known problem in the literature. Despite the unambiguous superiority of options data in estimation, it is only since the beginning of the millennium that scholars have begun using them (Chernov and Ghysels (2000)[42]). The superiority in estimation of options with respect to stocks (and also futures) is manifold.
First, stock prices have discounting as well as time-horizon problems. By contract definition, stocks do not expire but live indefinitely. They are defined over an indefinite time horizon; therefore the discounting process becomes non-trivial. As a consequence, additional assumptions (which are often unrealistic, e.g. on the characteristics of future dividends) are needed to determine the discounted cash flow. On top of that, the final outcome is statistically not very informative, since the discounted final cash flow is just a single value; this means that no inference about variations in preferences over different time horizons is possible.
On the contrary, option contracts have by definition a bounded life that is defined by a fixed time-to-maturity, $T$, known from the inception of the contract^12. Moreover, for each time $t$, we have the so-called option surface: a broad spectrum of times-to-maturity, $T_i$, and strikes, $K_j$,^13 that covers different states of the world. These characteristics allow for natural inference on preferences over specific horizons and simultaneously over different horizons and strikes.
12 Only perpetuity options differ from this characteristic, but are an exceptional case, more theoretical than really
These features make options qualitatively superior also to futures and forward contracts which, by their nature, do not share the discounting problem but only the time-horizon one. In fact, even though these contracts have finite maturities, they do not differentiate across states of the world, thus providing only a single statistic for each expiry date/observation date pair. As for stocks, with a single data point a direct density estimation is not possible without further assumptions^14.
As a consequence, we can directly estimate a time-varying risk-neutral density from a cross section of options with no need to constrain the structure of the data we sample from, but the same is not possible from the time series of the underlying. For the physical density in fact, to obtain good results when making inference from a time series of stock returns, we need to impose unrealistic a priori constraints on the time structure of the data. While the assumed stationarity can span windows of different lengths, in no case can it be justified economically. In conclusion, inferring densities from the option surface does not share the above-cited stationarity problems.
Given the above characteristics, it is natural that historical stock data alone cannot properly capture investors’ future beliefs. The difficulty in estimating the objective measure is that it directly depends on the evolution of the underlying process, whose time series is only partially informative. Estimation can be further complicated by possible data problems (e.g. data scarcity) and market frictions. For these reasons, some authors avoid the density estimation altogether and propose a ratio of an estimated risk-neutral measure over an unknown physical measure (Golubev et al. (2008)[77]).
time before T . Although non-trivial, a reliable discounting is still possible.
13 Both indices are finite, i.e. i = 1, . . . , M and j = 1, . . . , N .
14 As noted by Jackwerth and Rubinstein (1996)[94], starting from at least 8 option prices we have enough information to determine the general shape of the implied distribution. Although possible, using too few data points in estimation may lead to a dangerous small-sample bias.
As a (non-)alternative, some papers propose to increase the degree of conditionality statistically by lengthening the rolling window of the estimation. Jackwerth (2000)[93] tries to “enrich” the informative content of the physical measure by working on a longer time series: he proposes to use ten years of data instead of the classical two to five. As might be expected, the results are qualitatively almost unchanged. Due to the intrinsically backward-looking nature of the dataset, increasing the length of the rolling window is not a solution. In extreme cases it might lead to even more misleading results. In fact, by using more and more data, there is a natural statistical improvement coming from the higher conditionality of the measure, but at the same time we reduce the overall informative content of the latest data. Even if the latest stock value were fully informative about the future scenario of the market, it would become almost negligible, overwhelmed by the large amount of past data used. Using time series of historical returns thus makes the single values almost fully unconditional. Needless to say, the same problem then naturally carries over to the related estimates.
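The dilution of recent information in a long window can be illustrated with a small deterministic sketch. The return series below is entirely hypothetical (a calm regime followed by a short turbulent one), and the equally weighted root-mean-square estimator is an assumption chosen for transparency rather than the estimator used in the cited papers.

```python
import numpy as np

def window_volatility(returns, window):
    """Root-mean-square return over the last `window` observations."""
    recent = returns[-window:]
    return np.sqrt(np.mean(recent**2))

# Hypothetical daily returns: 2,440 calm days (1% moves) followed by
# 60 turbulent days (3% moves); signs alternate so the mean stays near zero.
n_calm, n_turbulent = 2440, 60
sizes = np.concatenate([np.full(n_calm, 0.01), np.full(n_turbulent, 0.03)])
signs = np.where(np.arange(sizes.size) % 2 == 0, 1.0, -1.0)
returns = sizes * signs

vol_2y = window_volatility(returns, 500)      # roughly two years of data
vol_10y = window_volatility(returns, 2500)    # roughly ten years of data
weight_last_2y = 1.0 / 500                    # weight of the latest observation
weight_last_10y = 1.0 / 2500
```

In this toy series the current regime has 3% daily moves, yet the two-year window reports 1.4% and the ten-year window is even further away, because each new observation carries a weight of only one over the window length.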
The Risk-Neutral Measure
While estimating the real-world density presents several difficulties, it is by now an empirical and theoretical fact that the most important information embedded in financial instruments is the state price density (SPD), i.e. the Arrow-Debreu state prices.
The time-state preference model of Arrow (1964)[6] and Debreu (1959)[7], which introduces what is now called the Arrow-Debreu security, models a very basic financial instrument (called a pure or primitive security) that pays one unit of numeraire (like a currency or a commodity) in one specific state of nature and zero elsewhere. Passing from discrete to continuous states, Arrow-Debreu securities are defined by the so-called state price density (SPD). In the continuous framework the security pays one unit of numeraire if the state falls between x and x + dx and zero elsewhere. As a consequence of their high informative content, Arrow-Debreu securities are one of the key elements for understanding general economic equilibrium under uncertainty and for determining the price of any contingent claim. For these reasons the estimation of SPDs has been a very important topic of research within the financial economics community. Once a complete set of option prices for a specific time-to-maturity is available, there are many parametric and non-parametric methods to recover the risk-neutral measure^15.
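One standard route from a cross section of call prices to the SPD is the Breeden and Litzenberger (1978) result, which identifies the state price density with the second derivative of the call price with respect to the strike. The sketch below applies it by finite differences to synthetic Black-Scholes prices. It is an illustration under assumed inputs (spot, rate, volatility, maturity and strike grid are all hypothetical) and is not presented as the extraction method of this thesis.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (used only to create test data)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

# Hypothetical market inputs.
spot, rate, vol, maturity = 100.0, 0.02, 0.20, 0.25
strikes = np.linspace(50.0, 170.0, 601)
calls = np.array([bs_call(spot, k, rate, vol, maturity) for k in strikes])

# Central second difference in the strike direction.
dk = strikes[1] - strikes[0]
second_diff = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dk**2

state_price_density = second_diff                          # Arrow-Debreu prices per unit of strike
risk_neutral_density = np.exp(rate * maturity) * second_diff
inner_strikes = strikes[1:-1]
total_mass = np.sum(risk_neutral_density) * dk             # should be close to one
```

With real quotes the strike grid is sparse and noisy, so the prices (or implied volatilities) are usually smoothed before differentiating, which is where the parametric and non-parametric methods discussed in this section come in.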
To summarize: to obtain a time-varying estimation of the real-world density it is of key importance, more than to pick the right extraction model, to use the right source of information. Since historical stock returns are only backward-looking, hence only partially informative, we need to somehow complete the measure by using other sources of information, e.g. the investors’ future sentiment extracted from the implied moments of the option surface.
Parametric methods specify the functional form of the risk-neutral distribution, which is then pinned down by estimating the relevant set of parameters.
These methods can be divided into three broad categories:
• Expansion methods: which correct and expand the basic distribution to make it more flexible;
• Generalized distribution methods: which add extra moments to generalize the basic normal and log-normal distributions;
• Mixture methods: which blend different distributions.
Although parametric methods have the advantage of being less computationally intensive and more precise once the parametrization is correctly done, it is well known that they do not perform well with financial market data^16. Any parametric misspecification may lead to highly inconsistent estimates, which in finance can result in extreme mispricing and large uncovered risks. Moreover, testing a fully parametric model is always a joint test of the model and the (arbitrarily chosen) parameters. Changing the latter could lead to completely different outcomes.
Better results can be obtained via non-parametric methods, which do not require any specific parametric form and thus achieve greater flexibility^17 in fitting the measure to option prices. Although not in principle, non-parametric models also have to follow some assumptions to better model the economy (e.g. the no-arbitrage principle). In any case, these assumptions, which leave the final functional form free, are surely weaker than those of any parametric model and less likely to be violated in practice (see Aït-Sahalia and Lo (1998)[3] and Aït-Sahalia and Duarte (2003)[2]). Non-parametric models are classified as follows:
• Kernel methods: which are comparable to regression methods but without specifying the parametric form of the function;
• Maximum entropy methods: which, satisfying some minor constraints, achieve the fit of the distribution by minimizing some specific loss function;
• Curve fitting methods: which are a broad class of methods where the objective of the estimation is approximated by some general function.
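To give a flavour of the first class in the list above, the following sketch smooths an implied-volatility smile with a Nadaraya-Watson kernel regression. The observations, the Gaussian kernel and the bandwidth are hypothetical choices made for illustration; they are not taken from the papers cited in this section.

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, x_eval, bandwidth):
    """Gaussian-kernel local average of y_obs evaluated at the points x_eval."""
    u = (x_eval[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return (weights @ y_obs) / weights.sum(axis=1)

# Hypothetical implied volatilities observed at a handful of moneyness levels.
moneyness = np.array([0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15])
implied_vol = np.array([0.29, 0.25, 0.22, 0.20, 0.19, 0.195, 0.205])

eval_points = np.linspace(0.85, 1.15, 61)
smoothed_vol = nadaraya_watson(moneyness, implied_vol, eval_points, bandwidth=0.04)
```

No functional form is imposed on the smile: the only tuning choice is the bandwidth, which trades off smoothness against fidelity to the observed quotes.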
While the estimation of the risk-neutral measure has been widely studied with satisfactory results, the same is not true for the estimation of the physical measure.
16 There are no generally accepted parametric forms of asset prices, volatility surfaces, or put/call price functions (Campbell, Lo and MacKinlay (1997), chapter 2[38]).
17 The main drawbacks of non-parametric models are poor convergence in small samples, which is amplified when the derivatives of the function are estimated, and the need for more data than is usually available, since these methods are typically very data-intensive.
The higher reliability and precision of the risk-neutral measure with respect to the physical one can also be implicitly deduced from a particular case of the PK puzzle which arises frequently in the literature. Some papers (e.g. Jackwerth (2000)) show, through an inflection in the central area of the function, the existence of a puzzling PK in the area of zero or nearly zero returns. This is by far the area with the largest number of option prices available, and thus where the risk-neutral measure is at its most precise.
Therefore, even though the misestimation of the risk-neutral measure could still be among the possible causes of some PK puzzles, it is considered the more stable, reliable and easier to estimate of the two measures.
To conclude: to obtain a time-varying estimation of the real-world density it is of key importance, more than to pick the right extraction model, to use the right source of information. Since historical stock returns are only backward-looking, hence only partially informative, I need to somehow complete the measure by using other sources of information, e.g. the investors’ future sentiment extracted from the implied moments of the option surface.
2.3 Joining the measures
As documented by Eraker (2004)[65], and following Chernov and Ghysels (2000)[42] and Pan (2000)[126], the risk-neutral and physical measures become very powerful in estimation once properly mixed.
The advantages are multiple. First, if the joint measure is properly calibrated, it is easier to disentangle the risk premia that arise from volatility and jumps. Second, the large quantity of data required to produce an accurate analysis using stock market data makes the overall estimation technique extremely time consuming. Thanks to the one-to-one relation of options to the conditional return distributions, I can exploit the richness of options and use shorter samples for estimation. Since non-parametric methods are usually extremely data-intensive, this feature could overcome the problem by reducing the amount of input data required, thus making computations feasible and less time consuming (Rubinstein (1994)[142], Jackwerth and Rubinstein (1996)[94], Dumas, Fleming and Whaley (1998)[57] and Aït-Sahalia and Lo (1998)[3]). Last but not least, the strong relation between the two asset classes can be exploited as a model misspecification diagnostic since, in theory, the estimated parameters implicit in derivatives prices have to be consistent with those obtained from asset price data alone.
Encouraged by the documented high estimation power of the joint methodology, and as a consequence of the great difficulty in obtaining a valid estimation methodology for a pure physical distribution, in this thesis I propose a new non-parametric methodology to estimate the physical measure by joining the two measures.
As a main idea, I solve the problem by using part of the naturally forward-looking information provided by the risk-neutral measure and mixing it with the objective one to complete it, so that the new measure becomes conditional with respect to all the information available, thus leading to a fully conditional, hence homogeneous, PK. In particular, I should condition on the present level of volatility. I reconstruct it by means of an asymmetric GJR-GARCH process with empirical innovations. As a main result, I obtain a more informative and flexible measure that has a positive impact on the estimation of the PK, to the point of resolving the PK puzzle.
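A minimal sketch of how an asymmetric GJR-GARCH(1,1) with empirical innovations conditions on the present level of volatility is given below. It assumes zero-mean returns, uses placeholder data and hand-picked parameters (in practice they would be estimated), and the function names are hypothetical; it illustrates the filtering and filtered historical simulation steps only, not the full procedure of this thesis.

```python
import numpy as np

def gjr_garch_filter(returns, omega, alpha, gamma, beta):
    """Run the GJR-GARCH(1,1) variance recursion over a return series.

    sigma2[t+1] = omega + (alpha + gamma * 1{r[t] < 0}) * r[t]**2 + beta * sigma2[t]
    Returns the conditional variances aligned with `returns` and the
    one-step-ahead variance implied by the last observation.
    """
    sigma2 = np.empty(returns.size + 1)
    sigma2[0] = np.var(returns)                  # simple initialisation
    for t, r in enumerate(returns):
        leverage = gamma if r < 0.0 else 0.0
        sigma2[t + 1] = omega + (alpha + leverage) * r**2 + beta * sigma2[t]
    return sigma2[:-1], sigma2[-1]

def fhs_simulate(std_innovations, next_variance, params, horizon, n_paths, rng):
    """Filtered historical simulation: bootstrap past standardized innovations."""
    omega, alpha, gamma, beta = params
    variance = np.full(n_paths, next_variance)
    cumulative = np.zeros(n_paths)
    for _ in range(horizon):
        z = rng.choice(std_innovations, size=n_paths, replace=True)
        r = np.sqrt(variance) * z
        cumulative += r
        variance = omega + (alpha + gamma * (r < 0.0)) * r**2 + beta * variance
    return cumulative

rng = np.random.default_rng(0)
history = 0.01 * rng.standard_normal(1500)       # placeholder for daily index returns
params = (1e-6, 0.03, 0.10, 0.90)                # hypothetical values, not estimates
variances, next_variance = gjr_garch_filter(history, *params)
innovations = history / np.sqrt(variances)       # empirical (filtered) innovations
simulated = fhs_simulate(innovations, next_variance, params, 21, 5000, rng)
```

The simulated cumulative returns start from today's filtered variance, so their histogram is a conditional estimate of the physical return distribution over the chosen horizon (21 trading days here).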
Statistically, a natural approach to exploit simultaneously different sources and provide statistical inference is the Bayesian approach (see, among the others Berger (1985)[23], Bernardo and Smith (1994)[23]).
The forward-looking information that can be extracted from option prices would be an economically coherent measure only if investors were risk-neutral. Generally investors are rational and provide physical (subjective) estimates such that the two measures can no longer be the same^18. What would fill this gap, so that the two measures become equal again, is the risk aversion adjustment:
$$\text{Risk-neutral prob.} = \text{Physical prob.} \times \text{Risk aversion} \tag{2.3}$$
By the FTAP, under no-arbitrage conditions, the risk-neutral returns are “risk-adjusted” physical returns which cannot earn more than the risk-free rate:
\begin{align}
E^{Q}(R_{t,T}) &= E^{P}(R_{t,T}) - \text{Risk Premium}_{t,T} \tag{2.4}\\
&= r_f \tag{2.5}
\end{align}
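Relations (2.3) to (2.5) can be verified in a small discrete example. The three-state economy below and the power-type kernel are hypothetical choices made only for illustration; the kernel is scaled so that the risky asset is correctly priced, and the risk-free rate is then implied by the kernel.

```python
import numpy as np

# Hypothetical three-state economy (illustrative numbers only).
gross_returns = np.array([0.90, 1.08, 1.26])     # R in each state
physical_prob = np.array([0.25, 0.50, 0.25])     # p
risk_aversion = 3.0                              # curvature of the assumed kernel

# Power-type kernel m = a * R^(-risk_aversion); the scale a is set so that
# the risky asset is priced correctly: E^P[m * R] = 1.
raw_kernel = gross_returns ** (-risk_aversion)
scale = 1.0 / np.sum(physical_prob * raw_kernel * gross_returns)
kernel = scale * raw_kernel

gross_riskfree = 1.0 / np.sum(physical_prob * kernel)      # r_f = 1 / E^P[m]
risk_neutral_prob = physical_prob * kernel * gross_riskfree

expected_p = np.sum(physical_prob * gross_returns)          # E^P(R)
expected_q = np.sum(risk_neutral_prob * gross_returns)      # E^Q(R)
risk_premium = expected_p - expected_q
```

The risk-neutral probabilities sum to one, the risk-neutral expected return equals the gross risk-free rate, and the difference from the physical expected return is the risk premium, which is positive because the kernel puts more weight on the bad state.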
As a consequence, since the proposed new physical probabilities are a mixture of measures, I need to properly correct both measures by a risk premium^19. A full treatment of the risk-neutral adjustment is given in chapter 6. The risk adjustment indicates investors’ preference for risk. What is stated here holds in one-period models; in generalized inter-temporal n-period models, higher-moment premia also matter (e.g. volatility premium, kurtosis premium, skewness premium and so on).
18 In this case, the use of the risk-neutral probability approach would produce highly biased forecasted values and would only be good to express future market expectations.
19 Without the risk premium adjustment, the use of the risk-neutral distribution for forecasting would lead to heavily biased forecasts. For example, if used by financial regulators as a diagnostic tool for future financial distress, the unadjusted measure would lead to an increased future market turbulence rather than a reduced one. See Anagnou et al. (2002)[4] for a review of the literature on the topic.
CHAPTER 3
Theoretical Set Up And The Literature So Far
After presenting the empirical pricing kernel, which will then be extensively used and rearranged in the next chapters of this thesis I will do a step back to present, from a theoretical viewpoint, four different approaches to derive the financial PK. These methods are part of an extensive list, I don’t pretend to be exhaustive. All derived under no-arbitrage or in equilibrium, the first two approaches presented are more closely related to measure theory and functional analysis. The last two methods instead have stronger economical foundations in their initial assumptions. Analyzed from a larger viewpoint, all models boil down to the same final result thus showing how all approaches can be seen as somehow all linked together.
As a first and more convenient definition, I defined in chapter 2 the PK as a ratio of measures, which is conveniently defined as the Radon-Nikodym[124] derivative of two measures; in section (3.1.1), I properly define which are the technical requirements and the economical meanings of these two measures. I then analyze the martingale properties of the PK and its relation with the first FTAP20.
Following Harrison and Kreps (1979)[80], I show how the PK is nothing but a direct application of the Riesz representation theorem[138].
Finally, starting from stronger economic foundations, I derive the PK under a general equilibrium model and under no-arbitrage conditions. The former rests on a set of microeconomic assumptions (e.g. Lucas (1978)[115] and Rubinstein (1976)[141]), the latter on probabilistic ones (e.g. Black and Scholes (1973)[27] and Merton (1973)[120]).
20 No-arbitrage, market completeness and market efficiency are the three elements on which, from a theoretical viewpoint, most of financial economic theory is based. The treatment of these elements makes up the three FTAP.
In section (3.2) I briefly investigate the one-to-one relation that links the investor’s risk aversion with the PK.
I close the chapter with a literature review of the parametric and non-parametric methodologies used to estimate the PK.
3.1 The empirical pricing kernel (EPK)
With respect to the PK, the final goal of this thesis is to study how properly conditioning the information affects the inputs that compose the PK, understood, as defined in (2.1), as a state price per unit of objective probability.
My work is divided into two parts: the first (chapters (3) and (4)), with a stronger theoretical foundation, lays the basis for the second, treated in the remaining chapters, where I test empirically what was previously proposed. Throughout the thesis I refer to the PK when the analysis is mostly theoretical and to the empirical PK (henceforth: EPK) when it is empirical.
With some abuse of terminology, the PK is the “characteristic function” of any asset pricing model: it contains all the relevant information required for pricing any type of financial asset. By the same token, but from a statistical viewpoint, it can also be seen as the “sufficient statistic” of any asset pricing model.
Given this focus, before presenting the model for the estimation of the empirical pricing kernel, a more rigorous definition of the filtration is needed. Unless stated otherwise, these specifications apply to all models throughout the thesis. Defined on a fixed and finite planning horizon $t \in T$, where $T < \infty$, a filtration is an increasing family of σ-algebras $\{\mathcal{F}_t : t \in T\}$. It follows that:
$$\mathcal{F}_s \subset \mathcal{F}_t \subset \mathcal{F}_T \subset \mathcal{F} \quad \text{for } 0 \le s \le t \le T$$
represents the information flow that generates $\mathcal{F} = \sigma\left(\bigcup_{t} \{\mathcal{F}_t : t \in T\}\right)$.
Defined on a sufficiently rich filtered probability space $(\Omega, \mathcal{F}, P, \mathbb{F})$, the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in T}$ is assumed to satisfy the usual hypotheses, which implies a complete and right-continuous filtered probability space. Completeness is achieved if the probability space is $P$-complete and $\mathcal{F}_0$ contains all the $P$-null sets of $\mathcal{F}$. Right continuity is defined as:
$$\mathcal{F}_t = \mathcal{F}_{t+} = \bigcap_{u>t} \mathcal{F}_u$$
Correctly modelling the information flow is crucial for the measurability of any random process. In fact, a stochastic process $\{Z_t : t \in T\}$ is said to be adapted (hence measurable) to a filtration $\{\mathcal{F}_t : t \in T\}$ if, for each $t \in T$, $Z_t$ is $\mathcal{F}_t$-measurable.
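To make the notions of filtration and adaptedness concrete, the following is a minimal sketch on a finite sample space, where each σ-algebra can be represented by the partition that generates it. The two-step binomial tree, the up and down factors and all function names are illustrative assumptions, not part of the thesis.

```python
from itertools import product

# Hypothetical sample space: the four paths of a two-step binomial tree.
omega = ["".join(path) for path in product("UD", repeat=2)]  # UU, UD, DU, DD

def partition_at(t):
    """Partition generating F_t: paths sharing the same first t moves are grouped."""
    blocks = {}
    for w in omega:
        blocks.setdefault(w[:t], set()).add(w)
    return list(blocks.values())

def is_measurable(z, partition):
    """On a finite space, z is measurable iff it is constant on every block."""
    return all(len({z[w] for w in block}) == 1 for block in partition)

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

def price(w, t):
    """Illustrative price S_t(w): start at 100, times 1.1 on U and 0.9 on D."""
    s = 100.0
    for move in w[:t]:
        s *= 1.1 if move == "U" else 0.9
    return round(s, 6)

filtration = [partition_at(t) for t in range(3)]

# Increasing information: F_0 is contained in F_1, which is contained in F_2.
nested = all(refines(filtration[t + 1], filtration[t]) for t in range(2))

# S is adapted: S_t depends only on the first t moves of the path.
adapted = all(
    is_measurable({w: price(w, t) for w in omega}, filtration[t]) for t in range(3)
)

# S_2 is not F_1-measurable: knowing it at time 1 would require looking ahead.
s2_known_at_1 = is_measurable({w: price(w, 2) for w in omega}, filtration[1])

print(nested, adapted, s2_known_at_1)
```

The partition representation is only valid for finite sample spaces; it is used here because it turns measurability into a simple check that a random variable is constant on each block.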
Definition 3.1.1. Defined in a no-arbitrage economy, where for each time $t \ge 0$ the probability space is described by $(\Omega, \mathcal{F}, P, \mathbb{F})$, the time $t$ conditional EPK for $T = t + \tau$ is defined as:
$$M_{t,T} = e^{-r_t (T-t)} \left. \frac{q_{t,T}(S_T \mid S_t)}{p_{t,T}(S_T \mid S_t)} \right| \mathcal{F}_t > 0 \qquad (3.2)$$
where $q_{t,T}$ represents the conditional risk-neutral or state price density (SPD), $p_{t,T}$ the conditional real-world density, $r_t$ the continuously compounded daily risk-free rate and $S_t$ is a proxy for the market portfolio.
For my empirical exercise, presented in detail in chapters 5, 7 and 9, $S_t$ represents the S&P 500 index, so that the EPK is projected onto the extended positive real line spanned by all possible values of the index.
The subscript $t,T$ emphasizes the forward-looking orientation of the quantity to which it applies and denotes that all parameters are fully conditional on the information available at date $t$ with respect to a future time $T$. This conditioning, too often violated by many models, plays a key role in the correct definition of a homogeneous EPK. To lighten the notation I assume $\tau = (T - t)$ to be fixed and equal to one year.
Function (4.17) can be estimated in different ways, depending on how it is defined and on how the ingredients that compose it are obtained. In all cases, since the EPK represents the fully non-parametric behavior of investors, a proper model can only be achieved by imposing as little structure as possible on the functional that describes it. My empirical approach is to let the data speak as much as possible, hence to be fully non-parametric with respect to the structure of the EPK and only partially parametric in the estimation of the empirical moments. Once the conditional measures $q_{t,T}$ and $p_{t,T}$ are estimated, the EPK is recovered by simply taking their discounted ratio, thus imposing no constraints on the EPK other than those underpinning the FTAP.
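The discounted-ratio step can be sketched numerically as follows. This is only an illustration under loud assumptions: the two densities are hypothetical lognormal stand-ins (in the thesis they are estimated from data), and the parameter values, grid and function names are invented for the example. Only the last step, the discounted ratio of q to p, mirrors the definition of the EPK.

```python
import numpy as np

def lognormal_pdf(s, s0, drift, sigma, tau):
    """Density of S_T when log(S_T / s0) ~ N((drift - sigma^2 / 2) tau, sigma^2 tau)."""
    mean = np.log(s0) + (drift - 0.5 * sigma**2) * tau
    var = sigma**2 * tau
    return np.exp(-((np.log(s) - mean) ** 2) / (2 * var)) / (s * np.sqrt(2 * np.pi * var))

def integrate(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical inputs: index level, risk-free rate, real-world drift, volatility, horizon.
s_t, r, mu, sigma, tau = 100.0, 0.02, 0.07, 0.20, 1.0
grid = np.linspace(1.0, 600.0, 20001)          # possible values of S_T

q = lognormal_pdf(grid, s_t, r, sigma, tau)    # stand-in for the risk-neutral density
p = lognormal_pdf(grid, s_t, mu, sigma, tau)   # stand-in for the real-world density

# EPK as the discounted ratio of the two conditional densities.
epk = np.exp(-r * tau) * q / p

# Sanity checks implied by no-arbitrage pricing:
bond_price = integrate(epk * p, grid)          # E^P[M] should equal exp(-r tau)
index_price = integrate(epk * p * grid, grid)  # E^P[M S_T] should equal S_t

print(bond_price, np.exp(-r * tau), index_price)
```

In this lognormal toy case the resulting kernel is monotonically decreasing in the index level by construction; the shape obtained from estimated densities need not share this property.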
3.1.1 PK as the conditional Radon-Nikodym derivative
Following most of the literature, and from a measure-theoretic viewpoint, the PK is usually defined as the Radon-Nikodym derivative of two properly defined measures. Exploiting the Radon-Nikodym theorem is a straightforward and elegant way to define the PK quickly and fully. As long as the assumptions behind the model are satisfied, the model also has a strong economic foundation. I start by presenting a generic unconditional form of the Radon-Nikodym derivative and its connections with the economy; then, in chapter (3.1.1), I make it conditional to better study how the daily information flows between the two measures. Although not immediate in principle, the link between the resulting PK and the FTAP is strong and self-explanatory. I explore it as well.
As anticipated in chapter 2, pricing in a risk-neutral world has the extraordinary advantage of using a unique probability measure which is economically neutral and therefore applicable to all investors. Unfortunately, if the representative investor’s risk attitude is not neutral, the values obtained may be misleading. The link between the two measures is the PK, which collects all the beliefs, errors and premia of the investor: in a nutshell, it embeds all the relevant information required to convert the risk-neutral measure into a real-world measure and vice versa. Mathematically, given its role, it is convenient to express the PK as a discounted Radon-Nikodym derivative. So represented, the PK is nothing but a discounted kernel: a time-adapted operator which, working as the transition function of a stochastic process, allows us to move from a risk-neutral to a subjective world.
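The role of the discounted Radon-Nikodym derivative as a bridge between the two measures can be illustrated in a finite state space, where the derivative reduces to a state-by-state ratio of probabilities. The three states, the probabilities, the payoff and the rate below are hypothetical values chosen only for the illustration.

```python
import numpy as np

# Hypothetical three-state economy (all numbers are illustrative assumptions).
p = np.array([0.30, 0.50, 0.20])         # real-world probabilities P
q = np.array([0.40, 0.45, 0.15])         # risk-neutral probabilities Q (Q << P, as p > 0)
payoff = np.array([80.0, 100.0, 125.0])  # payoff of a generic asset in each state
r, tau = 0.02, 1.0                       # risk-free rate and horizon

rn_derivative = q / p                    # dQ/dP, state by state
pk = np.exp(-r * tau) * rn_derivative    # PK as the discounted Radon-Nikodym derivative

# Price computed in the risk-neutral world: discounted Q-expectation of the payoff.
price_q = np.exp(-r * tau) * np.sum(q * payoff)

# Same price computed in the real world: P-expectation of the PK-weighted payoff.
price_p = np.sum(p * pk * payoff)

print(price_q, price_p)
```

Both computations return the same number, which is the sense in which the PK converts one measure into the other; the P-expectation of the undiscounted derivative equals one, so Q remains a probability measure.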
Proposition 3.1.1. Defined on a measurable space $(\Omega, \mathcal{F}, \mathbb{F})$, the unconditional version of the Radon-Nikodym derivative requires that:
1. $P$ is a σ-finite measure on $(\Omega, \mathcal{F}, \mathbb{F})$ and is atomless;
2. $Q$ is a σ-finite measure on $(\Omega, \mathcal{F}, \mathbb{F})$ and is absolutely continuous with respect to $P$ on $\mathcal{F}$: $Q \ll P$21;
3. the Radon-Nikodym derivative of $Q$ with respect to $P$ is a nonnegative Borel measurable function $M$ defined on the extended real line ($M : \mathbb{R}_+ \to \mathbb{R}_+$) and satisfies $P\{x \in \mathbb{R}_+ : M(x) = y\} = 0$ for all $y \in \mathbb{R}_+$.
If assumptions 1 and 2 are satisfied, then there exists a random variable M, defined a.s. (with respect to measure Q), which follows directly from the application of Appendix (A.1) and fits the last point of proposition 4.2.1. It follows that the unconditional version of the Radon-Nikodym derivative is defined as:
$$M = \frac{dQ}{dP} \qquad (3.3)$$
and satisfies the following: