
Application Two: Price Prediction

5.1 Industry Sector Models

5.1.1 The Data and Notation

The data for our sector models, and for the other models presented in this chapter, are daily closing prices from the Tokyo Stock Exchange for the period from January 4, 1989, to February 25, 1991. An example of the specific data used for the sector models is shown in Figure 5-1 for Kyokuyo Company, a medium-sized Japanese fishing company, along with the corresponding values for a number of broad market indicators, including the Nikkei 225, the small, medium and large capitalization indexes, and the industry sector index which contains Kyokuyo. In the following work, the first 430 points from each time series (i.e. up to September 13, 1990) were used to build models, and the following 100 values were held back for out-of-sample testing.

Figure 5-1: Daily closing prices of the Kyokuyo Company, its industry sector, capitalization sector, and the broad Nikkei 225 index. Vertical line separates training and testing data for sector models. [Plot not reproduced; legend entries: Sector + 1000, Kyokuyo, Medium Cap, Nikkei/10; monthly ticks on the time axis; vertical axis from 1000 to 4000.]

A bit of notation will be needed to refer to these series in our models. Let us denote the price of the target stock (i.e. Kyokuyo) as $P = \{p_1, p_2, \ldots, p_n\}$. Similarly, we denote the Nikkei index by $N = \{n_t\}$, the small cap index by $S = \{s_t\}$, the medium cap index by $M = \{m_t\}$, the large cap index by $L = \{l_t\}$, and finally the sector index by $T = \{t_t\}$. We also denote noise or residual series by $a_t$ or $\epsilon_t$, depending on whether we are assuming a normal distribution or not.

Linear models generated from this raw data tend to be misspecified, however, because of the temporal dependence in these series. This misspecification typically shows up in the form of dependent residuals, a violation of standard regression assumptions. Following common practice for financial time series, we transform the price series above by taking the first difference of the logarithm, i.e.

$$r^p_t = \log(p_t) - \log(p_{t-1}) = \log(p_t / p_{t-1}) \tag{5.1}$$

If the stock in question does not pay dividends, then $r^p$ is the stock's continuously compounded return. In this chapter we will attempt to find models of the target stock returns, and thus we will assume the data follows an equation of the form

$$r^p_t = f(r^n, r^s, r^m, r^l, r^t) + \epsilon_t \tag{5.2}$$

where $f$ may use any past values of the input series. Our model then will be an estimate $\hat{f}$ of $f$, and we will assess the performance of the model by looking at the estimated residuals

$$\hat{\epsilon}_t = r^p_t - \hat{r}^p_t \tag{5.3}$$

where $\hat{r}^p_t$ are the predictions of our model $\hat{f}$ on the given data set.
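The transformation in Equation (5.1), the 430/100 split described earlier, and the residuals of Equation (5.3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the price values are hypothetical, and the text does not say whether the split is applied before or after differencing, so `split_series` simply splits whatever sequence it is given.

```python
import math

def log_returns(prices):
    """First difference of log prices: r_t = log(p_t / p_{t-1}), as in Eq. (5.1)."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def split_series(series, n_train=430, n_test=100):
    """Return the first n_train points (model building) and the next n_test points (held out)."""
    return series[:n_train], series[n_train:n_train + n_test]

def residuals(actual, predicted):
    """Estimated residuals: actual return minus predicted return, as in Eq. (5.3)."""
    return [a - p for a, p in zip(actual, predicted)]

# Hypothetical closing prices, for illustration only (not data from the study).
prices = [100.0, 102.0, 101.0, 103.0, 103.0]
returns = log_returns(prices)

# Tiny split sizes so the toy series can be divided; the study uses 430 and 100.
train, test = split_series(returns, n_train=3, n_test=1)
```

Because log returns are additive, the sum of the returns over any window equals the log of the ratio of the last price to the first price in that window.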

5.1.2 Performance Measures

In measuring the performance of our predictive financial models, we like to distinguish between the performance of the statistical model and the performance of any trading system we devise using the model. In many situations we create models that do not directly tell us what to do to make money, even though this may be our primary goal. The model may predict, for instance, that some stock's price will rise by 1% tomorrow¹, but it doesn't directly tell us how much (if any) of the stock to buy. Thus we adopt two different sets of performance measures: one set to assess the model fitness, and the second set to assess the profitability of simple trading systems based on the model's predictions.

¹ Or perhaps even that the price will rise between 0.5% and 1.5% with probability 95%.

From the statistical side, most of the models will be fit using some sort of sum squared error cost function, and thus it makes sense to first look at a related error measure. We choose the usual multiple coefficient of determination, defined by

$$R^2 = 100 \left[ 1 - \frac{\sum_{i=1}^{n} \hat{\epsilon}_i^2}{\sum_{i=1}^{n} (r^p_i - E[r^p])^2} \right] \tag{5.4}$$

However, when we test out-of-sample predictions we will be interested in both the mean and variance of the errors, and thus we also report the root mean squared prediction error, defined by

$$\mathrm{RMSPE} = \sqrt{E[\hat{\epsilon}]^2 + \mathrm{Var}[\hat{\epsilon}]} \tag{5.5}$$

where $E[\cdot]$ and $\mathrm{Var}[\cdot]$ are the usual sample mean and variance operators.
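The two statistical measures in Equations (5.4) and (5.5) might be computed as in the following sketch. The function names are hypothetical, and the variance here uses divisor n, which is an assumption, since the text only refers to "the usual" sample operators. With that choice, the squared mean error plus the error variance is exactly the mean squared error, so RMSPE reduces to the familiar root mean squared error.

```python
import math

def mean(xs):
    """Sample mean."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with divisor n (an assumption; the source does not specify n or n-1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def r_squared(actual, predicted):
    """Coefficient of determination in percent, as in Eq. (5.4)."""
    m = mean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # sum of squared residuals
    sst = sum((a - m) ** 2 for a in actual)                     # total sum of squares
    return 100.0 * (1.0 - sse / sst)

def rmspe(actual, predicted):
    """Root mean squared prediction error, as in Eq. (5.5)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(mean(errors) ** 2 + variance(errors))

# Hypothetical returns and predictions, for illustration only.
actual = [0.01, -0.02, 0.03]
predicted = [0.0, 0.0, 0.0]
score = r_squared(actual, predicted)
error = rmspe(actual, predicted)
```

Note that $R^2$ as defined can be negative out of sample: a model whose squared errors exceed the variation of the returns around their mean scores below zero.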

From the finance side, we are interested in whether or not we can make money by using the statistical model to trade. A reasonable first-order trading strategy based on the above type of predictions simply buys a fixed number of shares of the stock if the prediction is positive, and sells that same number of shares short if the prediction is negative. More complex trading strategies which vary the quantity and suggest "no change" are certainly possible (especially with pointwise prediction variance estimates), but our simple buy/sell strategy will serve for our evaluation purposes. One measure of the success of this simple strategy is the percentage of times that the model predicts the correct sign of the return². We also would like to know how much money we will make on the trading strategy, and for this we use the annual rate of return, which is defined as follows. Let $r_i$ denote the continuously compounded rate of return of our trading strategy in period $i$. Then our annual rate of return is

$$\mathrm{ARR} = \frac{k}{n} \sum_{i=1}^{n} r_i \tag{5.6}$$

where $n$ is the number of points in the entire test period, and $k$ is the number of trading time units per year (e.g. 52 for weekly trading, and roughly 253 for daily trading). However, these returns may be surprisingly large (especially for high-frequency trading) because we are not including any transaction costs in our evaluation. Rather than attempting to find some "reasonable" transaction costs to use in the above calculations, we instead report the break-even transaction cost (BETC), which we define as the percentage of the value of each transaction that would have to be charged for ARR to be zero for the entire test period. Note that we assume the transaction cost is paid only when the position is changed, not necessarily at every time period.

² Note that the "special case" of the return being zero is relatively common for short-period returns, and we do not count such points as correct unless the model explicitly predicts zero.
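The trading measures described above can be sketched as follows. Several details are assumptions rather than the author's stated method: a short position is approximated as earning the negative of the log return, a prediction of exactly zero is treated as holding no position, the initial entry from a flat position counts as one transaction, and the cost is treated as additive in log-return terms, so that the break-even cost is the total return divided by the number of position changes. All function names are hypothetical.

```python
def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_accuracy(actual, predicted):
    """Percentage of periods in which the predicted sign matches the actual sign.
    A zero return counts as correct only when the prediction is exactly zero."""
    hits = sum(1 for a, p in zip(actual, predicted) if sign(a) == sign(p))
    return 100.0 * hits / len(actual)

def strategy_returns(actual, predicted):
    """Per-period strategy return: long if the prediction is positive, short if negative.
    Assumption: a short position earns the negative of the log return."""
    return [sign(p) * a for a, p in zip(actual, predicted)]

def annual_rate_of_return(strat, k=253):
    """ARR = (k / n) * sum of per-period returns, as in Eq. (5.6)."""
    return k * sum(strat) / len(strat)

def count_position_changes(predicted):
    """Number of times the position changes, starting from flat (assumption)."""
    changes, position = 0, 0
    for p in predicted:
        new_position = sign(p)
        if new_position != position:
            changes += 1
            position = new_position
    return changes

def break_even_transaction_cost(strat, predicted):
    """Cost per position change, in percent, at which the total return is zero.
    Assumption: costs are additive in log-return terms."""
    changes = count_position_changes(predicted)
    if changes == 0:
        return None  # no trades, so no break-even cost is defined
    return 100.0 * sum(strat) / changes

# Hypothetical test-period returns and predictions, for illustration only.
actual = [0.01, -0.02, 0.0, 0.015]
predicted = [0.005, -0.01, 0.002, -0.003]
strat = strategy_returns(actual, predicted)
```

A strategy that rarely changes position has few transactions to pay for, so the same total return yields a higher break-even cost than one that flips every period.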

The final measure of trading performance we use is the so-called Sharpe ratio, which attempts to normalize returns according to the risk of the trading strategy. This measure is defined as

$$\mathrm{SHARPE} = \frac{E[R] - E[R^f]}{\sqrt{\mathrm{Var}[R]}} \tag{5.7}$$

where $R$ is the simple return of our trading strategy ($R = e^r - 1$ for our continuously compounded return from above), and $R^f$ is the simple rate of return for a risk-free investment over the same period³. The Sharpe ratio can be shown to be the best measure to maximize for an isolated investment decision if CAPM-style assumptions hold (see Bodie et al. 1989), and it is nonetheless widely used even when the CAPM assumptions seem tenuous.

³ In this chapter we use 3-month deposit rates converted to the appropriate holding period for $R^f$, which should be negligibly different from the more textbook use of government-insured bond rates for the exact same holding period as our investment.
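Equation (5.7) might be computed as in the sketch below. The function names are hypothetical, the variance uses divisor n (an assumption), and the conversion of an annual risk-free rate to the holding period by compounding is one plausible reading of the footnote, not necessarily the conversion used in the study.

```python
import math

def per_period_rate(annual_rate, k=253):
    """Convert an annual simple rate to a per-period simple rate by compounding.
    Assumption: k periods per year; the source does not state its conversion."""
    return (1.0 + annual_rate) ** (1.0 / k) - 1.0

def sharpe_ratio(log_returns, risk_free_simple):
    """Sharpe ratio as in Eq. (5.7), using simple returns R = exp(r) - 1."""
    simple = [math.exp(r) - 1.0 for r in log_returns]
    mean_r = sum(simple) / len(simple)
    mean_rf = sum(risk_free_simple) / len(risk_free_simple)
    # Variance with divisor n (an assumption; the source does not specify n or n-1).
    var_r = sum((x - mean_r) ** 2 for x in simple) / len(simple)
    return (mean_r - mean_rf) / math.sqrt(var_r)

# Hypothetical strategy returns and a zero risk-free rate, for illustration only.
strategy_log_returns = [math.log(1.02), math.log(0.99), math.log(1.03)]
risk_free = [per_period_rate(0.0)] * len(strategy_log_returns)
ratio = sharpe_ratio(strategy_log_returns, risk_free)
```

The ratio is computed per holding period here; comparing it across strategies is only meaningful when they share the same period length.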


From the document: A Radial Basis Function Approach to Financial Time Series Analysis (pages 100-105)