

finance perspective we don't care about the R² or RMS error or some other statistical measure of a model's performance: we care about how much money the system can make, or how reliably it can make money, and so we should measure our success in those terms. Furthermore, optimizing some performance measure of the trading system may imply that a different error measure is appropriate for the modeling process (e.g. least absolute value instead of least squares).

Another issue we would like to explore concerns the non-stationary, time varying nature of financial data. In a heuristic attempt to limit the effects of non-stationarity we will be tempted to avoid using data that is too old. To get enough data for estimation, then, we will naturally try to exploit another dimension, such as looking at cross-sectional models (i.e. models that relate different securities to each other, rather than just to themselves).

Finally, we propose the usefulness of applying this modeling technology to other areas besides prediction. One example is deriving a monetary value of a source of data based on the best available trading system which uses it. Another example is finding persistent discrepancies between market prices and accepted pricing theories (e.g. for options).

1.5 Other Approaches

Nonlinear time series analysis is a relatively new area of investigation, and it has been approached from a variety of backgrounds. The statistics community pioneered it in the 1980's by proposing extensions to existing linear models (esp. the ARIMA models of Box & Jenkins (1976)), for instance combining two or more linear models in a simple nonlinear way (e.g. threshold autoregressive or TAR models of Tong and Lim (1980)).

For reviews of the numerous possibilities here see Priestley (1988), Tong (1990), or Granger (1991). These approaches are pleasing because of the scrutiny given in their development to the standard statistical considerations of model specification, estimation, and diagnosis, but their generally parametric nature tends to require significant a priori knowledge of the form of relationship being modeled.
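As an illustration of what "combining two or more linear models in a simple nonlinear way" can mean, a two-regime threshold autoregressive model can be written as below. This is a generic textbook form rather than a formula taken from this text; the order p, delay d, threshold r, coefficients a_i^{(j)} and noise terms are notation assumed here for the sketch.

```latex
x_t =
\begin{cases}
a_0^{(1)} + \sum_{i=1}^{p} a_i^{(1)} x_{t-i} + \varepsilon_t^{(1)}, & x_{t-d} \le r, \\[4pt]
a_0^{(2)} + \sum_{i=1}^{p} a_i^{(2)} x_{t-i} + \varepsilon_t^{(2)}, & x_{t-d} > r.
\end{cases}
```

Each regime is an ordinary linear autoregression; the nonlinearity comes only from switching between them according to whether the lagged value x_{t-d} lies above or below the threshold r. The parametric burden noted above shows up in having to choose p, d, and the number of regimes in advance.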

Independently in the late 1980's, the physics and dynamical systems community constructed nonlinear state space models, motivated by the phenomena of chaos. Crutchfield and MacNamara (1987) introduced a general method for estimating the "equations of motion" (i.e. model of the time behavior) of a data set cast into the state space formulation, which included a novel measure of the usefulness of the model based on entropy. Farmer and Sidorowich (1989) make a case for breaking up the input domain into neighborhoods and approximating the function locally using simple techniques (e.g. linear or quadratic fitting). Note that the work by this community addresses many interesting problems besides prediction, such as optimal sampling strategies, identifying the dimensionality of the system, identifying characteristics of the system (e.g. Lyapunov exponents) that determine how feasible it is to do long term prediction, and testing if a data set is nonlinear.
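The local approximation idea can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the procedure of Farmer and Sidorowich: it uses a one-dimensional state (the previous value only), the logistic map as test data, and a fixed neighborhood of k nearest points. The function names `logistic_series` and `local_linear_predict` and all parameter values are hypothetical choices made for this example.

```python
def logistic_series(x0, n, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), returning n values."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def local_linear_predict(history, query, k=10):
    """Predict the successor of `query` from (x_t, x_{t+1}) pairs in `history`.

    The k pairs whose x_t lies closest to `query` form the neighborhood;
    a straight line is fitted to them by ordinary least squares.
    """
    pairs = list(zip(history[:-1], history[1:]))
    nearest = sorted(pairs, key=lambda p: abs(p[0] - query))[:k]
    mean_x = sum(p[0] for p in nearest) / len(nearest)
    mean_y = sum(p[1] for p in nearest) / len(nearest)
    sxx = sum((p[0] - mean_x) ** 2 for p in nearest)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in nearest)
    slope = sxy / sxx if sxx > 0.0 else 0.0  # fall back to the local mean
    return mean_y + slope * (query - mean_x)


# Fit on the first 1000 points, then predict one step ahead on the rest.
series = logistic_series(0.3, 1200)
train, test = series[:1000], series[1000:]
errors = [abs(local_linear_predict(train, x) - y)
          for x, y in zip(test[:-1], test[1:])]
mean_abs_error = sum(errors) / len(errors)
print("mean absolute one-step error:", mean_abs_error)
```

No global functional form is assumed: the only modeling choices are the neighborhood size and the local fitting rule, which is what distinguishes this approach from the parametric models of the statistics literature.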

Many of the attempts from the dynamical systems area used RBFs as the function approximation method, although we note that these previous approaches restricted the RBF formulation given here in some way. Broomhead and Lowe (1988) applied RBFs to predicting the logistic map, and showed the usefulness of using fewer centers than data points. Casdagli (1989) investigated how error scaled with number of examples for strict interpolation RBFs (i.e. data used as fixed centers). Jones et al. (1990) show how normalizing the basis functions and adapting the gradient uses the data more efficiently for predicting the logistic map and Mackey-Glass equation. Kadirkamanathan et al. (1991) give a method similar to RBFs where they incrementally add basis functions to the approximation as dictated by the distribution of error.
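To make "fewer centers than data points" concrete, the sketch below fits a Gaussian RBF expansion with 5 fixed centers to 800 samples of the logistic map by linear least squares. It is an assumed, simplified setup and not a reproduction of any of the cited experiments: the center locations, the common width, the small ridge term, and the helper names (`gaussian_design`, `solve_linear_system`, `fit_rbf`, `predict_rbf`) are all illustrative choices.

```python
import math


def gaussian_design(xs, centers, width):
    """One row of Gaussian basis-function activations per input value."""
    return [[math.exp(-((x - c) / width) ** 2) for c in centers] for x in xs]


def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def fit_rbf(xs, ys, centers, width, ridge=1e-8):
    """Least-squares output weights from lightly regularized normal equations."""
    phi = gaussian_design(xs, centers, width)
    m = len(centers)
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(m)]
    return solve_linear_system(gram, rhs)


def predict_rbf(x, centers, width, weights):
    """Evaluate the fitted RBF expansion at a single input x."""
    return sum(w * math.exp(-((x - c) / width) ** 2)
               for w, c in zip(weights, centers))


# Logistic-map data: 1000 points, first 800 used for fitting.
series = [0.3]
for _ in range(999):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))
train, test = series[:800], series[800:]

centers = [0.0, 0.25, 0.5, 0.75, 1.0]  # 5 centers for 799 training pairs
width = 0.3
weights = fit_rbf(train[:-1], train[1:], centers, width)

sq_errors = [(predict_rbf(x, centers, width, weights) - y) ** 2
             for x, y in zip(test[:-1], test[1:])]
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print("out-of-sample RMSE with 5 centers:", rmse)
```

With the centers held fixed, only the output weights are estimated, so the fit reduces to a small linear problem whose size depends on the number of centers rather than the number of data points. Strict interpolation, by contrast, would place one center on every data point.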

Simultaneously, related attempts were being made in the "neural network" community in the late 1980's, focusing more on practical applications and on the issue of prediction accuracy relative to other methods. Lapedes and Farber (1987) applied multilayer perceptrons (MLP) to some of the same prediction problems popular in the chaos community, the logistic map and the Mackey-Glass equation, and found them to be superior to the Linear Predictive method and the Gabor, Wiener & Volterra polynomial method, and comparable to the local linear maps of Farmer and Sidorowich. White (1988) used MLPs to model univariate IBM stock returns, but found no significant out of sample performance. The problem of overfitting on the training data was noted by White and many other authors, and this inspired the wider use of sample reuse techniques from statistics, such as cross validation methods.

Utans and Moody (1991) clearly state the advantages of doing so, and also develop a new estimator of out of sample prediction error that penalizes for the "effective" number of parameters of general nonlinear models. They also apply this measure using MLP networks to the problem of predicting corporate bond ratings and show superior performance over linear regression. Weigend (1991) developed a technique for penalizing extra parameters in an MLP network and showed how the resulting parsimonious networks outperform the corresponding TAR model. De Groot and Wurtz (1991) also find evidence for the usefulness of MLP networks by comparing them with traditional statistical models such as linear, TAR, and bilinear models on univariate prediction problems. They also note the superiority of smarter parameter optimization methods than gradient descent, and note that the Levenberg-Marquardt method worked best for their problems. Refenes (1992) proposed a method for incrementally adding units to the MLP paradigm and showed how his method outperformed linear ARMA models on predicting foreign exchange rates. Finally, a number of researchers have tried other more or less vanilla applications of MLP networks to financial market prediction problems, but often the writeups of this work are plagued by insufficient detail concerning critical aspects of their models (e.g. variable selection and preprocessing) and/or the performance measures quoted are not sufficiently explained or benchmarked (for example Kimoto et al. (1990) or Wong and Tan (1992)).
