Essays on practical financial optimisation


Thesis

Reference

Essays on practical financial optimisation

SCHUMANN, Enrico

Abstract

Many optimisation problems in finance are difficult to solve because of multiple local optima or objective functions that are not well-behaved in other ways. The thesis comprises several essays on the application of optimisation heuristics to such problems. More specifically, we use methods like Differential Evolution, Particle Swarm Optimisation and Threshold Accepting to solve a selection of financial models. Examples that are discussed are portfolio construction, the calibration of option pricing models (eg, the Heston model) and yield curve models (the Nelson-Siegel-Svensson model), and robust/resistant regression (eg, Least Quantile of Squares and Least Trimmed Squares). In sum, we present evidence that heuristics perform extremely well on these problems and are thus ideal techniques for practical financial optimisation.

SCHUMANN, Enrico. Essays on practical financial optimisation. Doctoral thesis: Univ. Genève, 2010, no. SES 731

URN : urn:nbn:ch:unige-149946

DOI : 10.13097/archive-ouverte/unige:14994

Available at:

http://archive-ouverte.unige.ch/unige:14994

Disclaimer: layout of this document may differ from the published version.


ESSAYS ON PRACTICAL FINANCIAL OPTIMISATION

Thesis presented to the Faculty of Economic and Social Sciences of the Université de Genève

by Enrico Schumann

for the degree of

Doctor of Economic and Social Sciences, specialisation: econometrics

Members of the thesis committee:

Mr Manfred Gilli, thesis co-supervisor
Mr Olivier Scaillet, thesis co-supervisor
Mr Henri Loubergé, chair of the committee
Mr Dietmar Maringer, University of Basel

Thesis No. 731

Geneva, 15 July 2010


The Faculty of Economic and Social Sciences, on the recommendation of the committee, has authorised the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.

Geneva, 2 September 2010

The Dean

Bernard MORARD

Printed from the author’s manuscript


Essays on

Practical Financial Optimisation

Enrico Schumann Université de Genève

15th June 2010


overview

Problems & Methods

Optimisation Problems in Applied Finance
Heuristic Optimisation Methods

Portfolio Optimisation

A Practical Guide to Threshold Accepting
Large-Scale Portfolio Optimisation with Heuristics
Distributed Optimisation of a Portfolio’s Omega
An Empirical Analysis of Portfolio Optimisation Models
130/30 Portfolio Construction with the Omega Ratio
Optimal Enough?

Option Pricing

Calibrating Option Pricing Models with Heuristics

Estimation

Robust Regression with Optimisation Heuristics
Calibrating the Nelson–Siegel–Svensson Model

Literature & Index

Bibliography
Index


preface

This thesis has grown out of my work as a fellow of the Marie Curie Research and Training Network ‘Computational Optimisation Methods in Statistics, Econometrics, and Finance’ (COMISEF) at the département d’économétrie of the Université de Genève. Financial support came from the EU Commission through MRTN-CT-2006-034270 COMISEF and is gratefully acknowledged.

There are many people who helped me along the way. I am deeply obliged to them all; here I shall name but a few. First and foremost, there is Manfred Gilli. Manfred taught me much about optimisation, programming, numerical methods, and typography; but most importantly, he taught me a way of working systematically and with care for detail that I had never seen before. Then, there are Peter Winker and Dietmar Maringer. I would never have landed in Geneva without them. Without Peter, there would have been no COMISEF; in any case, I would never have heard of it. Without Dietmar and his lectures in Erfurt, I would not have known that there is such a subject as computational finance, or rather a branch of it that emphasises practical application and to which I felt strongly attracted.

I would also like to thank Henri Loubergé and Olivier Scaillet for helpful comments and suggestions.

how to read the thesis

All chapters are based on stand-alone manuscripts; several of them have also been published. Therefore, there are redundancies that could possibly have been eliminated from the thesis. Still, I decided to leave them, so as to make the thesis readable in parts.

All chapters were co-authored by Manfred Gilli, hence I will use ‘we’ throughout (it’s not a pluralis modestiae). Additionally, Chapter 7 builds on work with Giacomo di Tollo and Gerda Cabej; Chapter 11 describes work done with Stefan Große. Many thanks go to them as well.

on software

All optimisation algorithms described in this thesis were coded in Matlab, some also in R.

The text itself was set in LaTeX with Peter Wilson’s wonderful memoir class. The layout is evidently inspired by Bringhurst (2005).

e.s.


There is a further difficulty with the finding of ‘best’ solutions. All too frequently when a ‘best’ solution to a problem has been found, someone comes along and finds a still better solution simply by pointing out the existence of a hitherto unsuspected variable. In my experience when a moderately good solution to a problem has been found, it is seldom worth while to spend much time trying to convert this into the ‘best’ solution.

The time is much better spent in real research. . .

— George Kimball, co-founder of the Operations Research Society of America (cited in Tukey, 1962)

Thinking that our method might be useful to other people, we submitted it to a journal, where it was speedily rejected as too subjective. Our only claim was that the method sometimes works on problems where all else has failed, problems describable by the medical euphemism ‘terminal.’ [. . . ] Direct search is a crude, brute force method having no mathematical elegance. Its popularity in terms of citations shows that there are more terminal problems around than one might think. Among real nonlinear multivariate problems, those that are solvable analytically or by socially acceptable numerical methods seem to constitute a set of measure zero.

The numerous requests we’ve received for reprints suggest that for some time there will still be a demand for inelegant methods that can be tried on terminal cases.

—Robert Hooke and T.A. Jeeves (from Hooke and Jeeves, 1980)


Part 1 Problems & Methods


optimisation problems in applied finance

1

This chapter is based on Gilli and Schumann (2009b), and Gilli and Schumann (2010c).

This chapter discusses the process of creating optimisation models: how models are set up, how they are solved, and how solutions are evaluated.

We give several example problems from theoretical and practical finance to motivate the use of heuristic optimisation techniques.

Financial economics is essentially concerned with two questions: how much to save; and how to save, that is, how to invest income not consumed (Constantinides and Malliaris, 1995). Traditionally, economists have formulated both these questions as optimisation problems (Dixit, 1990); the second question has, however, received much greater attention in applied finance, and here the optimisation models have come to be deployed in practice.

In this first chapter, we discuss financial optimisation models more generally; specifically we discuss the setting up of models, how they are solved, and how obtained solutions are evaluated. The chapter is concluded by several examples of financial optimisation models.

What to optimise?

An optimisation model consists of an objective function and constraints. For all the applications discussed in this thesis, the objective function is scalar-valued; it takes as arguments a vector of decision variables, and data. The model specification will be determined by considering and balancing different aspects:

financial The straightforward part. We define goals and the means to achieve them; and we make these notions precise. For an asset allocation model, for instance, we may need to decide how to define risk – eg, variability of returns, average losses, probability of failing to meet an obligation – or how to measure portfolio turnover.

empirical In finance, we are rarely interested in the past. (Even though we should be. Reading Charles Kindleberger helps.) All models deal with future – and hence unknown – quantities. So we need to forecast, estimate, simulate, or approximate these quantities. Building the model (the finance part) must not be separated from the empirical part; we may only deal with quantities that we can forecast (etc.) to a sufficient degree. This is a strong, yet unavoidable, constraint on formulating the model.

computational There is a second constraint: the model must be solvable. Computational aspects will be the main theme of this thesis. We will argue that they are much less of a constraint than is sometimes thought.

We can think of model building as a meta-optimisation in which we try to obtain the best possible results (or more modestly, good results; results that improve the current status) for our financial goals under the restrictions that the model remains empirically meaningful and can be solved.

We will briefly discuss the ingredients of a model here. The objective function is often given by the problem at hand. In asset allocation, for instance, we may want a high return and low risk; when calibrating a model, we want to choose parameter values so that the output of the model is ‘close’ to observed quantities. Of course, these descriptions need to be made more precise. There are many ways to define what ‘close’ means. We have to select, say, a specific norm of absolute or relative distances. In fact, many problems can be spelled out in different ways. When we estimate interest rate models, we may look at interest rates, but also at bond prices. If we look at options, we may work with prices, but also with implied volatilities. Bond prices are functions of interest rates; there is a bijective relationship between option prices and Black–Scholes–Merton implied volatilities. Conceptually, different model formulations may be equivalent; numerically and especially empirically they are often not. Specific choices can make a difference, and they often do.

See for instance the discussion in Christoffersen and Jacobs (2004); Bams et al. (2009) regarding the calibration of option pricing models.

We can phrase this more to the point: can we directly optimise the quantities that we are interested in? The answer is No; at least, Not Always. A well-known example comes from portfolio selection. If we wanted to maximise return, we should not write it without safeguards into our objective function. The reason is that (i) we cannot predict future returns, and (ii) there is a cost involved in failing to correctly predict: the chosen portfolio performs poorly. Theory may help, but determining a good objective function – one that serves our financial goals – ultimately is an empirical task.

Black (1986) argues that expected returns are so difficult to forecast or estimate that they resemble unobservable quantities.

Most realistic problems have constraints. Constrained optimisation is usually more difficult than the unconstrained case. Constraints may, like the objective function, be given by the problem at hand: in asset allocation we may have legal or regulatory restrictions on how to invest. Empirically, restrictions often have another purpose. They act as safeguards against optimisation results that follow from our specific data, rather than from the data-generating process. This can concern out-of-sample performance, but also how we can interpret estimated parameters. For example, variances cannot be negative, and probabilities must lie in the range [0, 1]. Yet when such quantities enter as parameters in a numerical procedure, we are not guaranteed that these restrictions are observed, and so we need to make sure that our algorithms yield meaningful results.

Solving the model

The topic of this thesis is not the financial or empirical aspects of optimisation models, but their numerical solution. We describe optimisation models that cannot be solved with standard methods, ie, models that pose difficult optimisation problems. This is not meant presumptuously; it simply reflects the fact that many problems in finance are difficult to solve. Heuristic methods, which we describe below, can at least help to obtain good solutions, as will be demonstrated in later chapters. A clarification is needed here: in optimisation theory, the solution of a model is the optimum; thus it is not necessary to speak of optimal solutions. But that is not the case in practical applications. A solution here is rather the result obtained from a computer program. The quality of this solution will depend on the interplay between the problem and the chosen method (and chance).

From a practical perspective, difficulty in solving a problem can be measured by the amount of computational resources required to compute a (good) solution.

In computer science, there are the fields of complexity theory and analysis of algorithms that deal with the difficulty of problems, but results are often of limited use in applications.

The practical efficiency of algorithms depends strongly on their implementation; sometimes minor details can make huge differences.

Often the only way to obtain results is to run computational experiments.

What makes a problem difficult? For combinatorial models it is the problem size. Such problems have an exact solution method – exhaustive enumeration of all possible solutions – but this approach is almost never feasible for realistic problem sizes. For continuous problems, difficulties arise when:

· The objective function is not smooth (eg, has discontinuities), or is noisy. In either case, relying on the gradient to determine search directions may fail. An example is a function that needs to be evaluated for given arguments by a stochastic simulation, or by another numerical procedure (eg, a quadrature or a finite-difference scheme).

· The objective function has many local optima.

Practically, even in the continuous case we could apply complete enumeration: we could discretise the domain of the objective function and run a so-called grid search. But it is easy to see that this approach, just like for combinatorial problems, is not feasible in practice once the dimensionality of our model grows.
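The growth in effort can be made concrete with a small sketch. The function name `grid_points` and the choice of 100 grid points per decision variable are assumptions made for illustration only; they do not come from the thesis.

```python
def grid_points(points_per_axis: int, dimensions: int) -> int:
    """Number of objective-function evaluations needed for a full grid search.

    Every decision variable is discretised into `points_per_axis` values,
    so the grid has points_per_axis ** dimensions nodes.
    """
    return points_per_axis ** dimensions


if __name__ == "__main__":
    # Hypothetical setting: 100 grid points per decision variable.
    for d in (1, 2, 5, 10, 20):
        print(f"{d:2d} variables: {grid_points(100, d):.3e} evaluations")
```

Even with a coarse grid, the number of evaluations grows exponentially in the number of decision variables, which is why enumeration is not a practical strategy for models with more than a handful of variables.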

To solve optimisation models, many researchers and practitioners rely on what we call here classical techniques. Classical methods are, for the purpose of this thesis, defined as methods that require convexity or at least well-behaved objective functions, as they are often based on exploiting derivative information. They are mathematically well-founded; numerically, there are powerful solvers available which can efficiently tackle even large-scale instances of given problems. Methods that belong to this approach are for instance linear and quadratic programming. The efficiency and elegance of these methods come at a cost, though, since considerable constraints are put on the problem formulation, ie, the functional form of the optimisation criterion and the constraints. We often have to shape the problem in a way that it can be solved by such methods. Thus, the answer that the final model provides is a precise one, but often only to an approximative question.

An alternative approach that we will describe in this thesis is the use of heuristic optimisation techniques. Heuristics are a relatively new development in optimisation theory. Even though early examples date back to the 1950s or so, these methods have become practically relevant only in recent decades with the enormous growth in computing power. Heuristics aim at providing good and fast approximations to optimal solutions; the underlying theme of heuristics may thus be described as seeking approximative answers to exact questions.


Heuristics have been shown to work well for problems that are completely infeasible for classical approaches (Michalewicz and Fogel, 2004). They are conceptually often very simple; implementing them rarely requires high levels of mathematical sophistication or programming skills. Heuristics are flexible: we can easily add, remove or change constraints, or modify the objective function.

These advantages come at a cost as well, as the obtained solution is only a stochastic approximation, a random variable. However, such a stochastic solution may still be better than a poor deterministic one (which, even worse, we may not even recognise as such) or no solution at all when classical methods cannot be applied. In fact, for many practical purposes, the goal of optimisation is probably far more modest than to find the truly best solution. Rather, any good solution, where good means an improvement of the status quo, is appreciated.

Heuristics are not better optimisation techniques than classical methods; the question rather is when to use what kind of method. If classical techniques can be applied, heuristic methods will practically always be less efficient. When, however, given problems do not fulfil the requirements of classical methods (and the number of such problems seems large), the suggestion made in this thesis is to not tailor the problem to the available optimisation technique, but to choose an alternative, heuristic, technique for optimisation. In Chapter 2 we give a selective overview of heuristic techniques; later chapters detail the implementation of these methods.

Evaluating solutions

In setting up and solving an optimisation model, we necessarily commit a number of approximation errors. The term error does not mean that something went wrong; these errors are going to occur even if all procedures work as intended. The first such error comes when we translate the real problem into a model. For instance, we may move from actual prices in actual time to a mathematical description of the world, where both prices and time are continuous (ie, infinitely-small steps are possible). Such a model, if it is to be empirically meaningful, needs a link to the real world, which comes in the form of data, or parameters that have to be estimated. Again, we have a likely source of error, for the available data may or may not well reflect the true, unobservable process.

A classic reference on the analysis of such errors is von Neumann and Goldstine (1947). See also the discussion in Morgenstern (1963, ch. 6).

When we solve such models on a computer we can only approximate a solution. At the lowest level, errors come with the mere representation of numbers. A computer can only represent a finite set of numbers exactly; any other number has to be rounded to the closest representable number, thus we have so-called roundoff error. Many functions (eg, the logarithm) cannot be computed exactly on a computer, but need to be approximated. Operations like differentiation or integration, in mathematical formulation, require a ‘going to the limit’, ie, we let numbers tend to zero or infinity. But that is not possible on a computer; any quantity must stay finite. Hence, we incur truncation error. For optimisation models, we may incur a similar error: some algorithms, in particular the methods that we describe in this thesis, are stochastic, hence we do not – in finite time – obtain the model’s exact solution, but only an approximation.

See Heath (2005) for an introduction to numerical analysis. A brief and highly-readable overview is Trefethen (2008b).
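Roundoff and truncation error can be observed directly. The following is a minimal sketch, assuming standard double-precision floating-point arithmetic; the test function (the exponential), the evaluation point and the step sizes are illustrative choices, not taken from the thesis.

```python
import math

# Roundoff error: 0.1, 0.2 and 0.3 have no exact binary representation,
# so this difference is tiny but not exactly zero.
roundoff_gap = (0.1 + 0.2) - 0.3


def forward_difference(f, x, h):
    """Approximate f'(x) with a finite step h instead of the limit h -> 0."""
    return (f(x + h) - f(x)) / h


def derivative_error(h):
    """Absolute error of the forward difference for exp at x = 1."""
    exact = math.exp(1.0)  # the derivative of exp is exp itself
    return abs(forward_difference(math.exp, 1.0, h) - exact)


if __name__ == "__main__":
    print("roundoff gap:", roundoff_gap)
    for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15):
        print(f"h = {h:.0e}: error = {derivative_error(h):.3e}")
```

With a large step the truncation error dominates; with a very small step the roundoff error in the numerator is magnified by the division. The total error is therefore smallest for an intermediate step size, and cannot be driven to zero.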

In sum, we can roughly divide our modelling into two steps: from reality to the model, and then from the model to its numerical solution. Unfortunately, large parts of the computational finance literature seem only concerned with assessing the quality of the second step, from model to implementation, and attempt to improve here. In the past, a certain division of labour may have been necessary: the economist created his model, and the computer engineer put it into numerical form. But today, there is little distinction left between the researcher who creates the model, and the numerical analyst who implements and solves it. Modern computing power allows us to solve incredibly-complex models on our desktops. But then of course, the responsibility to check the reasonableness of the model and its solution lies – at all approximation steps – with the financial engineer, and then only evaluating problems at the implementation step falls short of what is required: any error in this step should be set into context; we need to compare it with the error introduced in the first step, when setting up the model. But such an evaluation is, even conceptually, much more difficult.

John von Neumann and Herman Goldstine, in the above-cited paper, describe the inversion of ‘large’ matrices where large meant n > 10. In a footnote, they ‘anticipate that n ∼ 100 will become manageable’ (fn. 12). Today, Matlab inverts a 100 × 100 matrix on a normal desktop PC in 1/1000 of a second.

A practical approach is to compare different models of which some are ‘more accurate’ than others. Accurate means that a model can not only be solved with sufficient precision, but that it is also economically meaningful. Suppose then that we have two models that serve the same purpose. One model can be solved precisely, but it is less-accurate than a second model, which can be only solved approximatively. We can still empirically test whether an only moderately-good solution to the more-accurate model provides a better answer to our real problem than the precise solution to the less-accurate model. Again, an example from portfolio selection can illustrate this point. Markowitz (1959, ch. 9) compares two risk measures, variance and semi-variance, along the dimensions cost, convenience, familiarity, and desirability; he concludes that variance is superior in terms of cost, convenience, and familiarity. For variance we can compute exact solutions to the portfolio selection problem; for semi-variance we can only approximate the solution. We can now empirically test whether, even with an inexact solution for semi-variance, the gains in desirability outweigh the increased effort. (It seems the answer is Yes; see Chapter 6.)

Even if we accepted a model as ‘true’, the quality of the model’s solution would be limited by the attainable quality of the model’s inputs, ie, the data or the estimated parameters. Appreciating these limits helps to decide how precise a solution we actually need. This decision is relevant for many problems in financial engineering since we generally face a trade-off between the precision of a solution and the effort required (most visibly, computing time). Surely, the numerical precision with which we solve a model matters; we need reliable methods. Yet, empirically, there must be an adequate-precision threshold for any given problem. Any improvement beyond this level cannot translate into gains regarding the actual problem any more; only in costs (increased computing time or development costs). Given the rather low empirical accuracy of many financial models, this required precision cannot be high. More specifically, when it comes to optimisation we can decide whether we actually need an exact solution as promised by the application of classical methods, or whether a ‘good’ solution provided by a heuristic is enough.

In principle, of course, there would seem little cost in computing precise solutions. Yet there is. Firstly, highly-precise or ‘exact’ solutions give a sense of certainty which is never justified by a model; secondly, computing more-precise solutions will require more resources. This does not just mean more computing time, but also more time to develop a particular method. Grinold and Kahn (2008, pp. 284–285) give an example; they describe the implementation of algorithms for an asset selection problem – finding the portfolio with cardinality 50 that best tracks the S&P 500. This is a combinatorial problem for which an exact solution could never be computed (there are about 10⁷⁰ possible portfolios), hence we need an approximation. A specialised algorithm took six months to be developed, but then delivered an approximative solution within seconds. As an alternative a heuristic technique, a genetic algorithm, was tested. Implementation took two days; the algorithm found similar results, but also needed two days of computing time. A remark is in order: the example is from the 1990s. Today the computing time of a genetic algorithm for such a problem would on a standard PC be of the order of minutes, more likely seconds (the Threshold Accepting algorithm employed in Chapter 6, for instance, computes a solution to a similar portfolio selection problem in less than 10 seconds). Researchers may have become cleverer since the 1990s, ie, they may also develop models faster today; but it is unlikely that their performance-improvement matches that of computer technology.
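The order of magnitude quoted for the number of candidate portfolios can be checked by counting the ways of choosing 50 assets out of 500. This is a sketch only: it counts asset subsets and ignores weights, and the evaluation speed of one billion portfolios per second is a purely hypothetical figure.

```python
import math

# Number of distinct 50-asset subsets of a 500-asset universe.
n_portfolios = math.comb(500, 50)


def years_to_enumerate(n_candidates, evals_per_second=1e9):
    """Years needed to evaluate every candidate at a (hypothetical) fixed speed."""
    seconds_per_year = 365 * 24 * 3600
    return n_candidates / evals_per_second / seconds_per_year


if __name__ == "__main__":
    print(f"candidate portfolios: {n_portfolios:.3e}")
    print(f"years for full enumeration: {years_to_enumerate(n_portfolios):.3e}")
```

The count lies between 10⁶⁹ and 10⁷⁰, in line with the figure given above; exhaustive enumeration is out of reach at any realistic evaluation speed.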

A final consideration of the quality of a solution is the distinction between in-sample and out-of-sample. For financial models, this distinction is far more relevant than numerical issues. More precise solutions will by definition appear better in-sample, but we need to test if this superiority is preserved out-of-sample. Given the difficulty we have to predict or estimate financial quantities, it seems unlikely that highly-precise solutions are necessary in financial models once we compare out-of-sample results. Again, we can set up an empirical test here. Suppose we have a model for which we can compute solutions with varying degrees of precision. Each such solution can be evaluated by its in-sample objective function value (its in-sample fit), but each solution also maps into a certain out-of-sample quality. We can now sort our solutions by in-sample precision, and then see if out-of-sample quality is a roughly monotonic function of this precision, and if the slope of this function is economically meaningful. (Such a test is demonstrated in Chapter 8 in the context of portfolio optimisation.)

To sum up this discussion: the numerical precision with which we solve a model matters, and in-sample tests can show whether our optimisation routines work properly. But the more relevant question is to what extent this precision translates into economically meaningful solutions to our actual problems. In the remaining part of this chapter, several examples of optimisation problems that cannot be solved with classical methods will be given. This is not a survey, but a selection of problems to illustrate and motivate the use of heuristic methods in finance. Some of these problems will be revisited in later chapters. For more detailed studies, see for example Maringer (2005). Many references to specific applications can be found in Schlottmann and Seese (2004) or Gilli et al. (2008a).

In the next chapter we will give a brief introduction to heuristic methods. The emphasis in both these initial chapters will be on principles, not on details. Heuristic methods as presented in these chapters may be regarded as recipes rather than specific algorithms. Later chapters will demonstrate the process of implementing such recipes.

Some authors prefer the notion of meta-heuristics for the concepts underlying these techniques, whereas only the specific implementations are called heuristics.

1.1 examples

Portfolio optimisation with alternative risk measures

In the framework of modern portfolio optimisation (Markowitz, 1952, 1959), a portfolio of assets is completely characterised by a desired property, the reward, and something undesirable, the risk. Markowitz identified these two properties with the expectation and the variance of returns, respectively, hence the expression mean–variance optimisation (MVO). There exists by now a large body of evidence that financial asset returns are not normally distributed (see for instance Cont, 2001), thus describing a portfolio by only its first two moments is often regarded as insufficient. Alternatives to MVO have been proposed, in particular replacing variance as the risk measure.

Assume an investor has wealth $v_0$ and wishes to invest for one period. A given portfolio, as it comprises risky assets, maps into a distribution of wealth at the end of the period, or equivalently into a distribution of losses $\ell$. The optimisation problem can be stated as follows:

$$
\begin{aligned}
\min_{x}\ & \Phi\bigl(\ell(x)\bigr) \\
& \mathrm{E}(\ell) \le -v_0\, r_d \\
& x_j^{\inf} \le x_j \le x_j^{\sup}, \qquad j \in J \\
& K_{\inf} \le \#\{J\} \le K_{\sup} \\
& \;\vdots
\end{aligned}
$$

The objective function Φ(ℓ) could be a risk measure or a combination of multiple objectives to be minimised. Candidates include the portfolio’s drawdown, partial moments, or whatever the analyst wishes to optimise.

Though we do not pursue this possibility here, the techniques discussed in later chapters can also be used for utility optimisation, see for example Maringer (2008).

The vector $x$ stores the (integer) numbers of assets held; $r_d$ is the desired return. $x_j^{\inf}$ and $x_j^{\sup}$ are vectors of minimum and maximum holding sizes, respectively, for those assets included in the portfolio (ie, those in the set $J$). If short-sales are allowed, this constraint is modified to $x_j^{\inf} \le |x_j| \le x_j^{\sup}$. $K_{\inf}$ and $K_{\sup}$ are cardinality constraints which set a minimum and maximum number of assets to be in $J$. There may be restrictions on transaction costs (in any functional form) or turnover, lot size constraints (ie, restrictions on the multiples of assets that can be traded), and exposure limits. We may also add constraints that, under certain market conditions, the portfolio needs to behave in a certain way (usually give a required minimum return).

Similar to this framework are index-tracking problems; here investors try to replicate a pre-defined benchmark; see for example Gilli and Këllezi (2002).

This benchmark need not be a passive equity index. In the last few years, for instance, there have been attempts to replicate the returns of hedge funds, see Lo (2008).

Applying alternative risk measures generally necessitates using the empirical distribution of returns. (There is little advantage in minimising kurtosis when stock returns are modelled by a Brownian Motion.) The resulting optimisation problem cannot be solved with classical methods (except for special cases like MVO). To give an example, Figure 1.1 shows the search space, that is, the values of the objective function that particular solutions map into, for a problem where Φ is the portfolio’s Value-at-Risk. The resulting surface is not convex and not smooth. Any search that requires a globally convex model, like a gradient-based method, will stop at the first local minimum encountered, if it arrives there at all.

Figure 1.1: Objective function for Value-at-Risk. (Surface plot of Φ against the weight of asset 1 and the weight of asset 2.)
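An objective function of this kind can be sketched in a few lines. The sketch assumes that the portfolio is described by weights (rather than integer holdings) and that the return distribution is given as a list of scenarios, one row of asset returns per scenario; the function names and the simple order-statistic definition of Value-at-Risk are illustrative assumptions, not the implementation used in the thesis.

```python
def portfolio_losses(weights, scenario_returns):
    """Loss in each scenario: the negative of the weighted asset returns."""
    return [-sum(w * r for w, r in zip(weights, row))
            for row in scenario_returns]


def empirical_var(losses, level=0.95):
    """Empirical Value-at-Risk: the loss at the given quantile level.

    The sorted losses are read off at position int(level * n),
    capped at the largest observed loss.
    """
    ordered = sorted(losses)
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]


def objective(weights, scenario_returns, level=0.95):
    """Phi for a candidate portfolio: its empirical Value-at-Risk."""
    return empirical_var(portfolio_losses(weights, scenario_returns), level)
```

Because the result is an order statistic of the scenario losses, a small change in the weights can change which scenario determines the quantile. The function is therefore only piecewise smooth in the weights, which is consistent with the rugged surface described for Figure 1.1.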

For some objective functions, the optimisation problem can be reformulated to be solved with classical methods; examples are Gaivoronski and Pflug (2005) or Rockafellar and Uryasev (2000); Chekhlov et al. (2005). Unfortunately, such solutions are problem-specific and do not accommodate changes in the model formulation. How to use heuristics for portfolio selection will be discussed more thoroughly in Chapters 3 to 8.

Model selection

Linear regression is a widely-used technique in finance. A common application is in factor models, where the returns of single assets are described as functions of other variables. Then

$$
r = \begin{bmatrix} f_1 & \cdots & f_k \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \epsilon \tag{1.1}
$$

where r is a vector of returns for a given asset, f are the vectors of factor realisations, β are the factor loadings and ε contains the remaining variation. Such models are widely applied for instance to construct variance–covariance matrices or in attempts to forecast future returns. The factors f may be macroeconomic quantities or firm specific characteristics; alternatively, the analyst may use statistical factors, for instance extracted by principal component analysis.

In practice, observable factors are often preferred, in particular since they are easier to interpret and to explain to clients. Given the vast amounts of financial data available, these factors may have to be picked from hundreds or thousands of available variables, in particular since we may also consider lagged variables (Maringer, 2004). Model selection becomes a critical issue, as we often wish to use an only small numberk of regressors from Kpossible ones, where K ≫ ^k^. We could use an information criterion, which penalises additional regressors, as the objective function; alternatively, techniques like cross validation can be applied or the problem can be formulated as an in-sample fit maximisation under the restriction thatkis smaller than a small fixed number.

Robust/Resistant regression

Empirical evidence over the last decades has shown that the Capital Asset Pricing Model (CAPM) explains asset returns in the cross-section rather badly (Fama and French, 1993, 2004). However, when interpreting the CAPM as a one-factor model (Luenberger, 1998, ch. 8), the β-estimates become useful measures of a stock’s general correlation with the market, which may be used to construct variance–covariance matrices (Chan et al., 1999).

The standard method to obtain parameter estimates in a linear regression is Least Squares. Least Squares (LS) has appealing theoretical and practical (numerical) properties, but the estimates obtained are often unstable in the presence of extreme observations, which are common in financial time series (Chan and Lakonishok, 1992; Knez and Ready, 1997; Genton and Ronchetti, 2008). Some earlier contributions in the finance literature suggested some form of shrinkage of extreme β-estimates towards more reasonable levels, with different theoretical justifications (see for example Blume, 1971, or Vasicek, 1973). Alternatively, the use of robust or resistant estimation methods to obtain the regression parameters has been proposed (Chan and Lakonishok, 1992; Martin and Simin, 2003). Among possible regression criteria, high breakdown point estimators are often regarded as desirable. The breakdown point of an estimator is the smallest percentage of contaminated (outlying) data that may cause the estimator to be affected by an arbitrarily large bias. The Least Median of Squares (LMS) estimator, suggested by Rousseeuw (1984), ranks highly in this regard, as its breakdown point is 50%. (Note that LS may equivalently be called Least Mean of Squares.)

There is of course a conceptual question as to what constitutes an outlier in financial time series. Markets may well produce extreme returns, and disregarding such returns by dropping or winsorising them may mean throwing away information. Errors in the data, though, for example stock splits that have not been accounted for, are clearly outliers. Such data errors seem to occur on a wide scale, even when using commercial data providers (Ince and Porter, 2006). In particular when large amounts of data are processed automatically, resistant regression techniques may be advisable.

Unfortunately, LMS regression leads to non-convex optimisation models, see Chapter 10. A particular search space for the simple model y = β₁ + β₂x + ε is shown in Figure 1.2.

[Figure 1.2: LMS objective function. Axes: β₁, β₂; vertical axis: med(r).]
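The LMS criterion for the simple model y = β₁ + β₂x + ε is easy to state in code, which also shows why it is awkward to optimise: the median picks out a single squared residual, and which observation that is changes as the parameters move. The following is a minimal sketch; the simulated sample, the contamination scheme and the name `lms_objective` are assumptions for illustration only.

```python
import random
from statistics import median

def lms_objective(beta1, beta2, x, y):
    """Median of squared residuals for y = beta1 + beta2 * x + eps."""
    return median((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))

random.seed(42)
# hypothetical sample: true line 0.2 + 0.5 x, with 10% gross outliers
x = [random.gauss(0.0, 1.0) for _ in range(100)]
y = [0.2 + 0.5 * xi + random.gauss(0.0, 0.1) for xi in x]
for i in range(10):
    y[i] += 5.0

# evaluate the objective on a grid, as one would to draw the search space
grid = [i / 10 - 1.0 for i in range(21)]  # -1.0, -0.9, ..., 1.0
best = min((lms_objective(b1, b2, x, y), b1, b2) for b1 in grid for b2 in grid)
```

A grid search like this is only feasible for two parameters; it is shown here to make the objective concrete, not as a suggested estimation method.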


Agent-based models

Agent-based models (ABM) abandon the attempt to model markets and financial decisions with one representative agent (Kirman, 1992). This results in models that quickly become analytically intractable, hence researchers rely on computer simulations to obtain results. ABM are capable of producing many of the ‘stylised facts’ actually observed in financial markets like volatility clustering, jumps or fat tails. For overviews on ABM in finance, see for example LeBaron (2000, 2006).

Unfortunately, the conclusion of many studies stops at asserting that these models can in principle produce realistic market behaviour when parameters (like preferences of agents) are specified appropriately. This leads to the question of what appropriate values are, and how different models compare with one another when it comes to explaining market facts.

Gilli and Winker (2003) suggest estimating the parameters of such models by indirect inference. This requires an auxiliary model that can easily be estimated, which in their case is simply a combination of several moments of the actual price data. A given set of parameters for the ABM is evaluated by measuring the distance between the average realised moments of the simulated series and the moments obtained from real data. This distance is then to be minimised by adjusting the parameters of the ABM. See Winker et al. (2007) for a more detailed analysis of objective functions for such problems.

Figure 1.3 shows the resulting search space for a particular ABM (see Kirman, 1993). The objective function does not seem too irregular at all, but since the function was evaluated by a stochastic simulation of the model, it is noisy and does not allow for the application of classical methods.

[Figure 1.3: Simulated objective function for Kirman’s model for two parameters. Axes: epsilon, delta. Panel title: Min: S(6.897e−004, 4.000e−001) = 4.002e−001.]


Calibration of option pricing models

Prices of options and other derivatives are modelled as functions of the underlying securities’ characteristics (Madan, 2001). Parameters for such models are often inferred by solving inverse problems, that is, we try to obtain parameter values for which the model gives prices that are close to actual market prices. In the case of the Black–Scholes–Merton model only one parameter, volatility, needs to be specified, which can be done efficiently with Newton’s method (Manaster and Koehler, 1982). More recent option pricing models (see for instance Bakshi et al., 1997, or Bates, 2003) aim to generate prices that are consistent with the empirically observed implied volatility surface (Cont and da Fonseca, 2002). Calibrating these models requires setting more parameters, which leads to more difficult optimisation problems.
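The one-parameter case can be sketched in a few lines. The code below inverts the Black–Scholes–Merton formula for a European call without dividends by Newton’s method, using vega as the derivative. The starting value of 0.2, the tolerance and all function names are assumptions chosen for illustration; in particular, this is not the starting-value rule of Manaster and Koehler (1982).

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def d1(S, K, T, r, sigma):
    return (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes-Merton price of a European call (no dividends)."""
    a = d1(S, K, T, r, sigma)
    return S * norm_cdf(a) - K * exp(-r * T) * norm_cdf(a - sigma * sqrt(T))

def bs_vega(S, K, T, r, sigma):
    """Derivative of the call price with respect to sigma."""
    a = d1(S, K, T, r, sigma)
    return S * sqrt(T) * exp(-0.5 * a ** 2) / sqrt(2.0 * pi)

def implied_vol(price, S, K, T, r, sigma=0.2, tol=1e-10, max_iter=50):
    """Newton iterations on sigma until the model price matches the quote."""
    for _ in range(max_iter):
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            break
        sigma -= diff / bs_vega(S, K, T, r, sigma)
    return sigma

quote = bs_call(100.0, 100.0, 1.0, 0.02, 0.3)    # price generated with sigma = 0.3
sigma_hat = implied_vol(quote, 100.0, 100.0, 1.0, 0.02)
```

With a single parameter and a price that is monotone in volatility, such an iteration is typically unproblematic; the difficulties discussed in this section arise once several parameters have to be fitted jointly.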

One particular pricing model is the Heston model (Heston, 1993), which is much used as it gives closed-form solutions for option prices. (‘Closed-form’ is somewhat deceiving: pricing still requires numerical integration of a complex-valued function.) Under the Heston model the stock price (S) and variance (V) dynamics are described by

\[
\begin{aligned}
dS_t &= r S_t\,dt + \sqrt{V_t}\,S_t\,dW_t^1 \\
dV_t &= \kappa(\theta - V_t)\,dt + \sigma\sqrt{V_t}\,dW_t^2
\end{aligned}
\]

where the two Brownian motion processes are correlated, that is, $dW_t^1\,dW_t^2 = \rho\,dt$. As can be seen from the second equation, volatility is mean-reverting in the Heston model. In total, the model requires (under the risk-neutral measure) the specification of five parameters (Mikhailov and Nögel, 2003). Even though some of these parameters could be estimated from the time series of the underlying, the general approach to fit the model is to minimise the squared difference between the theoretical and observed prices. A possible objective function is hence

\[
\min \sum_{n=1}^{N} w_n \left(C_n^H - C_n^M\right)^2
\]

where $N$ is the number of option quotes available, $C^H$ and $C^M$ are the theoretical and actual option prices, respectively, and $w$ are weights (Hamida and Cont, 2005). Sometimes the optimisation model also includes parameter restrictions, for example to ensure that the volatility cannot become negative.

Figure 1.4 shows the resulting objective function values for two parameters (volatility of volatility and mean-reversion speed) with the remaining parameters fixed. As can be seen, in certain parts of the parameter domain the resulting objective function is not too well-behaved, hence standard methods may not find the global minimum. The Heston model is discussed in Chapter 9.

Calibration of yield structure models

The model of Nelson and Siegel (1987) and its extended version, introduced by Svensson (1994), are widely used to approximate the term structure of interest rates. Many central banks use the models to represent the spot and forward rates as functions of time to maturity; in several studies (eg, Diebold and Li, 2006) the models have also been used for forecasting interest rates.

[Figure 1.4: Heston model objective function. Axes: vol of vol, mean reversion.]

[Figure 1.5: NSS model objective function. Axes: λ₁, λ₂; vertical axis: sum of squared residuals.]

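Let $y_t(\tau)$ be the yield of a zero-coupon bond with maturity $\tau$ at time $t$, then the Nelson–Siegel model describes the zero rates as

\[
y_t(\tau) = \beta_{1,t}
+ \beta_{2,t}\left[\frac{1-\exp(-\gamma_t)}{\gamma_t}\right]
+ \beta_{3,t}\left[\frac{1-\exp(-\gamma_t)}{\gamma_t} - \exp(-\gamma_t)\right] \tag{1.2}
\]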


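where $\gamma_t = \tau/\lambda_t$. The Svensson version is given by

\[
\begin{aligned}
y_t(\tau) = \beta_{1,t}
&+ \beta_{2,t}\left[\frac{1-\exp(-\gamma_{1,t})}{\gamma_{1,t}}\right]
+ \beta_{3,t}\left[\frac{1-\exp(-\gamma_{1,t})}{\gamma_{1,t}} - \exp(-\gamma_{1,t})\right] \\
&+ \beta_{4,t}\left[\frac{1-\exp(-\gamma_{2,t})}{\gamma_{2,t}} - \exp(-\gamma_{2,t})\right]
\end{aligned} \tag{1.3}
\]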

where $\gamma_{1,t} = \tau/\lambda_{1,t}$ and $\gamma_{2,t} = \tau/\lambda_{2,t}$. The parameters of the models (β and λ) can be estimated by minimising the difference between the model rates $y_t$ and observed rates $y_t^M$, where the superscript stands for ‘market’. An optimisation problem could be stated as

\[
\min_{\beta,\lambda} \sum \left(y_t - y_t^M\right)^2
\]

subject to constraints. We need to estimate four parameters for model (1.2), and six for model (1.3). Again, this optimisation problem is not convex; an example of a search space is given in Figure 1.5. The calibration of the Nelson–Siegel model and Svensson’s extension is discussed in more detail in Chapter 11.
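A minimal sketch of model (1.2) and of the sum-of-squares objective may make the calibration problem more concrete. The parameter values, the maturities and the function names below are hypothetical; the ‘market’ rates are generated from the model itself, so the objective is zero at the true parameters. Constraints are left out.

```python
from math import exp

def ns_yield(tau, beta1, beta2, beta3, lam):
    """Nelson-Siegel zero rate for maturity tau, following equation (1.2)."""
    g = tau / lam
    a = (1.0 - exp(-g)) / g
    return beta1 + beta2 * a + beta3 * (a - exp(-g))

def sum_of_squares(params, maturities, market_yields):
    """Sum of squared differences between model rates and market rates."""
    return sum((ns_yield(tau, *params) - ym) ** 2
               for tau, ym in zip(maturities, market_yields))

# hypothetical 'market' rates generated from known parameters
true_params = (4.0, -2.0, 1.5, 2.0)          # beta1, beta2, beta3, lambda
maturities = [0.25, 0.5, 1, 2, 3, 5, 7, 10]  # in years
market = [ns_yield(tau, *true_params) for tau in maturities]

# profile of the objective in lambda with the betas held at their true values
profile = [(lam, sum_of_squares((4.0, -2.0, 1.5, lam), maturities, market))
           for lam in (0.5, 1.0, 2.0, 4.0, 8.0)]
```

For a fixed λ the model is linear in the β, so only λ enters non-linearly; the Svensson version (1.3) adds a second such parameter.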


2 heuristic optimisation methods

This chapter is based on Gilli and Schumann (2009b).

We describe the principles of heuristic optimisation and how heuristics differ from classical optimisation methods. We briefly describe and provide pseudocode for several well-known heuristics.

In this chapter, we will outline the basic principles of heuristic methods, and summarise several well-known methods. These descriptions are not meant as definitive references (all the presented techniques exist in countless variations), but are meant to give the basic rules by which these methods operate.

The term heuristic is used in many scientific fields for different, though often related, purposes. In mathematics, it is used for derivations that are not provable, sometimes even incorrect, but lead to correct conclusions nonetheless. The term was made famous here by George Pólya (1957). Psychologists use the word heuristics for simple rules of thumb for decision making. The term acquired a negative connotation through the works of Daniel Kahneman and Amos Tversky in the 1970s, since their heuristics-and-biases programme involved a number of experiments that showed the apparent suboptimality of such simple decision rules (Tversky and Kahneman, 1974). More recently, however, an alternative interpretation of these results has been advanced; see, eg, Gigerenzer (2004a, 2008). Many studies indicate that while simple rules underperform in stylised settings, they often yield surprisingly good results in more realistic situations, in particular in the presence of estimation error.

Substantial strands of literature in different disciplines document the good performance of simple methods when it comes to prediction, and judgement and decision making under uncertainty. Cases in point are forecasting (see Makridakis et al., 1979; cf. in particular the M-competitions, Makridakis and Hibon, 2000; Goldstein and Gigerenzer, 2009), econometrics (see Armstrong, 1978), and psychology and decision analysis (Dawes, 1979, 1994; Lovie and Lovie, 1986). Still, within its respective discipline, each of these strands represents a niche.

The term heuristic is also used in computer science; Pearl (1984, p. 3) describes heuristics as methods or rules for decision making that are (i) simple, and (ii) give good results sufficiently often.

Among computer scientists, heuristics are often related to research in artificial intelligence; here sometimes specific methods like Genetic Algorithms are put on a level with, say, Neural Networks in the sense of being computational architectures that solve problems (see for instance Schlottmann and Seese, 2004). In this thesis, we will define heuristics in a narrower sense. We will always stay in the framework of optimisation models, thus our basic problem will be

\[
\operatorname*{minimise}_{x} \; f(x, \text{data})
\]

where f is a scalar-valued function, and x is a vector of decision variables. In most cases, this optimisation problem will be constrained. Heuristics, in the sense the term is used in this thesis, are a class of numerical methods that can solve such problems. Following similar definitions in Zanakis and Evans (1981), Barr et al. (1995) and Winker and Maringer (2007), we characterise the term optimisation heuristic through several criteria:

· The method should give a ‘good’ stochastic approximation of the true optimum; ‘goodness’ can be measured in computing time and solution quality.

· The method should be robust to changes in the given problem’s objective function and constraints. Furthermore, results should not vary too much with changes in the parameter settings of the heuristic.

· The technique should be easy to implement.

· Implementation and application of the technique should not require subjective elements.

Such a definition is not unambiguous. Even in the optimisation literature we find different characterisations of the term. In Operations Research, heuristics are often not regarded as stand-alone methods but as workarounds for problems in which ‘real’ techniques like linear programming do not work satisfactorily, see for instance Hillier (1983); or the term is used for ad hoc adjustments to methods that seem to work well but whose advantage cannot be proved mathematically. We will not follow this conception. Heuristics as defined here are general-purpose methods that can, as we will show, handle problems that are sometimes completely infeasible for classical approaches. Still, it is interesting to note that even though there exists considerable evidence of the good performance of heuristics, they are still not widely applied in research and practice. Brandimarte (2006), a well-known textbook on financial optimisation, for example devotes about 8 pages, of a total of more than 600 pages, to describing heuristic methods.

2.1 direct search and local search

Heuristics can be divided into local search methods and constructive methods (Gilli and Winker, 2009). For local search methods, the algorithm moves from solution to solution, ie, a complete existing solution is changed to obtain a new solution. The term local should not be taken too literally, as some methods (eg, Genetic Algorithms) are discontinuous in their creation of new solutions. Hence a new solution may be very different from its predecessor; it will, however, usually share some characteristics with it. Constructive methods on the other hand build new solutions in a stepwise procedure: an algorithm starts with an empty solution and adds components iteratively. An example of this approach comes from the Travelling Salesman Problem. Here solution methods exist where we start with one city and then add the remaining cities one at a time until a complete tour is created. Thus, the procedure terminates once we have found one complete solution.

In this thesis, we will only consider local search methods. All the techniques discussed are iterative: we start with one or several solutions, often generated randomly, and then modify these solutions until a stopping criterion is satisfied.

To describe such a method, we need to specify:

(i) how we generate new solutions (ie, how we modify existing solutions),

(ii) how we decide when to accept such a new solution, and

(iii) how we decide when to stop the search.

These three steps summarise the basic idea of an iterative search method. An example is a steepest descent method (a classical technique, following our classification). Assume we have a current solution $x^c$, and want to find a new solution $x^n$. Then the rules are as follows:

(i) We estimate the slope (ie, the gradient) of $f$ at $x^c$, which gives us the search direction. The new solution $x^n$ is then $x^c - \alpha\nabla f(x^c)$, where $\alpha$ is a step size.

(ii) If $f(x^n) < f(x^c)$ then we accept $x^n$, ie, we replace $x^c$ by $x^n$. (But see Gill et al. (1986, pp. 100–102) for a discussion of sufficient decrease.)

(iii) We stop if no further improvements in $f$ can be found, or if we reach a maximum number of function evaluations.
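Rules (i) to (iii) can be sketched as follows. The fixed step size, the central-difference gradient and the test function are assumptions made for illustration; a practical implementation would typically use a line search. The example function has two minima, and the search ends in whichever one lies downhill from the starting point.

```python
def gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    g = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += eps
        down[j] -= eps
        g.append((f(up) - f(down)) / (2.0 * eps))
    return g

def steepest_descent(f, x0, alpha=0.05, max_evals=10_000):
    xc, fc, evals = list(x0), f(x0), 1
    while evals < max_evals:                             # (iii) evaluation budget
        g = gradient(f, xc)
        xn = [xi - alpha * gi for xi, gi in zip(xc, g)]  # (i) new solution
        fn = f(xn)
        evals += 2 * len(xc) + 1
        if fn < fc:                                      # (ii) accept improvements only
            xc, fc = xn, fn
        else:                                            # (iii) no further improvement
            break
    return xc, fc

# hypothetical test function with a local minimum near x = 1.13
# and the global minimum near x = -1.30
def two_minima(x):
    return x[0] ** 4 - 3.0 * x[0] ** 2 + x[0]

x_local, f_local = steepest_descent(two_minima, [1.5])
x_global, f_global = steepest_descent(two_minima, [-1.5])
```

Started at 1.5 the search settles in the local minimum; only the start at −1.5 reaches the global one.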

Problems will mostly occur with steps (i) and (ii). (Problems can occur with step (iii), too. They are mostly caused by roundoff error, but can, at least for the applications in this thesis, be avoided by careful programming; see Chambers (2008) for a discussion.) As was indicated already in Chapter 1, there are models in which the gradient does not exist, or cannot be computed meaningfully (eg, when the objective function is not smooth). Hence we may need other approaches to compute a search direction. The acceptance criterion for classical methods is strict: if there is no improvement, a candidate solution is not accepted. But if the objective function has several minima, this means we will never be able to move away from a local minimum, even if it is not the global optimum.

Heuristics follow the same basic pattern (i)–(iii), but they have different rules which are better suited for problems with noisy objective functions, multiple minima and other properties that may cause trouble for classical methods. In this sense, heuristics can be traced back to a class of numerical optimisation methods that were introduced back in the 1950s: direct search methods. This is not to say that there is a direct historical development from direct search to heuristic methods. Contributions to the development of modern heuristics came from different scientific disciplines. Yet, it is very instructive to study direct search methods: firstly, they are still widely applied, in particular the Nelder–Mead algorithm (implemented in Matlab in the function fminsearch, and in R in the function optim). Secondly, and more importantly for this thesis, they share many of the characteristics of heuristic methods: they are simple, easy to implement, and simply work well for many problems.

We will define direct search methods as follows:

· The method uses only function evaluations to determine a search direction.

· The method does not model the objective function or its derivatives to derive a search direction or a step size.

This definition loosely follows Wright (1996). Trosset (1997) suggests additionally requiring that function values be used only in an ordinal sense (ie, ranking information); the essence of this requirement is that we do not numerically approximate f or its derivatives. For more information on the history of direct search methods, see Kolda et al. (2003); Lewis et al. (2000); Wright (1996). Following this definition, all methods outlined later in this chapter could be regarded as direct search methods. Still, it makes sense to differentiate between these methods and heuristics, for typical direct search methods still have characteristics of classical methods that are not shared any more by heuristics. We will discuss these features by example.

The Hooke–Jeeves algorithm. Hooke and Jeeves (1961) were the first to use the notion of direct search. (See the quote on page vii.) Their algorithm consists of the repeated application of two phases: an exploratory move, and a pattern move. Assume our problem is $d$-dimensional, so a solution is a column vector $x$ of length $d$. The objective function is denoted $f$. We also define a matrix $V$ of size $d \times d$, and denote the $j$th column of this matrix as $v_j$. This matrix stores the search directions. To change a solution in an exploratory move, we add or subtract one column of $V$, scaled by a step size $h$, to $x$. In the simplest case, $V$ is the identity matrix, hence adding $\pm h v_j$ to $x$ means to change the $j$th element of $x$. This exploratory move is outlined in Algorithm 2.1.

Algorithm 2.1 Exploratory move in Hooke–Jeeves algorithm.

set h
set x^c
for i = 1 to d do
    if f(x^c + h v_i) < f(x^c) then x^n = x^c + h v_i
    elseif f(x^c − h v_i) < f(x^c) then x^n = x^c − h v_i
end for
return x^n

Thus we loop through our decision variables and change them slightly. If the new solution is better than the original one, we accept the change. Note that the ordering of the elements in $x$ influences the resulting search direction. For high-dimensional problems, this search may become time-consuming.

An exploratory move, if successful, returns an improved solution $x^n$. A search direction is then computed as $x^n - x^c$, and added to $x^n$; this is called the pattern move. Thus, the algorithm tries to exploit the search direction that led to an improved solution $x^n$. If no improvement for $x^c$ is found, the step size $h$ is reduced. Algorithm 2.2 details the whole procedure. The exploratory move is included as the function $N_h$.

Algorithm 2.2 Hooke–Jeeves search.

set h
set x^c
while stopping criteria not met do
    x^n = N_h(x^c)
    if f(x^n) < f(x^c) then
        x^n_* = x^n + (x^n − x^c)
        if f(x^n_*) < f(x^n) then x^c = x^n_* else x^c = x^n
    else
        reduce h
    end if
end while
x^sol = x^c


The stopping criterion can be defined as a minimal step size h (so if h is smaller than some threshold, the search is terminated), too small a change in f, or a maximum number of function evaluations. The remarkable property of this algorithm is the choice of a search direction: instead of a locally-optimal direction (‘steepest’ descent), any search direction that lowers the objective function is accepted.
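Algorithms 2.1 and 2.2 translate almost line by line into code. The sketch below takes V to be the identity matrix and halves h whenever the exploratory move fails. Several details are assumptions that the pseudocode leaves open: the exploratory move returns x^c unchanged if no trial point improves on it, the reduction factor is 0.5, and the search stops when h falls below a threshold or an iteration limit is reached.

```python
def explore(f, xc, h):
    """Exploratory move (Algorithm 2.1) with V equal to the identity matrix."""
    fc = f(xc)
    xn = list(xc)              # assumption: fall back to xc if nothing improves
    for i in range(len(xc)):
        for step in (h, -h):   # try +h first, -h only if +h did not improve
            trial = list(xc)
            trial[i] += step
            if f(trial) < fc:
                xn = trial
                break
    return xn

def hooke_jeeves(f, x0, h=1.0, h_min=1e-8, shrink=0.5, max_iter=10_000):
    """Hooke-Jeeves search (Algorithm 2.2)."""
    xc = list(x0)
    for _ in range(max_iter):
        if h < h_min:                                    # stopping criterion
            break
        xn = explore(f, xc, h)
        if f(xn) < f(xc):
            xp = [2.0 * a - b for a, b in zip(xn, xc)]   # pattern move: xn + (xn - xc)
            xc = xp if f(xp) < f(xn) else xn
        else:
            h *= shrink                                  # reduce h
    return xc

# a non-smooth test function (hypothetical) on which gradients are of little use
def kinked(x):
    return abs(x[0] - 1.0) + abs(x[1] + 0.5)

x_sol = hooke_jeeves(kinked, [0.0, 0.0])
```

The function is evaluated more often than necessary here (f(x^c) is recomputed in every pass); caching the values would be the first refinement.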

The Nelder–Mead simplex algorithm. One of the shortcomings of the Hooke–Jeeves pattern search is the large number of function evaluations required for the exploratory move. Spendley et al. (1962) suggested a new way to code a solution: as a simplex. (The Nelder–Mead simplex algorithm has nothing to do with George Dantzig’s simplex algorithm for solving linear programmes.) A simplex of dimension d consists of d+1 vertices (points), hence for d = 1 we have a line segment; d = 2 is a triangle, d = 3 is a tetrahedron, and so on. In the algorithm of Spendley et al. (1962), this simplex could be reflected across an edge, or it could shrink. Thus, the size of the simplex could change, but never its form. Nelder and Mead (1965) added two more operations: now the simplex could also expand and contract; hence the simplex could change its size and its form.

Algorithm 2.3 outlines the Nelder–Mead algorithm. The notation follows Wright (1996); when solutions are ordered as

\[
x_1, x_2, \ldots, x_{d+1}
\]

this means we have

\[
f(x_1) < f(x_2) < \ldots < f(x_{d+1}).
\]

We denote the objective values associated with particular solutions as $f_1, f_2, \ldots, f_{d+1}$.

Typical values for the parameters are

\[
\rho = 1, \quad \chi = 2, \quad \gamma = 1/2, \quad \text{and} \quad \sigma = 1/2.
\]

(These are also used in Matlab’s fminsearch.) Possible stopping criteria are computed from the f-values; the search may be stopped when $|f_1 - f_{d+1}|$ is below a tolerance level, or when the simplex becomes ‘too small’. The number of function evaluations can (and should) be restricted, too.

The advantage here, compared with Hooke–Jeeves, is the reduced number of function evaluations: a reflection requires only one evaluation of the objective function; the other vertices of the simplex keep their old values.
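To make the four operations concrete, here is a compact sketch of a Nelder–Mead iteration in a common textbook form, using the parameter values given above and the stopping rule based on the spread of the f-values. It is not a transcription of Algorithm 2.3; the construction of the initial simplex, the acceptance rules for contractions and the function names are assumptions.

```python
def nelder_mead(f, x0, step=0.5, rho=1.0, chi=2.0, gamma=0.5, sigma=0.5,
                tol=1e-12, max_evals=5000):
    d = len(x0)
    # initial simplex (an assumption): x0 plus d vertices shifted along each axis
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(d)]
                            for i in range(d)]
    fvals = [f(x) for x in simplex]
    evals = d + 1

    def along(a, b, t):
        """Return the point a + t * (b - a)."""
        return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

    while evals < max_evals:
        order = sorted(range(d + 1), key=lambda i: fvals[i])
        simplex = [simplex[i] for i in order]
        fvals = [fvals[i] for i in order]
        if abs(fvals[0] - fvals[-1]) < tol:          # |f_1 - f_{d+1}| small
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / d for j in range(d)]
        worst, f_worst = simplex[-1], fvals[-1]
        xr = along(centroid, worst, -rho)            # reflection
        fr = f(xr)
        evals += 1
        if fr < fvals[0]:
            xe = along(centroid, xr, chi)            # expansion
            fe = f(xe)
            evals += 1
            simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < fvals[-2]:
            simplex[-1], fvals[-1] = xr, fr          # accept the reflected point
        else:
            outside = fr < f_worst
            xk = along(centroid, xr if outside else worst, gamma)  # contraction
            fk = f(xk)
            evals += 1
            if (outside and fk <= fr) or (not outside and fk < f_worst):
                simplex[-1], fvals[-1] = xk, fk
            else:                                    # shrink towards the best vertex
                simplex = [simplex[0]] + [along(simplex[0], x, sigma)
                                          for x in simplex[1:]]
                fvals = [fvals[0]] + [f(x) for x in simplex[1:]]
                evals += d
    best = min(range(d + 1), key=lambda i: fvals[i])
    return simplex[best], fvals[best]

def bowl(x):  # hypothetical smooth test function with minimum at (1, -0.5)
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

x_best, f_best = nelder_mead(bowl, [0.0, 0.0])
```

Apart from the initial simplex and the shrink step, each pass costs one or two function evaluations, which is the saving over the exploratory move noted above.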

2.2 heuristics

Both direct search algorithms already share one feature with modern heuristics: they do not compute the search direction in a theoretically optimal way; rather, a