A Story of Willful Optimism and Eventual Success

Michael Caplan¹ and Ying Becker²

¹Principal, Head of Quantitative US Active Equity, State Street Global Advisors, One Lincoln Place, Boston, MA 02111; ²Principal, Advanced Research Center, State Street Global Advisors, One Lincoln Place, Boston, MA 02111

Abstract: This is a narrative describing the implementation of a genetic programming technique for stock picking in a quantitatively driven, risk-controlled, US equity portfolio. It describes, in general, the problems that the authors faced in their portfolio context when using genetic programming techniques and in gaining acceptance of the technique by a skeptical audience. We discuss in some detail the construction of the fitness function, the genetic programming system’s parameterization (including data selection and internal function choice), and the interpretation and modification of the generated programs for eventual implementation.

Key words: genetic programming, stock selection, data mining, fitness functions, quantitative portfolio management.

1. INTRODUCTION

This is the story of how the US Quantitative Equity Area and the Advanced Research Center of State Street Global Advisors (a unit of State Street Corporation) began using genetic programming techniques to discover new ways of stock investing in specific industries. The story begins with a poorly understood data mining technique, a discussion of previously developed underlying stock-picking factors that we thought might make sense, and a lot of disagreement on how to (if not whether to) implement a final stock-picking model. We describe our tribulations, technical and political, in defining the problem, codifying the solution, and finally convincing someone to invest using this model. Importantly, we describe how the genetic programming process improved our knowledge of how the stock market worked in a small, but portfolio performance-significant, industry.

This paper has the following broad sections:

The stock picking problems we faced,

The financial elements that we had in place,

Why a direct solution really wasn’t possible and how we needed to construct (and adjust and adjust and adjust) our fitness function to proxy portfolio performance,

How we avoided/sidestepped data mining/snooping concerns,

How we interpreted and modified our raw genetic programs, and

The political battle to use the new model.

We promise that there are no special financial insights contained within this paper. The details of the final model are absolutely proprietary and are purposefully left vague, but we think the story may be interesting to those trying to find new applications for genetic programming techniques.

1.1 The Stock Picking Problems We Faced (a.k.a. Our Growth Market Problem)

As quantitative portfolio managers at one of the largest institutional money managers in the world, our task is to build risk-controlled, market-beating stock portfolios using a composite model made of individual stock-picking factors. These stock-picking factors fall into the following general classes: valuation (price-based), market sentiment, and business quality.

An inherent part of our portfolio management task is to build portfolios that work in a variety of market conditions and minimize the investors’ pain in periods where our stock picking isn’t strong. To this end, we do quite a bit of ex post analysis of our portfolio performance results and attempt to decompose our returns into elements of market risk and residual stock-picking performance as well as other more esoteric elements (volatility, market cap size, labor intensity, etc.). The net result of this analysis is a series of statistics that are suggestive of areas in which we do well and poorly. Often these statistics are quite time-period specific and require additional insight (or intuition) that is generally well beyond the degrees of freedom permitted by the data.
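To make the idea of an ex post decomposition concrete, the following is a minimal sketch of the simplest possible version: a single-factor regression that splits a portfolio's periodic returns into a market-driven part (beta times the market return) and a residual attributed to stock picking. This is an illustration only, not the decomposition actually used by the authors, which also covers elements such as volatility, market cap size, and labor intensity. The function name `decompose_returns` and the sample figures are assumptions made for the example.

```python
from statistics import mean


def decompose_returns(portfolio, market):
    """Split portfolio returns into a market component and a residual.

    Fits portfolio_t = alpha + beta * market_t + e_t by ordinary least
    squares and returns (alpha, beta, residuals). Hypothetical helper for
    illustration; a production attribution would use many more factors.
    """
    if len(portfolio) != len(market) or len(portfolio) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    p_bar, m_bar = mean(portfolio), mean(market)
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((p - p_bar) * (m - m_bar) for p, m in zip(portfolio, market))
    var = sum((m - m_bar) ** 2 for m in market)
    if var == 0:
        raise ValueError("market returns have zero variance")

    beta = cov / var                 # sensitivity to the market
    alpha = p_bar - beta * m_bar     # average return not explained by the market
    residuals = [p - (alpha + beta * m) for p, m in zip(portfolio, market)]
    return alpha, beta, residuals


if __name__ == "__main__":
    # Made-up monthly returns, purely illustrative.
    market = [0.010, -0.020, 0.030, 0.000, 0.015]
    portfolio = [0.014, -0.021, 0.036, 0.004, 0.012]
    alpha, beta, residuals = decompose_returns(portfolio, market)
    print(f"alpha={alpha:.4f} beta={beta:.2f}")
    print("period residuals:", [round(r, 4) for r in residuals])
```

With only a handful of periods, the estimates from a fit like this are noisy, which is the degrees-of-freedom concern raised above: the statistics suggest where performance is strong or weak, but they cannot settle the question on their own.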

One area that needed improvement was our performance in the high technology manufacturing industry. We tend to have very good average stock-picking performance in this industry over time but had dismal performance in periods where the stock market was in a speculative growth market mode. Given that the speculative growth markets had been of relatively short duration during much of the 1980s and early 1990s, our composite model’s weakness in growth markets was masked by a preponderance of value markets in the prior 20 years. With a newly reinvigorated investor class gathering assets and market power (i.e., hedge funds and ultra-short-term day-traders) as well as shorter-term client attitudes towards performance shortfalls, we needed to get our High Tech Manufacturing model into shape for both growth and value markets. Although this is a relatively small industry of roughly 30 stocks, making up less than 4% of the market indices we typically benchmark our portfolios against, its performance impact (both positive and negative) was outsized, and the model needed to be fixed.

Our traditional approach to solving this problem would be to go out and find a bunch of new (or old but unimplemented) factors that look like they might work in this area. This had already been attempted a few times and though we felt that we had sufficient elements to work with, we suspected that we hadn’t combined them optimally. Given the number of possible factors, the various degrees to which they are correlated, and the sheer number of possible interactions that we would need to investigate, we turned to the genetic programming technique that had been used to create fairly straightforward portfolio trading rules in State Street Global Advisors’ Currency Department.

2. PROJECT DESCRIPTION OVERVIEW

The flowchart shown in Figure 6-1 describes the development process for our project. Later sections of this paper describe in considerably more detail some of the decisions and compromises made in this project. We start with the upper left hand corner of the flowchart and begin with our set of presumptively useful factors (properly transformed for the project, we call these the alpha factors) pushed into the genetic programming system itself.

Using the output of the genetic programming system, we then translate the models into mathematical formulae and calculate various translations to decipher seemingly impenetrable equations. We then hit a decision node, where we decide whether we have acceptable results for further testing or whether we need to make adjustments to our genetic programming process, in which case we loop again. Presuming we have acceptable equations to test, we would then compare these equations against our current factor combination.
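The loop described above can be sketched as plain control flow. The code below is only a schematic of the process, assuming the genetic programming run, the acceptability check, and the adjustment step are supplied as callables; the names `develop_model`, `run_gp`, `is_acceptable`, `adjust`, and `max_rounds` are hypothetical and do not come from the authors' system.

```python
def develop_model(run_gp, is_acceptable, adjust, settings, max_rounds=10):
    """Repeat GP runs until at least one candidate passes the acceptability check.

    run_gp(settings)     -> list of candidate models (hypothetical callable)
    is_acceptable(model) -> bool, the decision node in the flowchart
    adjust(settings)     -> new settings for the next GP run
    Returns (accepted_candidates, rounds_used); the accepted candidates are
    what would then be compared against the current factor combination.
    """
    for round_number in range(1, max_rounds + 1):
        candidates = run_gp(settings)
        accepted = [c for c in candidates if is_acceptable(c)]
        if accepted:
            return accepted, round_number
        # Nothing acceptable: adjust the GP process and loop again.
        settings = adjust(settings)
    return [], max_rounds


if __name__ == "__main__":
    # Toy stand-ins: a "model" is just an integer that grows each round.
    accepted, rounds = develop_model(
        run_gp=lambda s: [s["depth"]],
        is_acceptable=lambda model: model >= 3,
        adjust=lambda s: {"depth": s["depth"] + 1},
        settings={"depth": 1},
    )
    print(accepted, rounds)
```

In practice the acceptability check is a human judgment rather than a function, so a sketch like this mainly makes explicit that the adjustment step feeds back into the next genetic programming run.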

Figure 6-1. Genetic Program Project Flowchart

2.1 Acceptability Criterion – Does the resulting model agree with our intuition of how the markets work?
