A Forecasting Algorithm based on Information Theory: Working Paper
PINTADO, Paul-Xavier, FUENTES SANCHEZ, Edelmiro
PINTADO, Paul-Xavier, FUENTES SANCHEZ, Edelmiro. A Forecasting Algorithm based on Information Theory: Working Paper. In: Objects at large = Objets en liberté . Genève : Centre universitaire d'informatique, 1997. p. 209-216
Available at:
http://archive-ouverte.unige.ch/unige:155400
A Forecasting Algorithm based on Information Theory
Working paper
Xavier Pintado, Edelmiro Fuentes
Abstract
Time series forecasting plays an important role in financial activities since it allows investors to make better investment choices and reduce investment risk. Financial institutions' forecasts rely essentially on algorithms based on statistics and probability theory.
In this paper we explore an approach to forecasting based on information theory. Our approach is based on the intuition that a strong relationship exists between predictability and compressibility: if a string of data can be conveniently compressed it is because some kind of regularity or law has been detected that can be exploited to build a more compact encoding of the information contained in the string. Conversely, a data string generated at random cannot be further compressed because no regularity can be found in it. Information theory and Kolmogorov theory have been used as the foundation for the analysis and the development of compression algorithms. Interestingly, time series predictability also depends on the ability to find recurring patterns in past data. The close relationship between compressibility and predictability has been recently addressed by Feder and Gutman, who describe an algorithm for the prediction of binary sequences that they demonstrated to be asymptotically optimal independently of the statistical distribution of the data. This is an interesting result that can be conveniently applied to time series forecasting. This working paper describes the forecasting algorithm and discusses issues related to its adaptation for time series forecasting. We also provide some preliminary results.
1 Introduction
The time series forecasting problem can be stated as follows: given an arbitrary sequence of values $X = \{x_0, x_1, x_2, x_3, \ldots, x_k\}$ arranged in non-decreasing time order, find a sequence $Y = \{x_{k+1}, x_{k+2}, x_{k+3}, x_{k+4}, \ldots, x_{k+n}\}$ which represents future values for the time series. Usually $X$ represents the sequence of known values (e.g., values observed in the past), and $Y$ represents the sequence of forecasted values. In this forecasting approach it is implicitly assumed that a relationship exists between past and future values. Otherwise stated, $Y$ is not independent of $X$. So, time series forecasting consists essentially in the identification of the relationship between $Y$ and $X$.
The need for time series forecasting arises in many domains such as physics, astronomy, chemistry and finance. Time series forecasting is grounded on a solid and extensive theoretical background for which the initial driving force was gambling during the eighteenth century.
A wealth of prediction algorithms have been developed based on theoretical results. In practice, the most popular algorithms are based on statistics and probability theory. It should be noted that the choice of a specific algorithm depends on the time series to be predicted. For example, algorithms which perform some kind of spectral analysis (e.g., Fourier analysis) are often used to predict time series that display a cyclical behaviour. Conversely, trajectory-prediction algorithms such as those used for target tracking in warfare rely on the dynamic minimization of tracking error.
In the financial domain, achieving accurate forecasting usually means money, but achieving consistent accuracy over time is difficult for two main reasons. First, the behavior of financial markets evolves because the actors in these markets are human beings who continuously adapt their investment strategies. Consequently, it is difficult to identify market patterns which remain invariant over time. Second, markets tend to be efficient. An efficient market is one where prices are fair. That is, profit cannot be generated without risk (no arbitrage). Naturally, many market inefficiencies have been discovered in the past and many will be identified in the future, but such arbitrage gaps usually tend to close fairly fast.
Despite these difficulties financial institutions continue to invest huge amounts of money in the design and refinement of forecasting algorithms. Such investment is worthwhile, perhaps less in terms of prediction accuracy than because it allows for a better understanding of the mechanisms of market behaviour. The discipline of risk management, which has recently been adopted on a large scale by financial institutions, inherited many tools from previous work on time series prediction. This seems quite natural since risk management is strongly related to time series prediction. Financial risk relates to the uncertainty entailed by future investments. For instance, if perfect prediction accuracy could be achieved, investment risk would be null. Conversely, if financial markets displayed a perfectly random behaviour no predictions would be possible and risk management would not make much practical sense.
1.1 The relationship between predictability and data compression
Our interest in prediction started a few years ago when we began working on mathematical aspects of risk management at the Financial lab of the Object Systems Group of the University of Geneva. Our first task was the implementation of prediction algorithms based on Fourier analysis and the Box-Jenkins approach. Then we adopted a more dynamic approach with the implementation of algorithms inspired by trajectory-tracking algorithms used in military applications for target tracking [11]. By applying these algorithms to real market data provided by Reuters we gained insight into the practical advantages and shortcomings of these algorithms.
The observation of prediction algorithms at work showed us how an algorithm identifies a pattern and builds predictions based on the identification. In fact, algorithms attempt the identification of a law in the time series values to build predictions. From this experience we built a strong intuition that data compression and time series predictability should be strongly related.
This intuition was based on the fact that compression algorithms such as Huffman encoding or Ziv-Lempel algorithms also derive, statically or incrementally, a law on the data to be compressed.
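The intuition can be checked informally with any general-purpose compressor. The sketch below is an illustration only, not part of the authors' work: it uses Python's standard `zlib` module (DEFLATE, which combines a Ziv-Lempel style dictionary stage with Huffman coding), and the helper name `compression_ratio` and both test inputs are assumptions chosen for the example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more regular)."""
    return len(zlib.compress(data, 9)) / len(data)


# A perfectly periodic string: every symbol is predictable from its past.
regular = b"01" * 2048

# Pseudo-random bytes from a fixed seed: no regularity for the compressor to exploit.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(4096))

print(f"regular: {compression_ratio(regular):.3f}")
print(f"noise:   {compression_ratio(noise):.3f}")
```

The periodic input shrinks to a small fraction of its size, while the pseudo-random input does not shrink at all, which mirrors the predictable versus unpredictable distinction drawn above.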
Since the development and analysis of compression algorithms is grounded essentially on information theory we thought that a similar approach could be adopted for time series forecasting. We faced a difficulty though: while compression algorithms convert an input encoding into a more compact output encoding, forecasting algorithms take as input a sequence of values (that can also be seen as an encoding) and output values with an encoding scheme similar to that of the input.
In 1994 a paper [1] authored by Feder and Gutman was presented with the Best Paper Award from the Information Theory Society of the IEEE. The authors present an interesting algorithm for the prediction of binary sequences. The main results are: (1) a proof that the algorithm is asymptotically optimal, and (2) the asymptotic optimality does not depend on the underlying statistical distribution of the sequence. The paper offers an interesting solution for the problem we were addressing and an elegant theoretical foundation for future developments.
The algorithm from Gutman and Feder attempts the dynamic construction of a finite automaton which is used to assign probabilities to the symbols of the input alphabet. Based on these probabilities the algorithm emits a prediction for the next symbol. The authors prove that the number of states of the automaton is finite for sequences of infinite length, which is an appealing result in terms of algorithm implementation. Their prediction algorithm shares many similarities with the Ziv-Lempel algorithm for data compression [5][8], which is one of the most widely used lossless compression algorithms. Gutman and Feder's approach shows that prediction and compression are to a large extent a similar problem that can be addressed with the same tools. Many authors address similar or related issues [2][4][6][7][9][10]. The next section discusses the prediction algorithm.
2 Prediction based on Incremental Parsing
2.1 Incremental Parsing Algorithm
The Gutman-Feder incremental parsing algorithm parses a source sequence into a set of distinct phrases of gradually increasing length. This set of phrases constitutes the dictionary. During the input scan the algorithm builds a phrase which is matched against the phrases in the dictionary. This phrase is added to the dictionary if and only if it is the shortest string which is not a previously parsed phrase. As an example, assume that we have an alphabet formed by the symbols {0, 1}. Consider the sequence 110101000. It will be parsed into the dictionary {1, 10, 101, 0, 00}. Figure 1 illustrates the parsing tree dictionary generated by this sequence. As we can see, each phrase of the dictionary is represented by the path we follow to go from the root to each of the internal nodes of the tree. The leaves do not represent any phrase but they show all the possible new phrases that can be added to the dictionary. When one of these new phrases is added, the corresponding leaf is transformed automatically into an internal node by adding two leaves (one for each symbol of the alphabet).
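The parsing rule can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the input is a string of symbols; the function name `incremental_parse` and the choice to return the unfinished tail are ours, not the paper's.

```python
def incremental_parse(sequence):
    """Split `sequence` into the shortest phrases that were not parsed before.

    Returns the dictionary (phrases in order of insertion) and the unfinished
    tail, i.e. trailing symbols that still match an existing phrase.
    """
    dictionary = []
    seen = set()
    phrase = ""
    for symbol in sequence:
        phrase += symbol
        if phrase not in seen:
            # Shortest string that is not yet a phrase: add it and start over.
            seen.add(phrase)
            dictionary.append(phrase)
            phrase = ""
    return dictionary, phrase


phrases, tail = incremental_parse("110101000")
print(phrases)  # ['1', '10', '101', '0', '00']
```

Running it on the example sequence reproduces the dictionary {1, 10, 101, 0, 00} given in the text, with an empty tail.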
2.2 Probability assignment in the Gutman-Feder scheme
Figure 1 Dictionary Tree (legend: root, internal node, leaf; labelled phrases "00", "10", "101")

During the construction of the parsing tree, we are able to dynamically generate information that will let us define the conditional probability of the next incoming symbol for each internal node. The mechanism is fairly simple. Each node of the tree (leaves, internal nodes and root) has an associated counter. For the leaves, this counter is set to one; the counter of an internal node (root included) is set to the number of its descendant leaves. To get a uniform probability mass function, the counter associated with each node is divided by the counter of the root. Finally, the conditional probability of an incoming symbol given its past is defined by the ratio between the probability of the node $N_{t+1}$ reached by that symbol and the probability of the current node $N_t$. These calculations are illustrated in Fig. 2.
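The counter rule can be made concrete with a small sketch. It assumes a binary alphabet and a parser that returns to the root after each new phrase; the names `Node`, `build_tree` and `conditional_probability` are hypothetical, and counters are recomputed on demand for clarity rather than efficiency. Because both node probabilities share the root counter as denominator, their ratio reduces to counter(child) / counter(node).

```python
from fractions import Fraction

ALPHABET = "01"  # binary alphabet, as in the paper's example


class Node:
    """Parse-tree node; a node without children is a leaf."""

    def __init__(self):
        self.children = {}

    def grow(self):
        """Turn a leaf into an internal node with one new leaf per symbol."""
        self.children = {symbol: Node() for symbol in ALPHABET}

    def counter(self):
        """1 for a leaf, otherwise the number of descendant leaves."""
        if not self.children:
            return 1
        return sum(child.counter() for child in self.children.values())


def build_tree(sequence):
    """Scan `sequence` and return the root of the resulting parse tree."""
    root = Node()
    root.grow()  # the empty dictionary offers one leaf per symbol
    node = root
    for symbol in sequence:
        node = node.children[symbol]
        if not node.children:  # reached a leaf: a new phrase is complete
            node.grow()
            node = root
    return root


def conditional_probability(node, symbol):
    """P(symbol | node) = counter(child) / counter(node)."""
    return Fraction(node.children[symbol].counter(), node.counter())


root = build_tree("110101000")
print(root.counter())                                    # 7
print(conditional_probability(root, "1"))                # 4/7
print(conditional_probability(root.children["1"], "0"))  # 3/4
```

For the example sequence the root counter is 7 after the five phrases have been parsed, and the symbol 1 is the more probable continuation at the root.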
2.3 Discussion
The parsing tree is dynamically constructed during the scan of the input sequence. The way the shape of the tree evolves offers interesting information about predictability. A well balanced tree (i.e., almost complete) denotes input sequences which are random or close to random. In such a case, all outcomes have similar probability and predictability is low. Therefore, the entropy of the input sequence is close to its maximum. Compressibility of such a sequence is low for the same reasons. An unbalanced tree denotes recurring patterns in the input sequence. In unbalanced trees some paths have higher probability than others. For example, a sequence built from only one symbol, say {1, 1, 1, 1, 1, 1, 1, ...}, generates a path with no branches. As expected, the outcome is highly predictable.
An interesting feature of Gutman-Feder's algorithm is that accuracy increases with the number of symbols parsed. The algorithm builds a parse tree which adapts to the structure of the input sequence. In some sense the algorithm learns the input sequence. It should be noted, however, that the algorithm does not rely on feedback as typical learning algorithms do (e.g., neural networks). The parse tree grows as input is scanned. The topology of the tree never changes towards the root. Only new nodes can be added.
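As a worked illustration of the single-symbol case, assume the leaf-counter rule of Section 2.2 and an input made only of 1s, parsed into the $n$ phrases $1, 11, \ldots, 1^n$. The notation $c(\cdot)$ for a node's counter and $1^d$ for the node reached by $d$ consecutive 1s is introduced here for the example only.

```latex
% Tree after parsing the n phrases 1, 11, ..., 1^n of an all-ones input:
% a single chain of n internal nodes below the root. Each node on the chain
% has one leaf for the symbol 0, and the last node has two leaves,
% which gives n + 2 leaves in total.
\begin{align*}
c(\mathrm{root}) &= n + 2, \qquad c(1^d) = n + 2 - d \quad (1 \le d \le n), \\
P(1 \mid \mathrm{root}) &= \frac{c(1)}{c(\mathrm{root})} = \frac{n + 1}{n + 2}
  \;\longrightarrow\; 1 \quad \text{as } n \to \infty .
\end{align*}
```

The probability assigned to the recurring symbol at the root therefore approaches one as more input is parsed, which is the sense in which the unbranched path corresponds to a highly predictable outcome.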
3 Test framework
3.1 Application Architecture
We implemented a framework to test the algorithm with financial data from stock exchanges.
Our goal is to assess the adequacy of the algorithm as a decision-making tool for investors. In particular we anticipate increased interest from financial institutions in this kind of algorithm because it allows for the definition of dynamic measures of investment risk.

Figure 2 Probability assignment in the Gutman-Feder scheme (panels show the parse tree and its counters after the phrases "1", "10", "101", "0" and "00")

The framework is composed of two main modules (Fig. 3):
• Quantization: this module is responsible for the transformation of stock market time series from their original form (a real number) into sequences of symbols. Quantization determines the input alphabet for the prediction engine and constructs symbol sequences.
• Prediction Engine: this module implements the Gutman-Feder parsing algorithm. It accepts as input symbol sequences and generates as output a sequence of predicted symbols.
Our framework allows for flexible quantization. We implemented many different quantization methods to extract different kinds of patterns from a time series. It should be noted that the choice of a quantization method is critical for our prediction approach. In fact, the quantization method determines the information we want to extract from the time series. By choosing appropriate quantization methods we can focus, for example, on the prediction of short term and long term time series trends. A different quantization method will allow us to predict a time series based on its correlation with another time series.
Figure 3 Framework's architecture (stock info -> quantization -> current symbol -> Prediction Engine -> predicted symbol)
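One possible quantization method is the binary up/down rule used for the preliminary results in Section 4, where '0' stands for a price increase relative to the previous day and '1' for a decrease. The sketch below is an assumption-laden illustration: the function name `quantize_up_down` is hypothetical, and skipping days with an unchanged price is our choice, since the paper does not say how such days are handled.

```python
def quantize_up_down(prices):
    """Map a price series to a symbol string: '0' = increase, '1' = decrease."""
    symbols = []
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            symbols.append("0")
        elif current < previous:
            symbols.append("1")
        # Assumption: an unchanged price emits no symbol at all.
    return "".join(symbols)


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.5, 10.2, 10.2, 11.0]
print(quantize_up_down(closes))  # 010
```

Other quantizers (for instance, ones with more symbols or ones based on a second series) would plug into the same place, since the prediction engine only sees the resulting symbol sequence.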
4 Preliminary results
The following table shows the forecasting algorithm applied to daily data from the Frankfurt stock exchange. Test data spans five years, from the beginning of 1992 to the end of 1996. The symbol '0' is assigned to a price increase relative to the previous day. Conversely, the symbol '1' is assigned to a price decrease. The table lists the stocks from the German DAX index. The second column shows how many predictions the algorithm has been able to output from a total of 806 samples. At each step the predictions are compared to the outcome. The third column shows the number of predictions that match the outcome. The last column shows the percentage of true predictions. It is important to notice that a prediction is not emitted when all the symbols in a particular node have the same conditional probability. This situation arises mainly when a new phrase is added to the dictionary.
Stock from DAX index    Predicted    TRUE    %TRUE
ALLIANZ                 714          360     50.42
BASF                    713          347     48.67
BAYER                   716          383     53.49
BHW                     717          368     51.32
BMW                     717          362     50.49
COMMERZBANK             714          348     48.74
CONTINENTAL             713          353     49.51
DAIMLER BENZ            714          334     46.78
DEGUSSA                 710          368     51.83
DEUTSCHE BABCOCK        714          367     51.40
BAY VEREINSBANK         712          353     49.58
DEUTSCHE BANK           712          389     54.63
DRESDNER BANK           712          379     53.23
HENKEL PR               715          354     49.51
HOECHST                 714          361     50.56
KARSTADT                713          340     47.69
LUFTHANSA               717          361     50.35
LINDE                   712          366     51.40
MAN                     716          343     47.91
MANNESMANN              716          384     53.63
METALLGESELLSCHAFT      716          377     52.65
PREUSSAG                712          379     53.23
RWE                     715          348     48.67
SCHERING                717          364     50.77
SIEMENS                 709          352     49.95
THYSSEN                 712          372     52.25
VEBA                    717          385     53.70
VIAG                    718          399     55.57
VOLKSWAGEN              715          394     55.10
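The way the Predicted and TRUE columns are obtained can be sketched as follows. This is not the authors' C++ implementation: it assumes a binary alphabet, a parser that returns to the root after each new phrase, and abstention whenever both symbols have equal counters at the current node. The names `Node`, `grow` and `evaluate` are hypothetical.

```python
ALPHABET = "01"


class Node:
    def __init__(self):
        self.leaves = 1     # counter: number of descendant leaves (1 for a leaf)
        self.children = {}  # empty while the node is a leaf


def grow(node):
    """Turn a leaf into an internal node with one leaf per symbol."""
    node.children = {symbol: Node() for symbol in ALPHABET}
    node.leaves = len(ALPHABET)


def evaluate(symbols):
    """Return (number of predictions emitted, number of correct predictions)."""
    root = Node()
    grow(root)
    node, path = root, [root]  # path = nodes from the root to the current node
    predicted = correct = 0
    for actual in symbols:
        # Predict the symbol whose child has the larger counter; abstain on a tie.
        zero, one = node.children["0"].leaves, node.children["1"].leaves
        if zero != one:
            guess = "0" if zero > one else "1"
            predicted += 1
            correct += guess == actual
        # Update the tree with the observed symbol.
        child = node.children[actual]
        if child.children:  # internal node: keep descending
            node = child
            path.append(child)
        else:  # leaf: a new phrase ends here
            grow(child)
            for ancestor in path:  # each ancestor gains one descendant leaf
                ancestor.leaves += len(ALPHABET) - 1
            node, path = root, [root]
    return predicted, correct


predicted, correct = evaluate("110101000")
print(predicted, correct)  # 5 3
```

On the nine-symbol example from Section 2.1 the sketch emits five predictions, three of which match the outcome; the four abstentions occur at nodes whose two children are both leaves.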
As expected, predictions based on a single time series do not allow us to expect huge returns on investment although the algorithm compares quite favorably with traditional prediction algorithms. These results are computed over a long period of time. Sliding-window analysis over shorter time intervals shows local increases of prediction accuracy which can be exploited in a consistent way. We are currently implementing dynamic measures of correlation among time series. These measures are used for the dynamic assessment of investment risk which will be the main focus of future work.
5 Conclusion and future work
This paper discusses an approach for time series forecasts based on information theory. We describe the Gutman-Feder algorithm adapted for the prediction of financial time series. We also present some preliminary results. The test framework has been implemented in C++. It offers a flexible architecture for the transformation of numerical time series into symbol sequences. Future work will focus essentially on dynamic measures of investment risk which play an important role in portfolio management.
Bibliography
[1] M. Feder and M. Gutman, "Universal Prediction of Individual Sequences," IEEE Transactions on Information Theory, vol. 38, no. 4, pp. 1258-1270, July 1992.
[2] J. Rissanen, "A Universal Data Compression System," IEEE Transactions on Information Theory, vol. IT-29, no. 5, pp. 656-664, September 1983.
[3] G. G. Langdon Jr., "A Note on the Ziv-Lempel Model for Compressing Individual Sequences," IEEE Transactions on Information Theory, vol. IT-29, no. 2, pp. 284-287, March 1983.
[4] M. Feder, "Gambling using a Finite State Machine," IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1459-1465, September 1991.
[5] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, vol. IT-23, no. 3, pp. 337-343, May 1977.
[6] J. Rissanen, "Universal Coding, Information, Prediction, and Estimation," IEEE Transactions on Information Theory, vol. IT-30, no. 4, pp. 629-636, July 1984.
[7] J. Ziv, "Coding Theorems for Individual Sequences," IEEE Transactions on Information Theory, vol. IT-24, no. 4, pp. 405-412, July 1978.
[8] J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Transactions on Information Theory, vol. IT-24, no. 5, pp. 530-536, September 1978.
[9] A. Lempel and J. Ziv, "On the Complexity of Finite Sequences," IEEE Transactions on Information Theory, vol. IT-22, no. 1, pp. 75-81, January 1976.
[10] J. S. Vitter and P. Krishnan, "Optimal Prefetching via Data Compression," Journal of the ACM, vol. 43, no. 5, pp. 771-793, September 1996.
[11] K. C. C. Chan, V. Lee, and H. Leung, "Radar Tracking for Air Surveillance in a Stressful Environment Using a Fuzzy-Gain Filter," IEEE Transactions on Fuzzy Systems, vol. 5, no. 1, pp. 80-119, February 1997.