
Proceedings of the First International Workshop on

ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA

AALTD 2015

Workshop co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in

Databases (ECML PKDD 2015)

September 11, 2015. Porto, Portugal

Edited by

Ahlame Douzal-Chouakria    José A. Vilar

Pierre-François Marteau    Ann E. Maharaj

Andrés M. Alonso    Edoardo Otranto

Irina Nicolae

CEUR Workshop Proceedings Volume 1425, 2015


Proceedings of AALTD 2015

First International Workshop on

“Advanced Analytics and Learning on Temporal Data”

Porto, Portugal

September 11, 2015


Volume Editors

Ahlame Douzal-Chouakria

LIG-AMA, Université Joseph Fourier

Bâtiment Centre Équation 4, Allée de la Palestine à Gières, UFR IM2AG, BP 53, F-38041 Grenoble Cedex 9, France. E-mail: Ahlame.Douzal@imag.fr

José A. Vilar Fernández

MODES, Departamento de Matemáticas, Universidade da Coruña, Facultade de Informática, Campus de Elviña, s/n, 15071 A Coruña, Spain. E-mail: jose.vilarf@udc.es

Pierre-François Marteau

IRISA, ENSIBS, Université de Bretagne Sud, Campus de Tohannic, BP 573, 56017 Vannes Cedex, France. E-mail: pierre-francois.marteau@univ-ubs.fr

Ann E. Maharaj

Department of Econometrics and Business Statistics, Monash University, Caulfield Campus, Building H, Level 5, Room 86, 900 Dandenong Road, Caulfield East, Victoria 3145, Australia. E-mail: ann.maharaj@monash.edu

Andrés M. Alonso Fernández

Departamento de Estadística, Universidad Carlos III de Madrid, C/ Madrid 126, 28903 Getafe (Madrid), Spain. E-mail: andres.alonso@uc3m.es

Edoardo Otranto

Department of Cognitive Sciences, Educational and Cultural Studies, University of Messina, Via Concezione n. 6, 98121 Messina, Italy. E-mail: eotranto@unime.it

Maria-Irina Nicolae

Jean Monnet University, Hubert Curien Lab

E105, 18 rue du Professeur Benoît Lauras, Saint-Étienne, France. E-mail: irina.nicolae@imag.fr

Copyright © 2015 Douzal, Vilar, Marteau, Maharaj, Alonso, Otranto, Nicolae. Published by the editors on CEUR-WS.org.

ISSN 1613-0073, Volume 1425, http://CEUR-WS.org/Vol-1425. This volume is published and copyrighted by its editors. The copyright for individual papers is held by the papers' authors. Copying is permitted for private and academic purposes.

September, 2015


Preface

We are honored to welcome you to the 1st International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD), which is held in Porto, Portugal, on September 11th, 2015, co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2015).

The aim of this workshop is to bring together researchers and experts in machine learning, data mining, pattern analysis and statistics to share challenging issues and advance research on temporal data analysis. Analysis and learning from temporal data cover a wide scope of tasks, including learning metrics, learning representations, unsupervised feature extraction, clustering and classification.

This volume contains the conference program, an abstract of the invited keynote and the set of regular papers accepted for presentation at the conference. Each of the submitted papers was reviewed by at least two independent reviewers, leading to the selection of seventeen papers accepted for presentation and inclusion in the program and these proceedings. The contributions are listed in alphabetical order by surname. An index of authors can also be found at the end of this book.

The keynote given by Gustavo Camps-Valls on "Capturing Time-Structures in Earth Observation Data with Gaussian Processes" focuses on machine learning models based on Gaussian processes, which help to monitor land, oceans, and atmosphere through the estimation of climate and biophysical variables.

The accepted papers span a range of innovative ideas on the analysis of temporal data, including promising new approaches and covering both practical and theoretical issues. Classification of time series, estimation of graphical models for temporal data, extraction of patterns from audio streams, searching for causal models from longitudinal data and symbolic representation of time series are only a sample of the topics analyzed. To introduce the reader, a brief presentation of the problems addressed in each of the papers is given below.

A novel approach to analyze the evolution of a disease incidence is presented by Andrade-Pacheco et al. The method is based on Gaussian processes and allows studying the effect of the time series components individually, and hence isolating the relevant components and exploring short term variations of the disease. Bailly et al. introduce a time series classification procedure based on extracting local features using the Scale-Invariant Feature Transform (SIFT) framework and then building a global representation of the series using the Bag-of-Words (BoW) approach. Billard et al. propose to highlight the main structure of multiple sets of multivariate time series by using principal component analysis where the standard correlation structure is replaced by lagged cross-autocorrelation.

The symbolic representation of time series SAXO is formalized as a hierarchical coclustering approach by Bondu et al., who also evaluate its compactness in terms of coding length. A framework to learn an efficient temporal metric by combining temporal and frequential information for time series kNN classification is proposed by Do et al. Dupont and Marteau introduce a sparse version of Dynamic Time Warping (DTW), called coarse-DTW, and develop an efficient algorithm (Bubble) to sparsify regular time series. By coupling both mechanisms, the nearest-neighbor classification of time series can be performed much faster.

Gallicchio et al. study the balance assessment of elderly people with time series acquired with a Wii Balance Board; a novel technique to estimate the well-known Berg Balance Scale is proposed using a Reservoir Computing network. Gibberd and Nelson address the estimation of graphical models when data change over time; specifically, two extensions of the Gaussian graphical model (GGM) are introduced and empirically examined. Extraction of patterns from audio data streams is investigated by Hardy et al., who consider a symbolization procedure combined with different pattern mining methods. Jain and Spiegel propose a strategy to classify time series that consists of transforming the series into a dissimilarity representation and then applying PCA followed by an SVM. Krempl addresses the problem of forecasting the density at spatio-temporal coordinates in the future from a sample of pre-fixed instances observed at different positions in the feature space and at different times in the past; two different approaches using spatio-temporal kernel density estimation are proposed. A fuzzy C-medoids algorithm to cluster time series based on comparing estimated quantile autocovariance functions is presented by Lafuente-Rego and Vilar.

A new algorithm for discovering causal models from longitudinal data is developed by Rahmadi et al. The method performs structure search over Structural Equation Models (SEMs) by maximizing model scores in terms of data fit and complexity, showing robustness for finite samples. Salperwyck et al. introduce a clustering technique for time series based on maximizing an inter-inertia criterion inside parallelized decision trees. An anomaly detection approach for temporal graph data based on an iterative tensor decomposition and masking procedure is presented by Sapienza et al. Soheily-Khah et al. perform an experimental comparison of several progressive and iterative methods for averaging time series under dynamic time warping. Finally, Sorokin extends the factored gated restricted Boltzmann machine model by adding a discriminative component, thus enabling it to be used as a classifier and, specifically, to extract translational motion from two related images.

In sum, we think that all these contributions will provide valuable feedback and motivation to advance research on analysis and learning from temporal data.

It is planned that extended versions of the accepted papers will be published in a special volume of Lecture Notes in Artificial Intelligence (LNAI).

We wish to thank the ECML PKDD council members for giving us the opportunity to hold the AALTD workshop within the framework of the ECML PKDD Conference, and the members of the local organizing committee for their support. We also express our gratitude to several colleagues who helped us with the organization of the workshop, particularly Saeid Soheily (Université Grenoble Alpes, France).

The organizers of the AALTD workshop gratefully acknowledge the financial support of the "Programme d'Investissements d'Avenir" of the French government through the IKATS project, as well as the support received from LIG-AMA, IRISA, MODES, Université Joseph Fourier and Universidade da Coruña.

Last but not least, we wish to thank the contributing authors for their high-quality work and all members of the Reviewing Committee for their invaluable assistance in the selection process. All of them have significantly contributed to the success of AALTD 2015.

We sincerely hope that the workshop participants have a great and fruitful time at the conference.

September, 2015 Ahlame Douzal-Chouakria

José A. Vilar
Pierre-François Marteau
Ann E. Maharaj
Andrés M. Alonso
Edoardo Otranto
Irina Nicolae


Ahlame Douzal-Chouakria, Université Grenoble Alpes, France
José Antonio Vilar Fernández, University of A Coruña, Spain
Pierre-François Marteau, IRISA, Université de Bretagne-Sud, France
Ann Maharaj, Monash University, Australia
Andrés M. Alonso, Universidad Carlos III de Madrid, Spain
Edoardo Otranto, University of Messina, Italy

Reviewing Committee

Massih-Reza Amini, Université Grenoble Alpes, France
Manuele Bicego, University of Verona, Italy
Gianluca Bontempi, MLG, ULB University, Belgium
Antoine Cornuéjols, LRI, AgroParisTech, France
Pierpaolo D'Urso, University La Sapienza, Italy
Patrick Gallinari, LIP6, UPMC, France
Eric Gaussier, Université Grenoble Alpes, France
Christian Hennig, Department of Statistical Science, University College London, UK
Frank Höppner, Ostfalia University of Applied Sciences, Germany
Paul Honeine, ICD, Université de Troyes, France
Vincent Lemaire, Orange Labs, France
Manuel García Magariños, University of A Coruña, Spain
Mohamed Nadif, LIPADE, Université Paris Descartes, France
François Petitjean, Monash University, Australia
Fabrice Rossi, SAMM, Université Paris 1, France
Allan Tucker, Brunel University, UK


Conference programme schedule

Conference venue and some instructions

Within the framework of the ECML PKDD 2015 Conference, the AALTD Workshop will take place from 15:00 to 18:00 on Friday, September 11, at the Alfândega Congress Centre, Rua Nova de Alfândega, 4050-430 Porto. The invited talk and the oral communications will take place in the room Porto, on the second floor of the Congress Centre (see partial site plan below).

The lecture room Porto will be equipped with a PC and a computer projector, which will be used for presentations. Before the session starts, presenters must provide the session chair with the files for their presentation in PDF (Acrobat) or PPT (PowerPoint) format on a USB memory stick. Alternatively, the talks can be submitted by e-mail to José A. Vilar (jose.vilarf@udc.es) prior to the start of the conference. The time planned for each presentation is fifteen minutes, with five additional minutes for discussion.

With regard to the poster session, the authors will be responsible for placing their posters on the poster panel, which should be done well in advance. The maximum poster size is A0.


Invited talk (Chair: Ahlame Douzal)

15:00 - 15:30  Capturing Time-Structures in Earth Observation Data with Gaussian Processes
               Gustavo Camps-Valls

Oral communication (Chair: José A. Vilar)

15:30 - 15:50  Time Series Classification in Dissimilarity Spaces
               Brijnesh J. Jain, Stephan Spiegel

Poster session (Chairs: Maria-Irina Nicolae, Saeid Soheily)

15:50 - 16:15  See the list of poster communications below.

16:00 - 16:15  COFFEE BREAK

Oral communications (Chair: Pierre-François Marteau)

16:15 - 16:35  Fuzzy Clustering of Series Using Quantile Autocovariances
               Borja R. Lafuente-Rego, José A. Vilar
16:35 - 16:55  Temporal Density Extrapolation
               Georg Krempl
16:55 - 17:15  Coarse-DTW: Exploiting Sparsity in Gesture Time Series
               Marc Dupont, Pierre-François Marteau
17:15 - 17:35  Symbolic Representation of Time Series: A Hierarchical Coclustering Formalization
               Alexis Bondu, Marc Boullé, Antoine Cornuéjols
17:35 - 17:55  Monitoring Short Term Changes of Malaria Incidence in Uganda with Gaussian Processes
               Ricardo Andrade-Pacheco, Martin Mubangizi, John Quinn, Neil Lawrence


Communications in poster session

P1   Bag-of-Temporal-SIFT-Words for Time Series Classification
     Adeline Bailly, Simon Malinowski, Romain Tavenard, Thomas Guyet, Lætitia Chapel

P2   An Exploratory Analysis of Multiple Multivariate Time Series
     Lynne Billard, Ahlame Douzal-Chouakria, Seyed Yaser Samadi

P3   Temporal and Frequential Metric Learning for Time Series kNN Classification
     Cao-Tri Do, Ahlame Douzal-Chouakria, Sylvain Marie, Michele Rombaut

P4   Preliminary Experimental Analysis of Reservoir Computing Approach for Balance Assessment
     Claudio Gallicchio, Alessio Micheli, Luca Pedrelli, Federico Vozzi, Oberdan Parodi

P5   Estimating Dynamic Graphical Models from Multivariate Time-series Data
     Alexander J. Gibberd, James D.B. Nelson

P6   Sequential Pattern Mining on Multimedia Data
     Corentin Hardy, Laurent Amsaleg, Guillaume Gravier, Simon Malinowski, René Quiniou

P7   Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling
     Ridho Rahmadi, Perry Groot, Marianne Heins, Hans Knoop, Tom Heskes

P8   CourboSpark: Decision Tree for Time-series on Spark
     Christophe Salperwyck, Simon Maby, Jérôme Cubillé, Matthieu Lagacherie

P9   Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach
     Anna Sapienza, André Panisson, Joseph Wu, Lætitia Gauvin, Ciro Cattuto

P10  Progressive and Iterative Approaches for Time Series Averaging
     Saeid Soheily-Khah, Ahlame Douzal-Chouakria, Eric Gaussier

P11  Classification Factored Gated Restricted Boltzmann Machine
     Ivan Sorokin


Table of Contents

Capturing Time-structures in Earth Observation Data with Gaussian Processes
  G. Camps-Valls . . . 1
Monitoring Short Term Changes of Malaria Incidence in Uganda with Gaussian Processes
  R. Andrade-Pacheco, M. Mubangizi, J. Quinn, N. Lawrence . . . 3
Bag-of-Temporal-SIFT-Words for Time Series Classification
  A. Bailly, S. Malinowski, R. Tavenard, T. Guyet, L. Chapel . . . 11
An Exploratory Analysis of Multiple Multivariate Time Series
  L. Billard, A. Douzal-Chouakria, S. Yaser Samadi . . . 19
Symbolic Representation of Time Series: a Hierarchical Coclustering Formalization
  A. Bondu, M. Boullé, A. Cornuéjols . . . 27
Temporal and Frequential Metric Learning for Time Series kNN Classification
  C.-T. Do, A. Douzal-Chouakria, S. Marié, M. Rombaut . . . 39
Coarse-DTW: Exploiting Sparsity in Gesture Time Series
  M. Dupont, P.-F. Marteau . . . 47
Preliminary Experimental Analysis of Reservoir Computing Approach for Balance Assessment
  C. Gallicchio, A. Micheli, L. Pedrelli, F. Vozzi, O. Parodi . . . 57
Estimating Dynamic Graphical Models from Multivariate Time-Series Data
  A.J. Gibberd, J.D.B. Nelson . . . 63
Sequential Pattern Mining on Multimedia Data
  C. Hardy, L. Amsaleg, G. Gravier, S. Malinowski, R. Quiniou . . . 71
Time Series Classification in Dissimilarity Spaces
  B.J. Jain, S. Spiegel . . . 79
Temporal Density Extrapolation
  G. Krempl . . . 85
Fuzzy Clustering of Series Using Quantile Autocovariances
  B. Lafuente-Rego, J.A. Vilar . . . 93
Causality on Longitudinal Data: Stable Specification Search in Constrained Structural Equation Modeling
  R. Rahmadi, P. Groot, M. Heins, H. Knoop, T. Heskes . . . 101
CourboSpark: Decision Tree for Time-series on Spark
  C. Salperwyck, S. Maby, J. Cubillé, M. Lagacherie . . . 109
Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach
  A. Sapienza, A. Panisson, J. Wu, L. Gauvin, C. Cattuto . . . 117
Progressive and Iterative Approaches for Time Series Averaging
  S. Soheily-Khah, A. Douzal-Chouakria, E. Gaussier . . . 123
Classification Factored Gated Restricted Boltzmann Machine
  I. Sorokin . . . 131



Capturing Time-structures in Earth Observation Data with Gaussian Processes

Gustavo Camps-Valls

Department of Electrical Engineering, Universitat de València, Spain

Abstract. In this talk I will summarize our experience over the last years in developing algorithms at the interplay between Physics and Statistical Inference to analyze Earth Observation satellite data. Some of them are currently adopted by ESA and EUMETSAT. I will pay attention to machine learning models that help to monitor land, oceans, and atmosphere through the estimation of climate and biophysical variables. In particular, I will focus on Gaussian Processes, which provide an adequate framework to design models with high prediction accuracy, able to cope with uncertainties, deal with heteroscedastic noise and particular time-structures, encode physical knowledge about the problem, and attain self-explanatory models. The theoretical developments will be guided by the challenging problems of estimating biophysical parameters at both local and global planetary scales.

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.



Monitoring Short Term Changes of Malaria Incidence in Uganda with Gaussian Processes

Ricardo Andrade-Pacheco1, Martin Mubangizi2, John Quinn2,3, and Neil Lawrence1

1 University of Sheffield, Department of Computer Science, UK

2 Makerere University, College of Computing and Information Science, Uganda

3 UN Global Pulse, Pulse Lab Kampala, Uganda

Abstract. A method to monitor communicable diseases based on health records is proposed. The method is applied to health facility records of malaria incidence in Uganda. This disease represents a threat for approximately 3.3 billion people around the globe. We use Gaussian processes with vector-valued kernels to analyze time series components individually. This method allows not only removing the effect of specific components, but also studying the components of interest in more detail. The short term variations of an infection are divided into four cyclical phases. Under this novel approach, the evolution of a disease incidence can easily be analyzed and compared between different districts. The graphical tool provided can help quick response planning and resource allocation.

Keywords: Gaussian processes, malaria, kernel functions, time series.

1 Introduction

More than a century after discovering its transmission mechanism, malaria has been successfully eradicated from different regions of the world [15]. However, it is still endemic in 100 countries and represents a threat for approximately 3.3 billion people [20]. In Uganda, malaria is among the leading causes of morbidity and mortality [19]. Different types of interventions can be carried out to prevent and treat malaria [20]. Their success depends on how well the disease can be anticipated and how fast the population reacts to it. In this regard, mathematical modelling can be a strong ally for decision-making and health services planning.

Spatiotemporal modelling for mapping and prediction of infection dynamics is a challenging problem: first, because of the costs and difficulties of gathering data; second, because of the challenge of developing a sound theoretical model that agrees with the observed data.

The Health Management Information System (HMIS) operated by the Uganda Ministry of Health provides weekly records of the number of patients treated for malaria in different hospitals across the country. Unfortunately, the number of reporting hospitals is not consistent across time. This variation is prone to creating artificial trends in the observed data. Hence, the underreporting effect has to be estimated so that it can be removed.

A common approach for time series analysis is to decompose the observed variation into specific patterns such as trends, cyclic effects or irregular fluctuations [4, 3, 7]. Gaussian process (GP) models are a natural approach for analyzing functions that represent time series. GPs provide a robust framework for non-parametric probabilistic modelling [18]. The use of covariance kernels enables the analysis of non-linear patterns by embedding an inference problem into an abstract space with a convenient structure [14]. By combining different covariance kernels (via additions, multiplications or convolutions) into a single one, a GP is able to describe more complex functions. Each of the individual kernels contributes by encoding a specific set of properties or patterns of the resulting function [5].

We propose a monitoring system for communicable diseases based on Gaus- sian processes. This methodology is able to isolate the relevant components of the time series and study the short term variations of the disease. The output of this system is a graphical tool that discretizes the disease progress into four phases of simple interpretation.

2 Background

Say we are interested in learning the functional relation between inputs and outputs, based on a set of observations $\{(x_i, y_i)\}_{i=1}^{n}$. GP models introduce an additional latent variable $f_x$, whose covariance kernel $K$ is a function of the input values. Usually, $y_i$ is considered a distorted version of the latent variable.

To deal with multiple outputs, GP models resort to generalizations of kernel functions to the vector-valued case [1]. In the time series literature, vector-valued functions are commonly treated in the family of VAR models [12], while in the geostatistics literature co-Kriging generalizations are used [8, 11]. These approaches are equivalent. Let $h_x = (f_x^1, \ldots, f_x^d)$ be a vector-valued GP; its corresponding covariance matrix is given by

$$\left[\operatorname{cov}(h_x, h_z)\right]_{ij} = \operatorname{cov}(f_x^i, f_z^j). \qquad (1)$$

The diagonal elements $\left[\operatorname{cov}(h_x, h_z)\right]_{ii}$ are just the covariance functions of the real-valued GP elements. The non-diagonal elements represent the cross-covariance functions between components [9, 10, 2].

3 Method Used

Suppose we have data generated from the combination of two independent signals (see Figure 1a). Usually, not only are we unable to observe the signals separately, but the combined signal they yield is corrupted by noise in the data collected (see Figure 1b). For the sake of this example, suppose that the two signals represent a long term trend (the smooth signal) and a seasonal component (the sinusoidal signal). For an observer, the oscillations of the seasonal component mask the behaviour of the long term trend. At some point, however, the observer might want to know whether the trend is increasing or decreasing. Similarly, there might be interest in studying only the seasonal component isolated from the trend. For example, in economics and finance, business recession and expansion periods are determined by studying the cyclic component of a set of indicators [16]. The cyclic component tells if an indicator is above or below the trend, and its differences tell if it is increasing or decreasing.

We propose a similar approach for monitoring disease incidence time series, but in our case, we will use a non-parametric approach. To extract the original signals, the observed data can be modelled using a GP with a combination of kernels, say exponentiated quadratics, one having a shorter lengthscale than the other. Figures 1c and 1d show a model of the combined and independent signals.
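To make the kernel-combination idea concrete, here is a minimal sketch (using scikit-learn rather than the authors' implementation; the data and hyperparameter values are illustrative) that fits a GP with a sum of two RBF kernels of different lengthscales and then reads off the slow component alone:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data: smooth trend + fast oscillation + observation noise (placeholder signal)
t = np.linspace(0, 10, 200)[:, None]
y = 0.5 * t.ravel() + np.sin(2 * np.pi * t.ravel()) + 0.3 * np.random.randn(200)

# Sum of a long-lengthscale and a short-lengthscale RBF, plus a noise term
kernel = (RBF(length_scale=5.0, length_scale_bounds=(1.0, 100.0))
          + RBF(length_scale=0.3, length_scale_bounds=(0.01, 1.0))
          + WhiteKernel(noise_level=0.1))
gp = GaussianProcessRegressor(kernel=kernel).fit(t, y)

# Posterior mean of the slow component alone: K_slow(t, X_train) @ alpha,
# where alpha = (K + noise I)^{-1} y is stored by scikit-learn as gp.alpha_
slow_rbf = gp.kernel_.k1.k1          # first RBF inside the fitted sum kernel
trend = slow_rbf(t, gp.X_train_) @ gp.alpha_
```

The same trick, predicting with only a subset of the summed kernels, is what isolates the trend and short-term components discussed in the remainder of the paper.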

We also use a vector-valued GP to model directly the derivative of the time series, rather than using simple differences of the observed trend. As a result, we are able to provide uncertainty estimates about the speed of the changes around the trend. Our approach is based on modelling linear functionals of an underlying GP [13]. If $h_x = (f_x, \partial f_x/\partial x_i)$, its corresponding kernel is defined as

$$\Gamma(x_i, x_j) = \begin{bmatrix} K(x_i, x_j) & \partial_{x_j} K(x_i, x_j) \\ \partial_{x_i} K(x_i, x_j) & \partial^2_{x_i x_j} K(x_i, x_j) \end{bmatrix}. \qquad (2)$$

In most multi-output problems, observations of the different outputs are needed to learn their relation. Here, the relation between $f_x$ and its derivative is known beforehand through the derivative of $K$. Thus $\partial f_x/\partial x_i$ can be learnt by relying entirely on $f_x$. For the signals described above, Figures 1e and 1f show the corresponding derivatives computed using a kernel of the form of (2). The derivatives of the long term trend are computed with high confidence, while the derivatives of the seasonal component have more uncertainty. The latter is due to the magnitude of the seasonal component relative to the noise magnitude.
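As an illustration of Eq. (2), the sketch below assembles the joint kernel of an RBF-covariance GP and its derivative from the closed-form derivatives of the exponentiated quadratic kernel; it is an illustrative construction (function names and the lengthscale value are not from the paper):

```python
import numpy as np

def rbf(x, z, ell=1.0):
    """Exponentiated quadratic kernel K(x, z) for 1-D inputs."""
    d = x[:, None] - z[None, :]
    return np.exp(-0.5 * d**2 / ell**2)

def joint_kernel(x, z, ell=1.0):
    """2x2 block kernel of Eq. (2) for h_x = (f_x, df_x/dx), RBF case.

    dK/dz    =  (x - z)/ell^2 * K
    dK/dx    = -(x - z)/ell^2 * K
    d2K/dxdz = (1/ell^2 - (x - z)^2/ell^4) * K
    """
    K = rbf(x, z, ell)
    d = x[:, None] - z[None, :]
    dK_dz = d / ell**2 * K
    dK_dx = -d / ell**2 * K
    d2K = (1.0 / ell**2 - d**2 / ell**4) * K
    return np.block([[K, dK_dz], [dK_dx, d2K]])

# Joint covariance over 50 time points: the top-left block describes f,
# the bottom-right block its derivative, the off-diagonal blocks couple them.
t = np.linspace(0, 5, 50)
Gamma = joint_kernel(t, t, ell=0.8)
print(Gamma.shape)  # (100, 100)
```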

4 Uganda Case

In this exposition we focus on the Kabarole district, but provide a snapshot of the monitoring system for the whole country. Our base assumption about the infection process of malaria is that it evolves with some degree of smoothness across time. Smooth functions can be represented by a kernel such that the closer the observations are in the input space, the more similar the values of the output. The Matérn kernel family satisfies this condition, as it defines dependence through the distance between points with some exponential decay [18]. Different members of this family encode different degrees of smoothness, the limiting case being the exponentiated quadratic kernel or RBF, which is infinitely differentiable. To illustrate our method we will use an RBF kernel. Results with (rougher) Matérn kernels do not differ much when used instead.

Although malaria is a disease influenced by environmental factors like temperature or water availability, we could not observe a seasonal effect in the HMIS data [6]. If that were the case, the model could be improved by incorporating a periodic kernel in the covariance structure. Yet, the model fit can be improved if a second RBF kernel is added. In this case, one kernel has a short lengthscale and

which take time as input, the linear kernel takes the number of health facilities as input. In Table 1, we present a comparison of the model's predictive performance when using different kernels, based on the leave-one-out predictive probabilities [17]. The best predictive performance is achieved when considering short and long term changes and a correction for misreporting facilities.

Table 1: Comparison of LOO-CV log predictive probabilities when using different kernels. The subindex ℓ refers to the lengthscale of the kernel (measured in years).

Kernel                                  LOO-CV (log)
RBF_{ℓ=0.64}                            -40.54
RBF_{ℓ=0.14} + RBF_{ℓ=10}               -16.26
RBF_{ℓ=0.12} + RBF_{ℓ=10} + Linear       41.21
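For reference, the LOO-CV log predictive probability reported in Table 1 has a closed form for GP regression (see [18], Ch. 5); a minimal sketch, assuming a precomputed kernel matrix K on the training inputs and a noise variance sigma2 (both placeholders):

```python
import numpy as np

def loo_log_predictive(K, y, sigma2):
    """Closed-form leave-one-out log predictive probability for GP regression.

    mu_i  = y_i - [Kinv y]_i / [Kinv]_ii,   var_i = 1 / [Kinv]_ii,
    with Kinv = (K + sigma2 * I)^{-1}. K and sigma2 are placeholders here.
    """
    n = len(y)
    Kinv = np.linalg.inv(K + sigma2 * np.eye(n))
    alpha = Kinv @ y
    var = 1.0 / np.diag(Kinv)
    resid = alpha * var                       # equals y_i - mu_i
    return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * resid**2 / var)
```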

Figure 2c shows the trend and short term component of the number of malaria cases. Variations of a disease incidence around its trend represent short term changes in the population health. Outbreak detection and control of non-endemic diseases take place in this time frame. For some endemic diseases, this variation can be associated with seasonal factors [6]. Quick response actions, such as distribution of medicine and allocation of patients to health centres, have to take place in this time regime to be effective. The short term variations can be classified into four phases, as shown in Figure 2d (values are standardized). The upper left quadrant represents an incidence below the trend, but increasing; the upper right quadrant represents an incidence above the trend and expanding; the bottom right quadrant represents an incidence above the trend, but decreasing; and the bottom left quadrant represents an incidence below the trend and decreasing.
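A minimal sketch of this four-phase labelling, assuming the short-term component has already been standardized and its derivative estimated (for instance with the derivative GP of Eq. (2)):

```python
def short_term_phase(value, derivative):
    """Map a standardized short-term deviation and its derivative to one of
    the four quadrants described above."""
    if value >= 0:
        return "above trend, increasing" if derivative >= 0 else "above trend, decreasing"
    return "below trend, increasing" if derivative >= 0 else "below trend, decreasing"

# Example: a district whose incidence is below trend but rising
print(short_term_phase(value=-0.7, derivative=0.4))  # "below trend, increasing"
```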

This tracking system of short term variations is independent of the order of magnitude of the disease counts, and can be used to monitor the infection progress in different districts. It is easy to identify districts where the disease is being controlled or where the infection is progressing at an unusual rate. Figure 2b shows the monitoring system for the whole country. Districts where the coefficient of variation of both the process and its derivative is less than 1 (meaning a weak signal relative to noise) were left in gray.

5 Final Remarks

We have proposed a disease monitor based on vector-valued Gaussian processes.

Our approach is able to account for uncertainty in both the level of each component and the direction of change. The simplicity of doing inference with this

References

2. L. Baldassarre, L. Rosasco, A. Barla, and A. Verri. Multi-output learning via spectral filtering. Machine Learning, 87(3):259–301, 2012.
3. M. Baxter and R. G. King. Measuring business cycles: approximate band-pass filters for economic time series. Review of Economics and Statistics, 81(4):575–593, 1999.
4. W. P. Cleveland and G. C. Tiao. Decomposition of seasonal time series: A model for the census X-11 program. Journal of the American Statistical Association, 71(355):581–587, 1976.
5. N. Durrande, J. Hensman, M. Rattray, and N. D. Lawrence. Gaussian process models for periodicity detection. arXiv preprint arXiv:1303.7090, 2013.
6. S. I. Hay, R. W. Snow, and D. J. Rogers. From predicting mosquito habitat to malaria seasons using remotely sensed data: practice, problems and perspectives. Parasitology Today, 14(8):306–313, 1998.
7. A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4):411–430, 2000.
8. G. Matheron. Pour une analyse krigeante de données régionalisées. Technical report, École des Mines de Paris, Fontainebleau, France, 1982.
9. C. A. Micchelli and M. Pontil. Kernels for multi-task learning. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2004.
10. C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.
11. D. E. Myers. Matrix formulation of co-Kriging. Journal of the International Association for Mathematical Geology, 14(3):249–257, 1982.
12. H. Quenouille. The analysis of multiple time-series. Griffin's Statistical Monographs & Courses. Griffin, 1957.
13. S. Särkkä. Linear operators and stochastic partial differential equations in Gaussian process regression. In Artificial Neural Networks and Machine Learning – ICANN 2011, pages 151–158. Springer, 2011.
14. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, U.K., 2004.
15. P. I. Trigg and A. V. Kondrachine. Commentary: malaria control in the 1990s. Bulletin of the World Health Organization, 76(1):11, 1998.
16. F. van Ruth, B. Schouten, and R. Wekker. The Statistics Netherlands business cycle tracer. Methodological aspects; concept, cycle computation and indicator selection. Technical report, Statistics Netherlands, 2005.
17. A. Vehtari, V. Tolvanen, T. Mononen, and O. Winther. Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. arXiv preprint arXiv:1412.7461, 2014.
18. C. K. I. Williams and C. E. Rasmussen. Gaussian Processes for Machine Learning. MIT Press, 2006.
19. World Health Organization. World health statistics 2015. Technical report, WHO Press, Geneva, 2015.
20. World Health Organization and others. World malaria report 2014. Technical report, WHO Press, Geneva, 2014.



Bag-of-Temporal-SIFT-Words for Time Series Classification

Adeline Bailly1, Simon Malinowski2, Romain Tavenard1, Thomas Guyet3, and Lætitia Chapel4

1 Université de Rennes 2, IRISA, LETG-Rennes COSTEL, Rennes, France
2 Université de Rennes 1, IRISA, Rennes, France
3 Agrocampus Ouest, IRISA, Rennes, France
4 Université de Bretagne Sud, Vannes; IRISA, Rennes, France

Abstract. Time series classification is an application of particular interest with the increase of data to monitor. Classical techniques for time series classification rely on point-to-point distances. Recently, Bag-of-Words approaches have been used in this context. Words are quantized versions of simple features extracted from sliding windows. The SIFT framework has proved efficient for image classification. In this paper, we design a time series classification scheme that builds on the SIFT framework adapted to time series to feed a Bag-of-Words. Experimental results show competitive performance with respect to classical techniques.

Keywords: time series classification, Bag-of-Words, SIFT, BoTSW

1 Introduction

Classification of time series has received an important amount of interest over the past years due to many real-life applications, such as environmental modeling or speech recognition. A wide range of algorithms have been proposed to solve this problem. One simple classifier is the k-nearest-neighbor (kNN), which is usually combined with Euclidean Distance (ED) or Dynamic Time Warping (DTW) [11]. Such techniques compute similarity between time series based on point-to-point comparisons, which is often not appropriate. Classification techniques based on higher level structures are most of the time faster, while being at least as accurate as DTW-based classifiers. Hence, various works have investigated the extraction of local and global features in time series. Among these works, the Bag-of-Words (BoW) approach (also called bag-of-features) has been considered for time series classification. BoW is a very common technique in text mining, information retrieval and content-based image retrieval because of its simplicity and performance. For these reasons, it has been adapted to time series data in some recent works [1, 2, 9, 12, 14]. Different kinds of features based on simple statistics have been used to create the words.

In the context of image retrieval and classification, scale-invariant descriptors have proved their efficiency. In particular, the Scale-Invariant Feature Transform (SIFT) framework has led to widely used descriptors [10]. These descriptors are scale and rotation invariant while being robust to noise. We build on this framework to design a BoW approach for time series classification where the words correspond to the description of local gradients around keypoints that are first extracted from the time series. This approach can be seen as an adaptation of the SIFT framework to time series.

This paper is organized as follows. Section 2 summarizes related work, Section 3 describes the proposed Bag-of-Temporal-SIFT-Words (BoTSW) method, and Section 4 reports experimental results. Finally, Section 5 concludes and discusses future work.

2 Related work

Our approach for time series classification builds on two well-known methods in computer vision: local features are extracted from time series using a SIFT-based approach and a global representation of time series is built using Bag-of-Words. This section first introduces state-of-the-art methods in time series classification, then presents standard approaches for extracting features in the image classification context and finally lists previous works that make use of such approaches for time series classification.

The data mining community has long investigated the field of time series classification. Early works focus on the use of dedicated metrics to assess similarity between time series. In [11], Ratanamahatana and Keogh compare Dynamic Time Warping to Euclidean Distance when used with a simple kNN classifier. While the former benefits from its robustness to temporal distortions to achieve high accuracy, ED is known to have much lower computational cost. Cuturi [4] shows that DTW fails at precisely quantifying dissimilarity between non-matching sequences. He introduces the Global Alignment Kernel that takes into account all possible alignments to produce a reliable dissimilarity metric to be used with kernel methods such as Support Vector Machines (SVM). Douzal and Amblard [5] investigate the use of time series metrics for classification trees.

To classify images efficiently, they first have to be described accurately. Both local and global descriptions have been proposed by the computer vision community. For a long time, the most powerful local feature for images was SIFT [10], which describes detected keypoints in the image using the gradients in the regions surrounding those points. Building on this, Sivic and Zisserman [13] suggested comparing video frames using standard text mining approaches in which documents are represented by word histograms, known as Bag-of-Words (BoW). To do so, the authors map the 128-dimensional space of SIFT features to a codebook of a few thousand words using vector quantization. VLAD (Vector of Locally Aggregated Descriptors) [6] are global features that build upon local ones in the same spirit as BoW. Instead of storing counts for each word in the dictionary, VLAD preserves residuals to build a fine-grained global representation.

Inspired by the text mining, information retrieval and computer vision communities, recent works have investigated the use of Bag-of-Words for time series classification [1, 2, 9, 12, 14]. These works are based on two main operations: converting time series into Bag-of-Words (a histogram representing the occurrence of words), and building a classifier upon this BoW representation. Usually, classical techniques are used for the classification step: random forests, SVM, neural networks, kNN. In the following, we focus on explaining how the conversion of time series into BoW is performed in the literature. In [2], local features such as mean, variance and extremum values are computed on sliding windows. These features are then quantized into words using a codebook learned by a class probability estimate distribution. In [14], discrete wavelet coefficients are extracted on sliding windows and then quantized into words using k-means. In [9, 12], words are constructed using the SAX representation [8] of time series. SAX symbols are extracted from time series and histograms of n-grams of these symbols are computed. In [1], multivariate time series are transformed into a feature matrix, whose rows are feature vectors containing a time index, the values and the gradient of the time series at this time index (on all dimensions). Random samples of this matrix are given to decision trees whose leaves are seen as words. A histogram of words is output when the different trees are learned. Rather than computing features on sliding windows, the authors of [15] first extract keypoints from time series. These keypoints are selected using the Differences-of-Gaussians (DoG) framework, well known in the image community, which can be adapted to one-dimensional signals. Keypoints are then described by scale-invariant features that describe the shapes of the extrema surrounding keypoints. In [3], extraction and description of time series keypoints in a SIFT-like framework is used to reduce the complexity of Dynamic Time Warping: features are used to match anchor points from two different time series and prune the search space when finding the optimal path in the DTW computation.

In this paper, we design a time series classification technique based on the extraction and description of keypoints using a SIFT framework adapted to time series. The description of keypoints is quantized using a k-means algorithm to create a codebook of words, and the classification of time series is performed with a linear SVM fed with normalized histograms of words.

3 Bag-of-Temporal-SIFT-Words (BoTSW) method

The proposed method is adapted from the SIFT framework [10] widely used for image classification. It is based on three main steps: (i) detection of keypoints (scale-space extrema) in time series, (ii) description of these keypoints by gradient magnitudes at a specific scale, and (iii) representation of time series by a BoW, words corresponding to quantized versions of the keypoint descriptions. These steps are depicted in Fig. 1 and detailed below.

Following the SIFT framework, keypoints in time series correspond to local extrema both in terms of scale and location. These scale-space extrema are identified using a DoG function, which establishes a list of scale-invariant keypoints. Let $L(t, \sigma)$ be the convolution ($\ast$) of a Gaussian function $G(t, \sigma)$ of width $\sigma$ with a time series $S(t)$:

$$L(t, \sigma) = G(t, \sigma) \ast S(t).$$

The DoG is obtained by subtracting two time series filtered at consecutive scales:

$$D(t, \sigma) = L(t, k_{sc}\,\sigma) - L(t, \sigma),$$
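A rough sketch of this scale-space construction, using Gaussian filtering from SciPy; the scale grid and the extremum test are illustrative choices, not the exact settings of the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def dog_keypoints(series, sigma0=1.6, k_sc=2 ** (1.0 / 3.0), n_scales=8):
    """Detect scale-space extrema of the 1-D Difference-of-Gaussians D(t, sigma)."""
    sigmas = sigma0 * k_sc ** np.arange(n_scales + 1)
    L = np.stack([gaussian_filter1d(series, s) for s in sigmas])   # L(t, sigma)
    D = L[1:] - L[:-1]                                             # D(t, sigma)
    keypoints = []
    for i in range(1, D.shape[0] - 1):
        for t in range(1, D.shape[1] - 1):
            patch = D[i - 1:i + 2, t - 1:t + 2]
            if D[i, t] == patch.max() or D[i, t] == patch.min():
                keypoints.append((t, sigmas[i]))
    return keypoints

# Example on a toy series with two bumps of different widths
x = np.linspace(0, 10, 500)
print(len(dog_keypoints(np.exp(-(x - 3) ** 2) + np.exp(-40 * (x - 7) ** 2))))
```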


Dataset             BoTSW + linear SVM    BoTSW + 1NN           ED + 1NN   DTW + 1NN
                    k     nb    ER        k     nb    ER        ER         ER
50words             512   16    0.363     1024  16    0.400     0.369      0.310
Adiac               512   16    0.614     128   16    0.642     0.389      0.396
Beef                128   10    0.400     128   16    0.300     0.467      0.500
CBF                 64    6     0.058     64    14    0.049     0.148      0.003
Coffee              256   4     0.000     64    12    0.000     0.250      0.179
ECG200              256   16    0.110     64    12    0.160     0.120      0.230
Face (all)          1024  8     0.218     512   16    0.239     0.286      0.192
Face (four)         128   12    0.000     128   6     0.046     0.216      0.170
Fish                512   16    0.069     512   14    0.149     0.217      0.167
Gun-Point           256   4     0.080     256   10    0.067     0.087      0.093
Lightning-2         16    16    0.361     512   16    0.410     0.246      0.131
Lightning-7         512   14    0.384     512   14    0.480     0.425      0.274
Olive Oil           256   4     0.100     512   2     0.100     0.133      0.133
OSU Leaf            1024  10    0.182     1024  16    0.248     0.483      0.409
Swedish Leaf        1024  16    0.152     512   10    0.229     0.213      0.210
Synthetic Control   512   14    0.043     64    8     0.093     0.120      0.007
Trace               128   10    0.010     64    12    0.000     0.240      0.000
Two Patterns        1024  16    0.002     1024  16    0.009     0.090      0.000
Wafer               512   12    0.001     512   12    0.001     0.005      0.020
Yoga                1024  16    0.150     512   6     0.230     0.170      0.164

Table 1: Classification error rates (best performance is written as bold text).

frequency vector) of word occurrences. These histograms are then passed to a classifier to learn how to discriminate classes from this BoTSW description.
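The codebook and classification steps can be sketched as follows, assuming the keypoint descriptors of each series have already been computed (the descriptor arrays and the value of k are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def botsw_histograms(descriptor_sets, kmeans):
    """Map each series' keypoint descriptors to a normalized word histogram."""
    k = kmeans.n_clusters
    H = np.zeros((len(descriptor_sets), k))
    for i, desc in enumerate(descriptor_sets):
        words = kmeans.predict(desc)
        H[i] = np.bincount(words, minlength=k)
    return H / np.maximum(H.sum(axis=1, keepdims=True), 1)

def fit_botsw_classifier(train_desc, y_train, k=256):
    """train_desc: list of (n_keypoints_i, d) descriptor arrays (placeholders)."""
    kmeans = KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_desc))
    clf = LinearSVC().fit(botsw_histograms(train_desc, kmeans), y_train)
    return kmeans, clf
```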

4 Experiments and results

In this section, we investigate the impact of both the number of blocks $n_b$ and the number of words $k$ in the codebook (defined in Section 3) on classification error rates. Experiments are conducted on 20 datasets from the UCR repository [7].

We set all parameters of BoTSW but $n_b$ and $k$ as follows: $\sigma = 1.6$, $k_{sc} = 2^{1/3}$, $a = 8$. These values have been shown to produce stable results. Parameters $n_b$ and $k$ vary inside the sets $\{2, 4, 6, 8, 10, 12, 14, 16\}$ and $\{2^i, \forall i \in \{2, \ldots, 10\}\}$, respectively. Codebooks are obtained via k-means quantization. Two classifiers are used to classify time series represented as BoTSW: a linear SVM or a 1NN classifier. Each dataset is composed of a train and a test set. For our approach, the best set of $(k, n_b)$ parameters is selected by performing a leave-one-out cross-validation on the train set. This best set of parameters is then used to build the classifier on the train set and evaluate it on the test set. Experimental error rates (ER) are reported in Table 1, together with baseline scores publicly available at [7].
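The parameter selection described above can be sketched as a simple grid search; the helper evaluate_loo, which builds the BoTSW representation for a given (k, nb) and returns the leave-one-out error on the training set, is hypothetical and stands in for the pipeline of Section 3:

```python
from itertools import product

def select_parameters(train_desc, y_train, evaluate_loo):
    """Pick (k, nb) minimizing leave-one-out error on the training set."""
    k_grid = [2 ** i for i in range(2, 11)]          # 4 ... 1024 codewords
    nb_grid = [2, 4, 6, 8, 10, 12, 14, 16]           # number of blocks
    # evaluate_loo is a hypothetical helper standing in for the full BoTSW pipeline
    scores = {(k, nb): evaluate_loo(train_desc, y_train, k, nb)
              for k, nb in product(k_grid, nb_grid)}
    return min(scores, key=scores.get)
```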


[Figure 2: error rate on the Yoga dataset, plotted against the number of codewords (left, curves for nb = 4, 10, 16) and against the number of SIFT bins (right, curves for k = 64, 256, 1024).]

Fig. 2: Classification accuracy on dataset Yoga as a function of k and nb.

                      ED+1NN     DTW+1NN    TSBF [2]   SAX-VSM [12]   SMTS [1]   BoP [9]
                      W  T  L    W  T  L    W  T  L    W  T  L        W  T  L    W  T  L
BoTSW + linear SVM    18 0  2    11 0  9    8  0  12   9  2  9        7  0  13   14 0  6
BoTSW + 1NN           13 0  7    9  1  10   5  0  15   4  3  13       4  1  15   7  1  12

Table 2: Win-Tie-Lose (WTL) scores comparing BoTSW to state-of-the-art methods. For instance, BoTSW+linear SVM reaches better performance than ED+1NN on 18 datasets, and worse performance on 2 datasets.

BoTSW coupled with a linear SVM is better than both ED and DTW on 11 datasets. It is also better than BoTSW coupled with a 1NN classifier on 13 datasets. We also compared our approach with classical techniques for time series classification. We varied the number of codewords $k$ between 4 and 1024. Not surprisingly, cross-validation tends to select large codebooks that lead to a more precise representation of time series by BoTSW. Fig. 2 shows clearly that, for the Yoga dataset, (left) the larger the codebook, the better the results, and (right) the choice of the number $n_b$ of blocks is less crucial, as a wide range of values yield competitive classification performance.

Win-Tie-Lose scores (see Table 2) show that coupling BoTSW with a linear SVM reaches competitive performance with respect to the literature.

As can be seen in Table 1, BoTSW is (by far) less efficient than both ED and DTW on the Adiac dataset. As the BoW representation maps keypoint descriptions into words, details are lost during this quantization step. Knowing that only very few keypoints are detected for the Adiac time series, we believe a more precise representation would help.

5 Conclusion

BoTSW transforms time series into histograms of quantized local features. The distinctiveness of the SIFT keypoints used with Bag-of-Words enables efficient and accurate classification of time series, despite the fact that the BoW representation ignores temporal order. We believe classification performance could be further improved by taking time information into account and/or reducing the impact of quantization losses in our representation.

Acknowledgments

This work has been partly funded by ANR project ASTERIX (ANR-13-JS02-0005-01), Région Bretagne and CNES-TOSCA project VEGIDAR.

References

1. M. G. Baydogan and G. Runger. Learning a symbolic representation for multivariate time series classification. DMKD, 29(2):400–422, 2015.
2. M. G. Baydogan, G. Runger, and E. Tuv. A Bag-of-Features Framework to Classify Time Series. IEEE PAMI, 35(11):2796–2802, 2013.
3. K. S. Candan, R. Rossini, and M. L. Sapino. sDTW: Computing DTW Distances using Locally Relevant Constraints based on Salient Feature Alignments. Proc. VLDB, 5(11):1519–1530, 2012.
4. M. Cuturi. Fast global alignment kernels. In Proc. ICML, pages 929–936, 2011.
5. A. Douzal-Chouakria and C. Amblard. Classification trees for time series. Elsevier Pattern Recognition, 45(3):1076–1091, 2012.
6. H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In Proc. CVPR, pages 3304–3311, 2010.
7. E. Keogh, Q. Zhu, B. Hu, Y. Hao, X. Xi, L. Wei, and C. A. Ratanamahatana. The UCR Time Series Classification/Clustering Homepage, 2011. www.cs.ucr.edu/~eamonn/time_series_data/.
8. J. Lin, E. Keogh, S. Lonardi, and B. Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proc. ACM SIGMOD Workshop on Research Issues in DMKD, pages 2–11, 2003.
9. J. Lin, R. Khade, and Y. Li. Rotation-invariant similarity in time series using bag-of-patterns representation. IJIS, 39:287–315, 2012.
10. D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
11. C. A. Ratanamahatana and E. Keogh. Everything you know about dynamic time warping is wrong. In Proc. ACM SIGKDD Workshop on Mining Temporal and Sequential Data, pages 22–25, 2004.
12. P. Senin and S. Malinchik. SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. Proc. ICDM, pages 1175–1180, 2013.
13. J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, pages 1470–1477, 2003.
14. J. Wang, P. Liu, M. F. H. She, S. Nahavandi, and A. Kouzani. Bag-of-words Representation for Biomedical Time Series Classification. BSPC, 8(6):634–644, 2013.
15. J. Xie and M. Beigi. A Scale-Invariant Local Descriptor for Event Recognition in 1D Sensor Signals. In Proc. ICME, pages 1226–1229, 2009.

(34)
(35)


An Exploratory Analysis of Multiple Multivariate Time Series

Lynne Billard1, Ahlame Douzal-Chouakria2, and Seyed Yaser Samadi3

1 Department of Statistics, University of Georgia

2 Université Grenoble Alpes, CNRS - LIG/AMA, France

3 Department of Mathematics, Southern Illinois University

Abstract. Our aim is to extend standard principal component analysis for non-time series data to explore and highlight the main structure of multiple sets of multivariate time series. To this end, standard variance-covariance matrices are generalized to lagged cross-autocorrelation matrices. The methodology produces principal component time series, which can be analysed in the usual way on a principal component plot, except that the plot also includes time as an additional dimension.

1 Introduction

Time series data are ubiquitous, arising throughout economics, meteorology, medicine, the basic sciences, even in some genetic microarrays, to name a few of the myriad fields of application. Multivariate time series are likewise prevalent.

Our aim is to use principal components methods as an exploratory technique to find clusters of time series in a set of S multivariate time series. For example, in a collection of stock market time series, interest may center on whether some stocks, such as mining stocks, behave alike but differently from other stocks, such as pharmaceutical stocks.

A seminal paper in univariate time series clustering is that of Košmelj and Batagelj (1990), based on a dissimilarity measure. Since then several researchers have proposed other approaches (e.g., Caiado et al. (2015), D'Urso and Maharaj (2009)). A comprehensive summary of clustering for univariate time series is in Liao (2005). Liao (2007) introduced a two-step procedure for multivariate series which transformed the observations into a single multivariate series. Most of these methods use dissimilarity functions or variations thereof. A summary of Liao (2005, 2007) along with more recently proposed methods is in Billard et al. (2015). Though a few authors specify a particular model structure, by and large, the dependence information inherent to time series observations is not used.

Dependencies in time series are measured through the autocorrelation (or, equivalently, the autocovariance) functions. In this work, we illustrate how these dependencies can be used in a principal component analysis. This produces principal component time series, which in turn allows the projection of the original time series observations onto a three-dimensional principal component by time space. The basic methodology is outlined in Section 2 and illustrated in Section 3.

2 Methodology

2.1 Cross-Autocorrelation functions for S > 1 series and p > 1 dimensions

Let $X_{st} = \{(X_{stj}),\ j = 1, \ldots, p\}$, $t = 1, \ldots, N_s$, $s = 1, \ldots, S$, be a $p$-dimensional time series of length $N_s$, for each series $s$. For notational simplicity, assume $N_s = N$ for all $s$. Let us also assume the observations have been suitably differenced/transformed so that the data are stationary.

For a standard single univariate time series where $S = 1$ and $p = 1$, it is well known that the sample autocovariance function at lag $k$ is (dropping the $s = S = 1$ and $j = p = 1$ subscripts)

$$\hat{\gamma}(k) = \frac{1}{N}\sum_{t=1}^{N-k} (X_t - \bar{X})(X_{t+k} - \bar{X}), \quad k = 0, 1, \ldots, \qquad \bar{X} = \frac{1}{N}\sum_{t=1}^{N} X_t, \qquad (2.1)$$

and the sample autocorrelation function at lag $k$ is $\hat{\rho}(k) = \hat{\gamma}(k)/\hat{\gamma}(0)$, $k = 0, 1, \ldots$.

These autocorrelation functions provide a measure of how the time dependence between observations changes with their distance apart, the lag $k$. They are used to identify the type of model and also to estimate model parameters. See many of the basic texts on time series, e.g., Box et al. (2011), Brockwell and Davis (1991), Cryer and Chan (2008). Note that the divisor in Eq. (2.1) is $N$, rather than $N - k$. This ensures that the sample autocovariance matrix is non-negative definite.

For a single multivariate time series where $S = 1$ and $p \geq 1$, the cross-autocovariance function between variables $(j, j')$ at lag $k$ is the $p \times p$ matrix $\Gamma(k)$ with elements estimated by

$$\hat{\gamma}_{jj'}(k) = \frac{1}{N}\sum_{t=1}^{N-k} (X_{tj} - \bar{X}_j)(X_{t+k,j'} - \bar{X}_{j'}), \quad k = 0, 1, \ldots, \qquad \text{with } \bar{X}_j = \frac{1}{N}\sum_{t=1}^{N} X_{tj}, \qquad (2.2)$$

and the cross-autocorrelation function between variables $(j, j')$ at lag $k$ is the $p \times p$ matrix $\rho(k)$, with elements $\{\rho_{jj'}(k),\ j, j' = 1, \ldots, p\}$ estimated by

$$\hat{\rho}_{jj'}(k) = \hat{\gamma}_{jj'}(k)/\{\hat{\gamma}_{jj}(0)\,\hat{\gamma}_{j'j'}(0)\}^{1/2}, \quad k = 0, 1, \ldots. \qquad (2.3)$$

Unlike the autocorrelation function obtained from Eq. (2.1) with its single value at each lag $k$, Eq. (2.3) produces a $p \times p$ matrix at each lag $k$. The function in Eq. (2.2) was first given by Whittle (1963) and shown to be nonsymmetric by Jones (1964). In general, $\rho_{jj'}(k) \neq \rho_{j'j}(k)$ for variables $j \neq j'$, except for $k = 0$, but $\rho_{jj'}(k) = \rho_{j'j}(-k)$; see, e.g., Brockwell and Davis (1991).

When there are $S \geq 1$ series and $p \geq 1$ variables, the definition of Eqs. (2.2)-(2.3) can be extended to give a $p \times p$ sample cross-autocovariance function matrix between variables $(j, j')$ at lag $k$, $\hat{\Gamma}(k)$, with elements given by, for $j, j' = 1, \ldots, p$,

$$\hat{\gamma}_{jj'}(k) = \frac{1}{NS}\sum_{s=1}^{S}\sum_{t=1}^{N-k} (X_{stj} - \bar{X}_j)(X_{s,t+k,j'} - \bar{X}_{j'}), \quad k = 0, 1, \ldots, \qquad \text{with } \bar{X}_j = \frac{1}{NS}\sum_{s=1}^{S}\sum_{t=1}^{N} X_{stj}; \qquad (2.4)$$

and the $p \times p$ sample cross-autocorrelation matrix at lag $k$, $\hat{\rho}^{(1)}(k)$, has elements $\hat{\rho}_{jj'}(k)$, $j, j' = 1, \ldots, p$, obtained by substituting Eq. (2.4) into Eq. (2.3). This cross-autocovariance function in Eq. (2.4) is a measure of the time dependence between observations $k$ units apart for a given variable pair $(j, j')$, calculated across all $S$ series. Notice that the sample means $\bar{X}_j$ in Eq. (2.4) are calculated across all $NS$ observations.

An alternative approach is to calculate these sample means by series. In this case, the cross-autocovariance matrix $\hat{\Gamma}(k)$ has elements estimated by, for $j, j' = 1, \ldots, p$, $s = 1, \ldots, S$,

$$\hat{\gamma}_{jj'}(k) = \frac{1}{NS}\sum_{s=1}^{S}\sum_{t=1}^{N-k} (X_{stj} - \bar{X}_{sj})(X_{s,t+k,j'} - \bar{X}_{sj'}), \quad k = 0, 1, \ldots, \qquad \text{with } \bar{X}_{sj} = \frac{1}{N}\sum_{t=1}^{N} X_{stj}; \qquad (2.5)$$

and the corresponding $p \times p$ cross-autocorrelation function matrix $\hat{\rho}^{(2)}(k)$ has elements $\hat{\rho}_{jj'}(k)$ found by substituting Eq. (2.5) into Eq. (2.3).
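A direct NumPy transcription of Eqs. (2.4) and (2.3) (series-wise centring as in Eq. (2.5) would only change how the means are computed); the array layout is an assumption of this sketch:

```python
import numpy as np

def cross_autocorrelation(X, k):
    """Sample p x p cross-autocorrelation matrix at lag k: Eq. (2.4) plugged into (2.3).

    X is assumed to have shape (S, N, p): S series, N time points, p variables.
    """
    S, N, p = X.shape
    Xbar = X.reshape(S * N, p).mean(axis=0)            # pooled means over all S*N obs
    Xc = X - Xbar
    gamma_k = np.einsum('stj,stl->jl', Xc[:, :N - k, :], Xc[:, k:, :]) / (N * S)
    gamma_0 = np.einsum('stj,stl->jl', Xc, Xc) / (N * S)
    scale = np.sqrt(np.outer(np.diag(gamma_0), np.diag(gamma_0)))
    return gamma_k / scale

# Example with placeholder data: 14 series, 66 years of 12 monthly variables, lag 1
rho1 = cross_autocorrelation(np.random.randn(14, 66, 12), k=1)
```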

Other model structures can be considered, which would provide other options for obtaining the relevant sample means. These include class structures, lag k structures, weighted series and/or weighted variable structures, and the like; see Billard et al. (2015).


2.2 Principal Components for Time Series

In a standard classical principal component analysis on a set of $p$-dimensional multivariate observations $X = \{X_{ij},\ i = 1, \ldots, n,\ j = 1, \ldots, p\}$, each observation is projected into a corresponding $\nu$th order principal component, $PC_\nu(i)$, through a linear combination of the observation's variables,

$$PC_\nu(i) = w_{\nu 1} X_{i1} + \cdots + w_{\nu p} X_{ip}, \quad \nu = 1, \ldots, p, \qquad (2.6)$$

where $w_\nu = (w_{\nu 1}, \ldots, w_{\nu p})$ is the $\nu$th eigenvector of the correlation matrix $\rho$ (or, equivalently for non-standardized observations, the variance-covariance matrix $\Sigma$). The eigenvalues satisfy $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p \geq 0$, and $\sum_{\nu=1}^{p} \lambda_\nu = p$ (or the total variance for non-standardized data). A detailed description of this methodology for standard data can be found in any of the numerous texts on multivariate analysis, e.g., Joliffe (1986) and Johnson and Wichern (2007) for an applied approach, and Anderson (1984) for theoretical details.

For time series data, the correlation matrix $\rho$ is replaced by the cross-autocorrelation matrix $\rho(k)$, for a specific lag $k = 1, 2, \ldots$, and the $\nu$th order principal component of Eq. (2.6) becomes

$$PC_\nu(s, t) = w_{\nu 1} X_{st1} + \cdots + w_{\nu p} X_{stp}, \quad \nu = 1, \ldots, p, \quad t = 1, \ldots, N, \quad s = 1, \ldots, S. \qquad (2.7)$$

The elements of $\rho(k)$ can be estimated by $\hat{\rho}_{jj'}(k)$ from Eq. (2.4) or from Eq. (2.5) (or from other choices of model structure). The problem of non-positive definiteness, for lag $k > 0$, of the cross-autocorrelation matrix has been studied by Rousseeuw and Molenberghs (1993) and Jäckel (2002), with the recommendation that negative eigenvalues be re-set to zero.
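One way to carry out the eigen-analysis of the lagged cross-autocorrelation matrix and build the principal component time series of Eq. (2.7): symmetrizing the matrix before the eigendecomposition is an implementation choice of this sketch (the paper does not spell this step out), while re-setting negative eigenvalues to zero follows the recommendation cited above.

```python
import numpy as np

def pc_time_series(X, rho_k, n_components=3):
    """Project each series onto eigenvectors of the lag-k cross-autocorrelation.

    X has shape (S, N, p), assumed standardized variable-wise (correlation scale);
    rho_k is the p x p matrix from Eq. (2.4)/(2.3).
    Returns an array of shape (S, N, n_components): PC_nu(s, t) of Eq. (2.7).
    """
    sym = (rho_k + rho_k.T) / 2.0                 # symmetrize (implementation choice)
    eigval, eigvec = np.linalg.eigh(sym)
    eigval = np.clip(eigval, 0.0, None)           # re-set negative eigenvalues to zero
    order = np.argsort(eigval)[::-1]              # sort by decreasing eigenvalue
    W = eigvec[:, order[:n_components]]
    return X @ W                                  # linear combinations of the variables

# pcs[s, t, 0] and pcs[s, t, 2] give PC1 x PC3 coordinates; inputs here are placeholders
pcs = pc_time_series(np.random.randn(14, 66, 12), np.eye(12), n_components=3)
```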

3 Illustration

To illustrate, take a data set (<http://dss.ucar.edu/datasets/ds578.5>) where the observations are time series of monthly temperatures at $S = 14$ cities (weather stations) in China over the years 1923-88. In the present analysis, each month is taken to be a single variable corresponding to the twelve months (January, ..., December, respectively); hence, $p = 12$. Clearly, these variables are dependent, as reflected in the cross-autocovariances when $j' \neq j$.

Let us limit the discussion to using the cross-autocorrelation functions at lag $k = 1$, evaluated from Eq. (2.4) and Eq. (2.3), and shown in Table 1. We obtain the corresponding eigenvalues and eigenvectors, and hence we calculate the principal components $PC_\nu$, $\nu = 1, \ldots, p$, from Eq. (2.6). A plot of $PC_1 \times PC_2 \times$ time is displayed in Figure 1, and that for $PC_1 \times PC_3 \times$ time is given in Figure 2. An interesting feature of these data highlighted by the methodology is that it is the $PC_1 \times PC_3$ pair that distinguishes more readily the city groupings. Figure 3 displays the $PC_1 \times PC_3$ values for all series and all times without tracking time (i.e., the 3-dimensional $PC_1 \times PC_3 \times$ time values are projected onto the $PC_1 \times PC_3$ plane). Hence, we are able to discriminate between cities.

Thus, we observe that cities 1-4 (Hailaer, HaErBin, MuDanJiang and ChangChun, respectively), color coded in black (full lines, 'lty=1'), have similar temperatures and are located in the north-eastern region of China. Cities 5-7 (TaiYuan, BeiJing, TianJin), identified by red (dash-dot lines, 'lty=4'), are in the north, and have temperature trends that are similar to each other but different from those in the north-eastern region. Two (BeiJing and TianJin) are located close to sea level, while the third (TaiYuan) is further south (and so might be expected to have higher temperatures) but its elevation is very high, decreasing its temperature patterns to be more in line with BeiJing and TianJin. Cities 8-11 (ChengDu, WuHan, ChangSha, HangZhou), in green (dotted lines, 'lty=3'), are located in central regions, with ChengDu further west but elevated. Finally, cities 12-14 (FuZhou, XiaMen, GuangZhou), in blue (dashed lines, 'lty=8'), are in the southeast part of the country.

Pearson correlations between the variables $X_j$, $j = 1, \ldots, 12$, and the principal components $PC_\nu$, $\nu = 1, \ldots, 12$, and correlation circles (not shown) show that all months have an impact on $PC_1$, with the months of June, July and August having a slightly negative influence on $PC_2$. Plots for other $k \neq 1$ values give comparable results. Likewise, analyses using the cross-autocorrelations of Eq. (2.5) also produce similar conclusions.

4 Conclusion

The methodology has successfully identified cities with similar temperature trends, trends which a priori could not have been foreshadowed, but which do conform with other geophysical information, thus confirming the usefulness of the methodology. The cross-autocorrelation functions for a $p$-dimensional multivariate time series have been extended to the case where there are $S \geq 1$ multivariate time series. These replaced the standard variance-covariance matrices for use in a principal component analysis, thus retaining measures of the time dependencies inherent to time series data. The methodology produces principal component time series, which can be compared in the usual way on a principal component plot, except that the plot also includes time as an additional plot dimension.


References

Anderson, T.W. (1984): An Introduction to Multivariate Statistical Analysis (2nd ed.), John Wiley, New York.

Billard, L., Douzal-Chouakria, A. and Samadi, S. Y. (2015): Toward Autocorrelation Functions: A Non-Parametric Approach to Exploratory Analysis of Multiple Multivariate Time Series. Manuscript.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2011): Time Series Analysis: Forecasting and Control (4th ed.). John Wiley, New York.

Brockwell, P.J. and Davis, R.A. (1991): Time Series: Theory and Methods. Springer-Verlag, New York.

Caiado, J., Maharaj, E. A. and D'Urso, P.: Time series clustering. In Handbook of Cluster Analysis, Chapman & Hall, C. Hennig, M. Meila, F. Murtagh, R. Rocci (eds.), in press.

Cryer, J.D. and Chan, K.-S. (2008): Time Series Analysis. Springer-Verlag, New York.

D'Urso, P. and Maharaj, E. A. (2009): Autocorrelation-based Fuzzy Clustering of Time Series. Fuzzy Sets and Systems, 160, 3565-3589. DOI: 10.1016/j.fss.2009.04.013.

Jäckel, P. (2002): Monte Carlo Methods in Finance. John Wiley, New York.

Johnson, R.A. and Wichern, D.W. (2007): Applied Multivariate Statistical Analysis (7th ed.), Prentice Hall, New Jersey.

Joliffe, I.T. (1986): Principal Component Analysis, Springer-Verlag, New York.

Jones, R.H. (1964): Prediction of multivariate time series. Journal of Applied Meteorology, 3, 285-289.

Košmelj, K. and Batagelj, V. (1990): Cross-sectional approach for clustering time varying data. Journal of Classification, 7, 99-109.

Liao, T.W. (2005): Clustering of time series - a survey. Pattern Recognition, 38, 1857-1874.

Liao, T.W. (2007): A clustering procedure for exploratory mining of vector time series. Pattern Recognition, 40, 2550-2562.

Rousseeuw, P. and Molenberghs, G. (1993): Transformation of non positive semidefinite correlation matrices. Communications in Statistics - Theory and Methods, 22, 965-984.

Whittle, P. (1963): On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix. Biometrika, 50, 129-134.
