Variance/Covariance extension for time series discrimination


HAL Id: hal-00744747

https://hal.archives-ouvertes.fr/hal-00744747v2

Submitted on 6 May 2013



Cédric Frambourg, Ahlame Douzal-Chouakria, Éric Gaussier, Jacques Demongeot


C. Frambourg¹,², A. Douzal-Chouakria¹, E. Gaussier¹, J. Demongeot²

1- UJF-Grenoble 1/CNRS, Université de Grenoble, LIG UMR 5217/AMA team
2- UJF-Grenoble 1/CNRS, TIMC-IMAG UMR 5525.

(Cedric.Frambourg, Ahlame.Douzal, Eric.Gaussier)@imag.fr

Abstract. For time series discrimination, the main idea behind the proposed approach is to use a variance/covariance criterion to strengthen or weaken aligned observations according to their contribution to the variability within and between classes. To this end, the classical variance/covariance expression is extended to a set of time series, as well as to a partition of time series.

Keywords: time series alignment, classification, variance-covariance

1 Introduction

Time series originating from the same (or similar) sources are often noisy, as the timing of salient events can be extremely variable. For example, in the context of electric networks, a particular peak associated with the same underlying event may appear at different times, depending on the use of the plugs monitored. To allow time series comparison while dealing with time delays, numerous alignment strategies have been proposed, such as those based on Dynamic Time Warping (DTW) [1]. DTW, however, offers too local a view, since the alignment depends only on the pair of time series under consideration; furthermore, the alignment process is decoupled from the analysis process (such as clustering or classification), which weakens the use of the alignment in real applications.
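To make the "local" nature of DTW concrete, the following is a minimal sketch of the textbook pairwise DTW recursion, assuming univariate series and an absolute-difference local cost. The function name `dtw` and the toy series are illustrative choices, not taken from the paper. Note that the alignment path is computed from the two series alone, without any knowledge of class labels.

```python
import numpy as np

def dtw(a, b):
    """Textbook DTW between two 1-D series; returns (cost, path).

    Assumption: local cost is |a[i] - b[j]| and the allowed steps are
    the three standard ones (diagonal, vertical, horizontal).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost table, 1-based
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])

    # Backtrack from the last cell to the first one to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n, m], path

# Toy example: the same peak occurring at two different times.
a = [0, 0, 1, 0]
b = [0, 1, 0, 0]
cost, path = dtw(a, b)
pointwise = float(np.abs(np.array(a) - np.array(b)).sum())
```

In this toy case the warped cost is lower than the point-by-point cost because the two peaks are linked to each other, which is the behaviour that motivates alignment-based comparison.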

To partly overcome these problems, Gaffney et al. [2] propose a probabilistic framework to jointly handle the clustering and the alignment processes. However, the proposed alignments are limited to time series of the same class, so the discriminative power of the method is limited. In Listgarten et al. [3], a hierarchical Bayesian model is proposed to detect rare differences between classes of time series. This model allows one to align time series simultaneously across all classes, while detecting and characterizing class-specific differences. Ramsay et al. [4] propose a time series clustering model where an alignment function is learned for each time series, parameterized with order one B-spline coefficients. The learned alignments account for a common shared structure within clusters.


[...] most differential features between classes. We propose to do so here through the use of a variance/covariance criterion to strengthen or weaken links according to their contributions to the variances within and between clusters. The variance/covariance measure is a classical criterion, used in many approaches, including discriminant analysis, dimensionality reduction, clustering and classification, and variants of it have already been proposed for graph-structured data (see for example [5–8]). To our knowledge, however, its use for learning alignments between time series has never been investigated. We propose an extension of the classical variance/covariance expression to a set of time series, then to a partition based on classes of time series. Based on the learned alignments, a discriminative distance is defined for time series nearest neighbor classification.

2 The variance/covariance of time series data

We first recall the definition of the conventional variance/covariance matrix, before extending it to a set of time series and then to a set of time series partitioned according to classes. Let X be the (n × p) data matrix providing the description of n observations by p numerical variables. The conventional (p × p) variance/covariance matrix expression is:

$$V = X^t (I - UP)^t \, P \, (I - UP) X \qquad (1)$$

where $I$ is the identity matrix, $U$ the unit matrix, and $P$ a diagonal matrix of weights, generally set to $p_i = \frac{1}{n}$ for equally weighted observations.
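Expression (1) can be checked numerically. The sketch below assumes that the unit matrix U is the (n × n) matrix whose entries are all equal to 1, so that (I − UP)X subtracts the weighted column means from X; the function name `weighted_covariance` and the random test data are illustrative only.

```python
import numpy as np

def weighted_covariance(X, weights=None):
    """Evaluate V = X^t (I - UP)^t P (I - UP) X for an (n x p) matrix X.

    Assumptions: U is the all-ones (n x n) matrix and the weights sum
    to 1, so that (I - UP) X is the column-centered data matrix.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)  # p_i = 1/n, equally weighted
    I = np.eye(n)
    U = np.ones((n, n))
    P = np.diag(weights)
    C = I - U @ P  # centering operator
    return X.T @ C.T @ P @ C @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # n = 6 observations, p = 3 variables
V = weighted_covariance(X)

# With p_i = 1/n, V matches the usual covariance normalized by n.
V_ref = np.cov(X, rowvar=False, bias=True)
```

With equal weights the result coincides with the population covariance matrix, which is a convenient sanity check before any extension to time series.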

In the case of a set of time series, let X be the (nT × p) matrix providing the description of n multivariate time series $S_1, ..., S_n$ by p numerical variables at T time stamps. The general term $x^l_{ij}$ of X gives the value of the variable $X_j$ (j = 1, ..., p) taken by $S_l$ (l = 1, ..., n) at the i-th time stamp (i = 1, ..., T). Alignments between n time series can be encoded through a matrix M composed of $n^2$ block matrices $M^{ll'}$ (l = 1, ..., n; l' = 1, ..., n). A block $M^{ll'}$ is a (T × T) matrix that specifies the alignment between $S_l$ and $S_{l'}$, and its general term $m^{ll'}_{ii'} \in [0, 1]$ indicates the intensity of the linkage between the observation of $S_l$ at time i and the observation of $S_{l'}$ at time i'.
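The block structure of M can be illustrated with a short sketch. It assumes 0-based indices, a dictionary of (T × T) blocks keyed by the pair (l, l'), and zero blocks for pairs that are not listed; the helper name `assemble_alignment` and the example blocks are hypothetical and only serve to show the layout.

```python
import numpy as np

def assemble_alignment(blocks, n, T):
    """Assemble the (nT x nT) matrix M from (T x T) blocks M^{ll'}.

    `blocks` maps (l, lp) to a (T x T) array with entries in [0, 1];
    pairs that are absent are left as zero blocks (an assumption).
    """
    M = np.zeros((n * T, n * T))
    for (l, lp), block in blocks.items():
        block = np.asarray(block, dtype=float)
        if block.shape != (T, T):
            raise ValueError("each block must be (T x T)")
        if block.min() < 0.0 or block.max() > 1.0:
            raise ValueError("linkage intensities must lie in [0, 1]")
        M[l * T:(l + 1) * T, lp * T:(lp + 1) * T] = block
    return M

n, T = 3, 4
identity = np.eye(T)      # time i linked to time i
shift = np.eye(T, k=1)    # time i of S_l linked to time i + 1 of S_l'
blocks = {
    (0, 0): identity, (1, 1): identity, (2, 2): identity,
    (0, 1): shift, (1, 0): shift.T,
}
M = assemble_alignment(blocks, n, T)

# Linkage between S_0 at time 0 and S_1 at time 1 (0-based indices).
link = M[0 * T + 0, 1 * T + 1]
```

Row l·T + i and column l'·T + i' of M then hold the intensity linking the observation of one series at time i to the observation of the other at time i'.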

3 The variance induced by a partition of time series

Let us now consider the set of time series $S_1, ..., S_n$ partitioned into K groups, with $y_i \in \{1, ..., K\}$ the class label of $S_i$ and $n_k$ the size of class k (i.e. the number of time series in class k).


Similarly, the between variance (i.e. the variance between classes) can be defined with an inter-class alignment matrix M. The general form of this alignment matrix is symmetric with respect to the preceding one: alignments between time series of the same class are forbidden this time, whereas alignments between time series of different classes are taken into account.
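The complementary structure of the intra-class and inter-class alignment matrices can be sketched with block masks built from the class labels. This is only an illustration under stated assumptions: the helper `class_block_masks` is hypothetical, indices are 0-based, and the diagonal blocks (a series with itself) are kept on the intra-class side, a choice the text above does not specify.

```python
import numpy as np

def class_block_masks(labels, T):
    """Return (intra, inter) 0/1 masks of shape (nT x nT).

    intra keeps blocks M^{ll'} with y_l == y_l'; inter keeps blocks with
    y_l != y_l'. Diagonal blocks fall in intra (an assumption).
    """
    y = np.asarray(labels)
    same = (y[:, None] == y[None, :]).astype(float)  # (n x n) indicator
    ones = np.ones((T, T))
    intra = np.kron(same, ones)        # expand each entry to a (T x T) block
    inter = np.kron(1.0 - same, ones)
    return intra, inter

labels = [1, 1, 2]   # S_0 and S_1 in class 1, S_2 in class 2
T = 2
intra, inter = class_block_masks(labels, T)

# Any candidate alignment matrix can be restricted to allowed blocks.
M = np.full((len(labels) * T, len(labels) * T), 0.5)
M_intra = M * intra  # links within classes only
M_inter = M * inter  # links between classes only
```

Every block belongs to exactly one of the two masks, which mirrors the statement that what is allowed for one matrix is forbidden for the other.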

As one can note, alignments between time series play a crucial role (through the intra-class and inter-class alignment matrices) in the definition of the within and between variances. To discriminate time series, the question that arises is how to learn these matrices so as to minimize the within variance and maximize the between variance.

4 Conclusion

For time series discrimination, the main idea is based on strengthening or weakening links according to their contribution to the variability within and between classes. To this end, the classical variance/covariance expression is extended to a set of time series, and to a partition of time series. For time series classification, a new time series discriminative distance based on the learned alignments can be proposed.

References

1. J. Kruskal and M. Liberman, “The symmetric time warping algorithm: From continuous to discrete,” in Time Warps, String Edits and Macromolecules. Addison-Wesley, 1983.

2. S. J. Gaffney and P. Smyth, “Joint probabilistic curve clustering and alignment,” in Advances in Neural Information Processing Systems, vol. 17, pp. 473–480, 2005.

3. J. Listgarten, R. Neal, S. Roweis, R. Puckrin, and S. Cutler, “Bayesian detection of infrequent differences in sets of time series with shared structure,” in Neural Information Processing Systems, vol. 19, pp. 905–912, 2007.

4. J. Ramsay and X. Li, “Curve registration,” Journal of the Royal Statistical Society, Series B, vol. 60, pp. 351–363, 1998.

5. D. Wartenberg, “Multivariate spatial correlation: A method for exploratory geographical analysis,” Geographical Analysis, vol. 17, no. 4, pp. 263–283, 1985.

6. T. A. Banet and L. Lebart, “Local and partial principal component analysis and correspondence analysis,” Computational Statistics, pp. 113–118, 1984.

7. T. A. Banet, “Local and partial correspondence analysis: application to the analysis of electoral data,” Computational Statistics Quarterly, vol. 2, pp. 89–103, 1988.

8. J. Thioulouse, D. Chessel, and S. Champely, “Multivariate analysis of spatial