Modeling electrocardiogram using Yule-Walker equations and kernel machines


HAL Id: hal-01966017

https://hal.archives-ouvertes.fr/hal-01966017

Submitted on 27 Dec 2018


Maya Kallas, Clovis Francis, Paul Honeine, Hassan Amoud, Cédric Richard

To cite this version:

Maya Kallas, Clovis Francis, Paul Honeine, Hassan Amoud, Cédric Richard. Modeling electrocardiogram using Yule-Walker equations and kernel machines. Proc. 19th International Conference on Telecommunications (ICT), 2012, Jounieh, Lebanon. pp.1-5, 10.1109/ICTEL.2012.6221217.

hal-01966017


Maya Kallas∗†, Clovis Francis∗, Paul Honeine†, Hassan Amoud‡ and Cédric Richard§

∗Laboratoire d’Analyse et de Surveillance des Systèmes (LASYS), Faculty of Engineering, Lebanese University, Lebanon
†Institut Charles Delaunay (UMR CNRS 6279), Université de Technologie de Troyes, France
‡Azm Center for Research in Biotechnology and its Applications, Lebanese University, Lebanon
§Laboratoire Fizeau (UMR CNRS 6525), Université de Nice Sophia-Antipolis, France

Abstract—One may monitor the normal activity of the heart by analyzing the electrocardiogram. In this paper, we propose to combine the principle of kernel machines, which map data into a high-dimensional feature space, with the autoregressive (AR) technique defined using the Yule-Walker equations, which predicts future samples from a combination of previous samples. A pre-image technique is then applied to map the prediction back to the original space, where the predicted sample can be interpreted. The relevance of the proposed method is illustrated on real electrocardiograms from the MIT-BIH benchmark.

Index Terms—kernel machines, ECG signals, autoregressive model, nonlinear models, pre-image problem

I. INTRODUCTION

Heart disease remains the main cause of death today. There are many different forms of heart disease, such as heart failure and the narrowing or blockage of the coronary arteries. One way to monitor the heart is electrocardiography. The record, called the electrocardiogram (ECG), shows the series of waves related to the electrical impulses that occur during each heartbeat and that can be detected at the surface of the body. Many studies have addressed the detection and classification of heart diseases from ECG signals [1], [2], [3].

An ECG is a time series corresponding to the electrical activity of the heart. The autoregressive (AR) model is one of the most useful techniques for time series analysis [4]. It predicts a sample as a linear combination of some previous ones; the model is thus defined by the coefficients, or weights, of this linear expansion. To use such a model, one has to identify its parameters, namely the coefficients and the order, which is the number of previous samples involved. One technique for estimating the coefficients relies on the Yule-Walker equations, which are based on the autocorrelation function. Still, this linear technique is inappropriate for nonlinear systems such as the ECG.

To extend the AR model to nonlinear systems, we propose to apply the concept of kernel methods. The main idea behind kernels is to map the data from an input space to a feature space using a nonlinear mapping function. In practice, the mapping function need not be determined explicitly: one can transform a linear algorithm into a nonlinear one by applying the kernel trick [5]. Initially introduced with Vapnik’s Support Vector Machines (SVM) [6], kernels have proliferated in machine learning owing to their low computational cost and high performance. Recently, some kernel-based methods have been considered for the analysis and prediction of time series data [7], including support vector regression and the kernel Kalman filter [8].

In this paper, we propose to combine the concept of kernel machines with the simplicity of the AR model for modeling real ECG signals. The kernel AR model is determined using the Yule-Walker equations in the feature space; its parameters are thus estimated using the (lagged) expected kernels. In particular, we use a Gaussian kernel. However, the kernel AR model estimates samples in the feature space, and since this space is high-dimensional, the estimated sample cannot be interpreted directly. It is therefore necessary to map the data back from the feature space to the input space, where the sample can be interpreted. This requires a pre-image technique to predict the future sample in a space where it can be observed. Many techniques have been proposed in the literature to solve the pre-image problem; see [9] for a recent review. The authors of this paper have addressed the pre-image problem for biomedical signal processing in [10], [11], [12].

The rest of the paper is organized as follows. In the next section, we define the Yule-Walker equations used to estimate the coefficients of the AR model. In Section III, we present the principle of kernel methods in machine learning and extend the AR model to nonlinear systems. A pre-image technique is described in Section IV in order to estimate the future sample. Section V illustrates the efficiency of the proposed technique on real ECG signals taken from the MIT-BIH Normal Sinus Rhythm Database.

II. YULE-WALKER EQUATIONS FOR AR MODEL

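A real-valued stochastic process $x_i$ is said to be an autoregressive process of order $p$, denoted by AR($p$), if there exist $\alpha_1, \ldots, \alpha_p \in \mathbb{R}$, with $\alpha_p \neq 0$, and a white noise $\varepsilon_i$ such that

$$x_i = \sum_{j=1}^{p} \alpha_j x_{i-j} + \varepsilon_i.$$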


Fig. 1. Illustration of the AR concept, where $x_i$ is defined using the parameters $\alpha_k$ and the $p$ previous samples $x_{i-k}$.

Therefore, the set $x_{p+1}, x_{p+2}, \ldots, x_n$ defines a time series described by the AR model of order $p$. The concept of the AR model is illustrated in Fig. 1. Several techniques in the literature allow one to determine the parameters $\alpha_k$. One of them is to solve the Yule-Walker equations, which involve the autocorrelation function $\rho$ of the AR($p$) process. The recursion $\rho(s) = \sum_{j=1}^{p} \alpha_j \rho(s - j)$, for lag $s > 0$, constitutes the Yule-Walker equations. Using the symmetry $\rho(-s) = \rho(s)$ and replacing the autocorrelation $\rho(j)$ by its empirical counterpart $r(j)$, we obtain, for $s = 1, \ldots, p$, the matrix equation

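$$\mathbf{r} = \mathbf{R}\boldsymbol{\alpha},$$

where $\mathbf{r} = [r(1) \; \cdots \; r(p)]^\top$, $\boldsymbol{\alpha} = [\alpha_1 \; \cdots \; \alpha_p]^\top$, and

$$\mathbf{R} = \begin{pmatrix} 1 & r(1) & \cdots & r(p-1) \\ r(1) & 1 & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & 1 \end{pmatrix}.$$

When the $p \times p$ matrix $\mathbf{R}$ is invertible, the coefficients are given by $\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r}$.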
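As a minimal sketch of the linear Yule-Walker estimate $\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r}$, the Python/NumPy code below simulates an AR(2) process and recovers its coefficients. Two choices are assumptions not stated in the text: the series is mean-removed before correlating, and $r(k)$ is the biased sample autocovariance (divided by $n$) normalized by its lag-0 value, so that $\mathbf{R}$ has ones on its diagonal as above. The helper names `yule_walker` and `simulate_ar` are hypothetical.

```python
import numpy as np


def yule_walker(x, p):
    """Estimate AR(p) coefficients by solving R alpha = r (normalized autocorrelations)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # assumption: remove the sample mean first
    n = len(x)
    # Biased autocovariance estimates for lags 0..p, divided by n.
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    r = acov / acov[0]  # normalized so that r(0) = 1
    # Toeplitz matrix R[j, k] = r(|j - k|), with ones on the diagonal.
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r[lags]
    return np.linalg.solve(R, r[1 : p + 1])  # alpha = R^{-1} r


def simulate_ar(alpha, n, rng, burn=500):
    """Simulate x_i = sum_j alpha_j x_{i-j} + eps_i with standard normal white noise."""
    p = len(alpha)
    x = np.zeros(n + burn)
    eps = rng.standard_normal(n + burn)
    for i in range(p, n + burn):
        # x[i - p : i][::-1] is [x_{i-1}, x_{i-2}, ..., x_{i-p}]
        x[i] = np.dot(alpha, x[i - p : i][::-1]) + eps[i]
    return x[burn:]  # drop the burn-in so the series is close to stationary


rng = np.random.default_rng(0)
series = simulate_ar([0.6, -0.3], 20000, rng)
alpha_hat = yule_walker(series, 2)
```

With a long enough simulated series, `alpha_hat` should lie close to the true coefficients $[0.6, -0.3]$; the Toeplitz structure of $\mathbf{R}$ is obtained by indexing $r$ with $|j - k|$.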

III. AR MODEL WITH KERNEL MACHINES

A. Kernel methods

The aforementioned techniques are designed for linear models. To extend their use to the nonlinear case, we propose to use kernel machines. Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, so that much of the power of that space can be exploited while computations only involve kernel evaluations in the input space.

Let $\mathcal{X}$ be an input space with the canonical (Euclidean) dot product $x_i \cdot x_j$ for any $x_i, x_j \in \mathcal{X}$. Let $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a symmetric and continuous function, i.e., a kernel. A kernel is positive semi-definite if and only if, for any finite subset of $\mathcal{X}$, the matrix $\mathbf{K}$ with entries $\kappa(x_i, x_j)$ is positive semi-definite, that is, $\sum_{i,j} \alpha_i \alpha_j \kappa(x_i, x_j) \geq 0$ for all $\alpha_i, \alpha_j \in \mathbb{R}$ and all $x_i, x_j \in \mathcal{X}$. By the Moore-Aronszajn theorem [13], any positive semi-definite kernel guarantees the existence of a unique¹ feature space (or reproducing kernel Hilbert space) $\mathcal{H}$, in which $\kappa$ defines an inner product. Let $\Phi(\cdot)$ denote the mapping function from the input space $\mathcal{X}$ into the feature space $\mathcal{H}$; then

$$\kappa(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{H}}$$

for any $x_i, x_j \in \mathcal{X}$, where $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ denotes the corresponding inner product in $\mathcal{H}$, and $\| \cdot \|_{\mathcal{H}}$ its norm.

¹ Unique, up to an isometry.
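To make the positive semi-definiteness condition concrete, the sketch below builds the Gram matrix of the Gaussian kernel (the kernel adopted later in this paper) and checks numerically that it is symmetric with a non-negative spectrum. The function names and the tolerance in `is_psd` are illustrative assumptions; floating-point round-off can produce tiny negative eigenvalues, hence the tolerance.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """Gram matrix K[a, b] = exp(-||x_a - x_b||^2 / (2 sigma^2)).

    X holds one sample per row; a 1-D input is treated as scalar samples.
    """
    X = np.asarray(X, dtype=float)
    X = X.reshape(len(X), -1)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def is_psd(K, tol=1e-10):
    """Symmetric, with smallest eigenvalue non-negative up to a round-off tolerance."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


rng = np.random.default_rng(0)
points = rng.standard_normal((40, 2))
K = gaussian_gram(points, sigma=0.8)
coeffs = rng.standard_normal(40)
quadratic_form = coeffs @ K @ coeffs  # sum_{i,j} a_i a_j kappa(x_i, x_j)
```

The quadratic form computed last is exactly the sum in the definition above, evaluated for one random choice of coefficients.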

B. Kernel AR model

Using the concept of kernel machines and applying it to the data of a time series, each $x_i$ of the series is mapped into the feature space using the mapping function $\Phi(\cdot)$, yielding $\Phi(x_i)$. Extending the AR model to nonlinear systems in this way, the model is defined in the feature space by

$$\Phi(x_i) = \sum_{j=1}^{p} \alpha_j \Phi(x_{i-j}) + \varepsilon_i^{\Phi}.$$

Figure 2 illustrates the concept of the kernel AR model. Let µ be the mean of the mapped time series, namely

$$\mu = \mathbb{E}[\Phi(x_i)],$$

where $\mathbb{E}[\cdot]$ denotes the expectation over all $x_i$'s. Then,

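$$\Phi(x_i) - \mu = \sum_{j=1}^{p} \alpha_j \Phi(x_{i-j}) + \varepsilon_i^{\Phi} - \mu = \sum_{j=1}^{p} \alpha_j \big(\Phi(x_{i-j}) - \mu\big) + \varepsilon_i^{\Phi} - \Big(1 - \sum_{j=1}^{p} \alpha_j\Big)\mu.$$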

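Taking the expectation of both sides, with the time series assumed stationary, we get $\big(1 - \sum_{j=1}^{p} \alpha_j\big)\mu = \mathbb{E}[\varepsilon_i^{\Phi}]$. Substituting this into the previous expression and taking the inner product with the centered data $\Phi(x_{i-\tau}) - \mu$, for a lag $\tau > 0$, we get

$$\langle \Phi(x_i) - \mu, \Phi(x_{i-\tau}) - \mu \rangle_{\mathcal{H}} = \langle \varepsilon_i^{\Phi} - \mathbb{E}[\varepsilon_i^{\Phi}], \Phi(x_{i-\tau}) - \mu \rangle_{\mathcal{H}} + \sum_{j=1}^{p} \alpha_j \langle \Phi(x_{i-j}) - \mu, \Phi(x_{i-\tau}) - \mu \rangle_{\mathcal{H}}. \tag{2}$$

Using the kernel trick, we replace the inner product in the feature space by a kernel function, and define the centered version of a kernel $\kappa(\cdot, \cdot)$ as

$$\kappa_c(x_i, x_j) = \langle \Phi(x_i) - \mu, \Phi(x_j) - \mu \rangle_{\mathcal{H}}.$$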

By analogy with the linear AR case, we assume that the noise $\varepsilon_i^{\Phi}$ and $\Phi(x_{i-\tau})$ are uncorrelated for every positive lag $\tau$. Therefore, by taking the expectation of expression (2) and assuming stationarity, we get for any $\tau \geq 1$:

$$\mathbb{E}[\kappa_c(x_i, x_{i-\tau})] = \sum_{j=1}^{p} \alpha_j \, \mathbb{E}[\kappa_c(x_{i-j}, x_{i-\tau})], \tag{3}$$

where the notion of expected kernels is equivalent to the one recently studied in [14]. By considering all lag values $\tau = 1, \ldots, p$, expression (3) can be written in matrix form

$$\mathbf{r}_\kappa = \mathbf{R}_\kappa \boldsymbol{\alpha},$$

where

$$\mathbf{r}_\kappa = \big[\mathbb{E}[\kappa_c(x_i, x_{i-1})] \;\; \mathbb{E}[\kappa_c(x_i, x_{i-2})] \;\; \cdots \;\; \mathbb{E}[\kappa_c(x_i, x_{i-p})]\big]^\top,$$

and $\mathbf{R}_\kappa$ is the matrix of expected kernels

$$\mathbf{R}_\kappa = \begin{pmatrix} \mathbb{E}[\kappa_c(x_i, x_i)] & \mathbb{E}[\kappa_c(x_{i-2}, x_{i-1})] & \cdots & \mathbb{E}[\kappa_c(x_{i-p}, x_{i-1})] \\ \mathbb{E}[\kappa_c(x_{i-1}, x_{i-2})] & \mathbb{E}[\kappa_c(x_i, x_i)] & \cdots & \mathbb{E}[\kappa_c(x_{i-p}, x_{i-2})] \\ \vdots & \vdots & \ddots & \vdots \\ \mathbb{E}[\kappa_c(x_{i-1}, x_{i-p})] & \mathbb{E}[\kappa_c(x_{i-2}, x_{i-p})] & \cdots & \mathbb{E}[\kappa_c(x_i, x_i)] \end{pmatrix}.$$

Fig. 2. Illustration of the kernel AR model concept: the data are mapped from $\mathcal{X}$ to $\mathcal{H}$, where the linear AR concept is applied to the mapped data, yielding $\psi_i = \sum_{j=1}^{p} \alpha_j \Phi(x_{i-j})$. Once $\psi_i$ is evaluated, a mapping back from $\mathcal{H}$ to $\mathcal{X}$ is needed to obtain an interpretable sample $x_i^*$.

The vector of coefficients $\boldsymbol{\alpha}$ can then be calculated by inverting the matrix $\mathbf{R}_\kappa$:

$$\boldsymbol{\alpha} = \mathbf{R}_\kappa^{-1} \mathbf{r}_\kappa.$$

In practice, the expectations are estimated over a set of $n$ available samples. The centered version of the kernel is determined using

$$\kappa_c(x_i, x_j) = \kappa(x_i, x_j) - \frac{1}{n}\sum_{k=1}^{n} \kappa(x_i, x_k) - \frac{1}{n}\sum_{k=1}^{n} \kappa(x_j, x_k) + \frac{1}{n^2}\sum_{k,k'=1}^{n} \kappa(x_{k'}, x_k).$$
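The estimation of $\boldsymbol{\alpha} = \mathbf{R}_\kappa^{-1}\mathbf{r}_\kappa$ from $n$ samples can be sketched as follows. This is one plausible estimator, not necessarily the authors' exact implementation: it centers the Gram matrix with the empirical formula above and, assuming stationarity, treats $\mathbb{E}[\kappa_c(x_{i-j}, x_{i-\tau})]$ as depending only on $|j - \tau|$, estimating it by averaging the centered Gram matrix along the corresponding diagonal. All function names are hypothetical, and scalar ECG samples are assumed.

```python
import numpy as np


def gaussian(sigma):
    """Gaussian kernel for scalar samples, vectorized over NumPy arrays."""
    return lambda a, b: np.exp(-((a - b) ** 2) / (2.0 * sigma**2))


def gram(x, kernel):
    """Gram matrix K[a, b] = kernel(x_a, x_b) for a 1-D series x."""
    x = np.asarray(x, dtype=float)
    return kernel(x[:, None], x[None, :])


def center_gram(K):
    """Empirical centering: K[a, b] - row mean - column mean + grand mean."""
    return K - K.mean(axis=1, keepdims=True) - K.mean(axis=0, keepdims=True) + K.mean()


def kernel_yule_walker(x, p, kernel):
    """Solve R_kappa alpha = r_kappa built from expected centered kernels.

    Assumes stationarity: E[kc(x_{i-j}, x_{i-tau})] depends only on |j - tau|
    and is estimated by averaging Kc[i, i + m] over the available i.
    """
    Kc = center_gram(gram(x, kernel))
    c = np.array([np.diagonal(Kc, offset=m).mean() for m in range(p + 1)])
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = c[lags]  # Toeplitz, with the estimate of E[kc(x_i, x_i)] on the diagonal
    r = c[1:]    # estimates of E[kc(x_i, x_{i-1})], ..., E[kc(x_i, x_{i-p})]
    return np.linalg.solve(R, r)


rng = np.random.default_rng(1)
train = rng.standard_normal(200)
alpha_kernel = kernel_yule_walker(train, 3, gaussian(0.5))
```

Passing the linear kernel `lambda a, b: a * b` gives $\kappa_c(x_a, x_b) = (x_a - \bar{x})(x_b - \bar{x})$, so the estimate reduces to a linear Yule-Walker fit on the mean-removed series, which is a convenient sanity check.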

IV. PRE-IMAGE TECHNIQUE AS A PREDICTION OF FUTURE SAMPLES

Once the learning stage has been completed on $n$ available samples, the model parameters are estimated. The AR model in the feature space can then estimate the future sample by

$$\psi_i = \sum_{j=1}^{p} \alpha_j \Phi(x_{i-j}),$$

starting with $i = n + 1$. This combination of the $p$ previous samples in the feature space yields an element of that space, which cannot be interpreted since the space is high-dimensional. Thus, in order to predict a future sample in the original space, one needs to map $\psi_i$ back from the feature space into the input space. However, such a combination can lie outside the image of the input space under $\Phi(\cdot)$, as shown in Fig. 2; thus, $\psi_i$ may not have a pre-image, or, if one exists, it may not be unique. This is the pre-image problem, where one seeks an approximate solution in the input space whose image under the nonlinear mapping is as close as possible to the element estimated in the feature space. Several techniques have been proposed to solve this ill-posed problem, including multidimensional scaling, conformal maps, and learning-based schemes. See [9] for a formal definition of the pre-image problem and a recent survey of the literature.

As mentioned above, the pre-image problem is solved by finding an approximate solution $x_i^*$ whose image $\Phi(x_i^*)$ is as close as possible to $\psi_i$. This is defined by the following optimization problem:

$$x_i^* = \arg\min_{x} \frac{1}{2} \Big\| \Phi(x) - \sum_{j=1}^{p} \alpha_j \Phi(x_{i-j}) \Big\|_{\mathcal{H}}^2.$$

Expanding the norm with the kernel trick, we can write it as $x_i^* = \arg\min_x J_i(x)$, with

$$J_i(x) = -\sum_{j=1}^{p} \alpha_j \kappa(x_{i-j}, x) + \frac{1}{2}\kappa(x, x), \tag{4}$$

where the term independent of $x$ has been removed.
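The step from the squared-norm problem to (4) can be made explicit. Expanding the squared norm by bilinearity of the inner product and replacing each inner product by a kernel evaluation, $\langle \Phi(a), \Phi(b) \rangle_{\mathcal{H}} = \kappa(a, b)$, gives

$$\frac{1}{2}\Big\|\Phi(x) - \sum_{j=1}^{p}\alpha_j\Phi(x_{i-j})\Big\|_{\mathcal{H}}^2 = \frac{1}{2}\kappa(x, x) - \sum_{j=1}^{p}\alpha_j\,\kappa(x_{i-j}, x) + \frac{1}{2}\sum_{j=1}^{p}\sum_{k=1}^{p}\alpha_j\alpha_k\,\kappa(x_{i-j}, x_{i-k}).$$

The double sum does not depend on $x$, so dropping it leaves exactly $J_i(x)$ in (4) without changing the minimizer.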

One technique for solving such an optimization problem is the fixed-point iterative technique. Based on a gradient descent scheme and on the type of kernel used, one can implement an iterative scheme to identify the pre-image [15].

TABLE I
MSE CALCULATED FOR THE LINEAR AR MODEL AND THE KERNEL AR MODEL USING THE YULE-WALKER EQUATIONS TO MODEL ECG SIGNALS, AND THE MEAN OF THE MSE OVER THE 10 SIGNALS.

Signal                         | Kernel AR | Linear AR
1st                            | 0.00056   | 0.0396
2nd                            | 0.00030   | 0.0091
3rd                            | 0.00021   | 0.0631
4th                            | 0.00027   | 0.0210
5th                            | 0.00014   | 0.0203
6th                            | 0.00022   | 0.0268
7th                            | 0.00018   | 0.0619
8th                            | 0.00039   | 0.221
9th                            | 0.00021   | 0.0169
10th                           | 0.00059   | 0.0077
Mean of MSE over the 10 signals | 0.000313  | 0.04874

[Fig. 3: two panels over 150 samples, each with curves labeled Original, Kernel and Linear]

Fig. 3. Modeling of two ECG signals using the kernel AR method and the linear AR method. Blue: original signal; red: signal modeled using the kernel AR method; cyan: signal modeled using the linear AR method.

The Gaussian kernel is the most widely used kernel. It is defined by

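$$\kappa(x_i, x_j) = \exp\big(-\|x_i - x_j\|^2 / 2\sigma^2\big),$$

for any $x_i, x_j \in \mathcal{X}$, where $\sigma$ is its bandwidth parameter. Since $\kappa(x, x) = 1$ for this kernel, the second term in (4) is constant, and the resulting cost function $J_i(x)$ reduces to

$$J_i(x) = -\sum_{j=1}^{p} \alpha_j \exp\big(-\|x_{i-j} - x\|^2 / 2\sigma^2\big).$$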
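The cost above can be minimized numerically. The sketch below evaluates it and runs the fixed-point update derived in the next paragraph, for scalar ECG samples. Several details are choices of this sketch rather than details given in the text: the starting point (the most recent sample $x_{i-1}$), the iteration cap and tolerance, and a guard against a vanishing denominator, which can occur because the coefficients $\alpha_j$ may be negative. The function names are hypothetical.

```python
import numpy as np


def gaussian_cost(x, past, alpha, sigma):
    """J_i(x) for the Gaussian kernel, with the constant kappa(x, x) = 1 term dropped."""
    past = np.asarray(past, dtype=float)
    weights = np.exp(-((past - x) ** 2) / (2.0 * sigma**2))
    return -np.dot(alpha, weights)


def fixed_point_preimage(past, alpha, sigma, x0=None, n_iter=200, tol=1e-10):
    """Iterate x <- sum_j a_j w_j x_{i-j} / sum_j a_j w_j, w_j = exp(-(x_{i-j} - x)^2 / (2 sigma^2)).

    past[j - 1] holds x_{i-j} and alpha[j - 1] holds alpha_j.
    """
    past = np.asarray(past, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Assumed initialization: the most recent sample x_{i-1}.
    x = float(past[0]) if x0 is None else float(x0)
    for _ in range(n_iter):
        w = alpha * np.exp(-((past - x) ** 2) / (2.0 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:
            # The update is undefined when the weighted terms cancel out.
            break
        x_new = float(np.dot(w, past) / denom)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For instance, `fixed_point_preimage([0.5, 0.4, 0.2], [0.6, 0.3, 0.1], sigma=1.0)` returns a value between 0.2 and 0.5, since with positive coefficients each update is a weighted average of the past samples.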

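Taking the derivative of $J_i(x)$ with respect to $x$ and setting it to zero, we get the fixed-point iterative expression

$$x_{i,t+1}^* = \frac{\sum_{j=1}^{p} \alpha_j \exp\big(-\|x_{i-j} - x_{i,t}^*\|^2 / 2\sigma^2\big)\, x_{i-j}}{\sum_{j=1}^{p} \alpha_j \exp\big(-\|x_{i-j} - x_{i,t}^*\|^2 / 2\sigma^2\big)},$$

where $t$ denotes the iteration index.

V. EXPERIMENTS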

We applied the proposed method to predict future samples of real ECG signals taken from the MIT-BIH Normal Sinus Rhythm Database. The ECG signals are considered stationary, since each one covers only 1 minute. The Gaussian kernel was used. The experiments were divided into two stages: learning and testing. For each signal, 150 samples were used in the learning stage, where the order and the coefficients of the AR model and the bandwidth of the kernel were optimized. The same order is used for both the linear AR and the kernel AR models of each signal. The following 150 samples were used to compare the proposed kernel AR model with the linear AR model, whose coefficients are estimated using the linear Yule-Walker equations.

The comparison between the kernel AR model and the linear AR model is based on the mean square error (MSE) computed over the 150 samples of the test stage:

$$\varepsilon = \frac{1}{n} \sum_{i=n+1}^{2n} \| x_i^* - x_i \|^2,$$

where $x_i^*$ is the predicted value at instant $i$ and $x_i$ is the true value of the ECG signal at the same instant. Table I shows the MSE of the kernel AR method and the linear AR method for each ECG signal, and the last row gives the mean MSE over the 10 signals. The kernel AR model yields a lower MSE than the linear AR model on all 10 signals; on average, its MSE (0.000313) is more than 100 times smaller than that of the linear AR model (0.04874).
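The error measure $\varepsilon$ is a plain mean of squared prediction errors over the $n$ test samples. A minimal sketch, assuming scalar samples stored in arrays aligned on the instants $i = n + 1, \ldots, 2n$ (the function name is hypothetical):

```python
import numpy as np


def prediction_mse(x_pred, x_true):
    """epsilon = (1/n) * sum over the n test instants of (x*_i - x_i)^2."""
    x_pred = np.asarray(x_pred, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    if x_pred.shape != x_true.shape:
        raise ValueError("predicted and true segments must have the same shape")
    return float(np.mean((x_pred - x_true) ** 2))
```

For example, `prediction_mse([1.0, 2.0], [1.0, 4.0])` returns 2.0, the mean of the squared errors 0 and 4.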

Fig. 3 shows two examples of modeling signals with normal cardiac rhythm, using on the one hand the kernel AR technique estimated with the Yule-Walker equations in the feature space, and on the other hand the linear AR technique estimated with the Yule-Walker equations. The blue line is the original ECG signal, the red one is modeled using the proposed kernel AR method, and the cyan one is modeled using the linear AR method. The proposed method models the signals more faithfully than the linear AR method: the linear AR method fails to model the QRS complex of the heartbeat, while the kernel AR method captures this complex for each subject according to the subject's heart rate.

VI. CONCLUSION

Heart disease is one of the main causes of death today. By analyzing and monitoring ECG signals, one can predict when an arrhythmia may occur. In this paper, we have presented a way to predict the evolution of the electrical activity of the heart using kernel machines. To this end, we have combined the AR model, a linear approach to predicting samples of a time series, with kernel methods in machine learning, where the data are mapped into a high-dimensional feature space. We have defined the kernel AR model in the feature space and estimated its coefficients using the Yule-Walker equations. Once future samples are estimated in this way, a pre-image technique is required to map them back to the input space, where they can be interpreted. The pre-image technique used here is the fixed-point iterative method. The efficiency of the proposed method is illustrated on real ECG signals taken from the MIT-BIH Normal Sinus Rhythm Database.

As future work, we consider estimating the order of the AR model using the partial autocorrelation function or the Bayesian Information Criterion. We also intend to extend the use of kernel machines to the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models.

ACKNOWLEDGMENT

This work is partly supported by the Lebanese University and the French-Lebanese research program CEDRE No. 10 SCI F15/L5.

REFERENCES

[1] T. Linh, S. Osowski, and M. Stodolski, “On-line heart beat recognition using Hermite polynomials and neuro-fuzzy network,” IEEE Transactions on Instrumentation and Measurement, vol. 52, no. 4, August 2003.

[2] S. Osowski, L. T. Hoai, and T. Markiewicz, “Support vector machine-based expert system for reliable heartbeat recognition,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 4, April 2004.

[3] W. Kao, C. Yu, C. Shen, W. Chen, and P. Hsiao, “Electrocardiogram analysis with adaptive feature selection and support vector machines,” in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Singapore, 4–7 December 2006, pp. 1783–1786.

[4] M. Chevalier and Y. Grenier, “Autoregressive models with time-dependent log area ratios,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’85), vol. 10, April 1985, pp. 1049–1052.

[5] M. A. Aizerman, E. A. Braverman, and L. Rozonoer, “Theoretical foundations of the potential function method in pattern recognition learning,” Automation and Remote Control, no. 25, 1964, pp. 821–837.

[6] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, September 1998.

[7] C. Richard, J. C. M. Bermudez, and P. Honeine, “Online prediction of time series data with kernels,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 1058–1067, March 2009.

[8] L. Ralaivola and F. d’Alché-Buc, “Time series filtering, smoothing and learning using the kernel Kalman filter,” in Proc. IEEE International Joint Conference on Neural Networks, vol. 3, 2005, pp. 1449–1454.

[9] P. Honeine and C. Richard, “The pre-image problem in kernel-based machine learning,” IEEE Signal Processing Magazine, special issue on “Dimensionality reduction via subspace and manifold learning”, vol. 28, no. 2, March 2011.

[10] M. Kallas, P. Honeine, C. Richard, H. Amoud, and C. Francis, “Nonlinear feature extraction using kernel principal component analysis with non-negative pre-image,” in Proc. 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Buenos Aires, Argentina, 31 August–4 September 2010.

[11] M. Kallas, P. Honeine, C. Richard, C. Francis, and H. Amoud, “Non-negative pre-image in machine learning for pattern recognition,” in 19th European Signal Processing Conference, Barcelona, Spain, 29 August–2 September 2011.

[12] M. Kallas, P. Honeine, H. Amoud, and C. Francis, “Sur le problème de la pré-image en reconnaissance des formes avec contraintes de non-négativité,” in Colloque GRETSI’2011, Bordeaux, France, 5–8 September 2011.

[13] N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.

[14] H. S. Anderson and M. R. Gupta, “Expected kernel for missing features in support vector machines,” in IEEE Workshop on Statistical Signal Processing, Nice, France, 28–30 June 2011.

[15] S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz, and G. Rätsch, “Kernel PCA and de-noising in feature spaces,” in Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II. Cambridge, MA, USA: MIT Press, 1999, pp. 536–542.