
Partial coherence is different from zero only when the relation between signals is direct. If a signal can be explained by a linear combination of some other signals in the process, the partial coherence between them will be low.

4.1.3 Granger-causality and Autoregressive Modelling

The most widely used measures of directed functional connectivity are the so-called Granger-causal modelling measures, which are based on the concept of Granger-causality [9]. Accordingly, a signal y(n) is said to Granger-cause a signal x(n) if knowledge of the past values of y(n) reduces the prediction error of the current values of x(n). In other words, y(n) causes x(n) if the information contained in y(n) helps to predict x(n) better than the information contained in the past values of x(n) alone. One should note that this causality is not reciprocal, i.e., the information transfer from y(n) to x(n) is different from that from x(n) to y(n).

Granger-causality is thus based on the assumption that causes precede their effects in time, and therefore it exclusively uses past samples to predict the current values. Due to the low temporal resolution and to fluctuations in hemodynamic responses that have no neural origin, the application of Granger-causality measures to fMRI signals is not straightforward [107]; they can, however, be reliably applied to EEG signals.

Granger-causality measures are data-driven and thus do not assume any specific underlying model or prior knowledge about the spatial or temporal interactions. They can be investigated with autoregressive (AR) models of an appropriate model order p. The model order defines the number of past time points included in the estimation of the current value. In AR models, the signals are represented as a linear combination of their own past plus additive uncorrelated white noise. Granger-causality between two signals can be assessed by comparing the univariate with the bivariate AR model fits. For two signals x1 and x2, the univariate AR models are:

\[
x_{1}(n) = \sum_{m=1}^{p} a_{11}(m)\, x_{1}(n-m) + e_{1}(n)
\]
\[
x_{2}(n) = \sum_{m=1}^{p} a_{22}(m)\, x_{2}(n-m) + e_{2}(n)
\]

where a11 and a22 are the model coefficients and e1(n) and e2(n) are the residual noises. Here, the prediction error of each signal depends only on the past values of that same signal.

For the bivariate AR model, the prediction error for each signal depends on the past values of both signals:

\[
x_{1}(n) = \sum_{m=1}^{p} a_{11}(m)\, x_{1}(n-m) + \sum_{m=1}^{p} a_{12}(m)\, x_{2}(n-m) + e_{1}(n)
\]
\[
x_{2}(n) = \sum_{m=1}^{p} a_{21}(m)\, x_{1}(n-m) + \sum_{m=1}^{p} a_{22}(m)\, x_{2}(n-m) + e_{2}(n)
\]

Rewriting the two-signal process in matrix notation:

\[
\begin{bmatrix} x_{1}(n) \\ x_{2}(n) \end{bmatrix} =
\sum_{m=1}^{p}
\begin{bmatrix} a_{11}(m) & a_{12}(m) \\ a_{21}(m) & a_{22}(m) \end{bmatrix}
\begin{bmatrix} x_{1}(n-m) \\ x_{2}(n-m) \end{bmatrix} +
\begin{bmatrix} e_{1}(n) \\ e_{2}(n) \end{bmatrix}
\]

This formulation generalizes to any number of signals. Let \( \mathbf{x}(n) \in \mathbb{R}^{R} \) be a stationary R-dimensional time series. The multivariate autoregressive (MVAR) model then represents it as a linear combination of the past values of these R signals plus additive Gaussian white noise:

\[
\mathbf{x}(n) = \sum_{m=1}^{p} \mathbf{A}(m)\, \mathbf{x}(n-m) + \mathbf{e}(n) \tag{26}
\]

The coefficient Aij(m) describes the linear contribution of the m-th past sample of xj, xj(n−m), to the prediction of the current value of xi, xi(n), i.e., the causal influence of xj(n−m) on xi(n). Therefore, the set of coefficient matrices provides knowledge about the directed information flow between all signals in this multivariate process. The coefficients of the MVAR model can be estimated by many algorithms, for instance the stepwise least squares algorithm [108], the multichannel Levinson algorithm (or multichannel Yule-Walker method) [109], or the multivariate Nuttall-Strand algorithm [110], among others. The performance of these algorithms has been compared, and the Nuttall-Strand algorithm was found to perform best even when the data duration was relatively short [36, 111].
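To make the estimation step concrete, a minimal least-squares fit of the MVAR model of equation (26) can be sketched as follows (an illustrative numpy sketch, not the Nuttall-Strand estimator used in the cited comparisons; the function name fit_mvar and the array layout are our own):

```python
import numpy as np

def fit_mvar(x, p):
    """Least-squares fit of x(n) = sum_m A(m) x(n-m) + e(n).

    x : array of shape (R, N) -- R signals, N time points
    p : model order
    Returns A of shape (p, R, R) and the residual noise covariance.
    """
    R, N = x.shape
    Y = x[:, p:].T                                    # current values, (N-p, R)
    # regressors: the p past values of all R signals, stacked by lag
    Z = np.hstack([x[:, p - m:N - m].T for m in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)         # one LS problem per channel
    A = B.T.reshape(R, p, R).transpose(1, 0, 2)       # A[m-1][i, j] = A_ij(m)
    E = Y - Z @ B                                     # residuals e(n)
    sigma_e = np.cov(E, rowvar=False)                 # noise covariance matrix
    return A, sigma_e
```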

In order to have an adequate MVAR model, it has been recommended that the total number of data points across epochs, RN (N – number of data points per epoch), be at least 10 times the number of model parameters, R²p (p – model order) [101]. For example, for R = 5 signals and a model order p = 10, the model has R²p = 250 parameters, so RN should be at least 2500, i.e., N ≥ 500 time points.

The model order p can be chosen using, for example, the Akaike Information Criterion (AIC) [112] or the Bayesian Information Criterion (BIC) [113]. Both criteria are based on the covariance of the residuals plus a term penalizing overfitting. Alternatively, the model order can be set according to the maximum expected time-lag of transfer in the process under investigation.
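As an illustration, the order scan could be sketched as follows, assuming one common multivariate form of AIC and BIC (exact variants differ across toolboxes; the least-squares fit mirrors the sketch above):

```python
import numpy as np

def mvar_order_scores(x, p_max):
    """Return (p, AIC, BIC) for each candidate order 1..p_max."""
    R, N = x.shape
    scores = []
    for p in range(1, p_max + 1):
        Y = x[:, p:].T
        Z = np.hstack([x[:, p - m:N - m].T for m in range(1, p + 1)])
        B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        sigma_e = np.cov(Y - Z @ B, rowvar=False)     # residual covariance
        n_eff, k = N - p, p * R * R                   # samples used, parameters
        logdet = np.linalg.slogdet(sigma_e)[1]        # ln det(Sigma_e)
        scores.append((p,
                       logdet + 2.0 * k / n_eff,              # AIC
                       logdet + np.log(n_eff) * k / n_eff))   # BIC
    return scores  # keep the order minimizing the chosen criterion
```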

In order to have a spectral measure of Granger-causality, equation (26) is transformed to the frequency domain:

\[
\mathbf{A}(f)\,\mathbf{X}(f) = \mathbf{E}(f) \tag{27}
\]

where

\[
\mathbf{A}(f) = -\sum_{m=0}^{p} \mathbf{A}(m)\, e^{-i 2 \pi f m}
\]

is the Fourier transform of the coefficient matrices, with A(0) the negative R × R identity matrix. Equation (27) can then be rewritten in terms of the transfer function matrix H(f) of the system:

\[
\mathbf{X}(f) = \mathbf{A}^{-1}(f)\,\mathbf{E}(f) = \mathbf{H}(f)\,\mathbf{E}(f) \tag{28}
\]

Equation (28) can also be written in terms of the spectral matrix of the process:

\[
\mathbf{S}(f) = \mathbf{H}(f)\, \Sigma_{e}\, \mathbf{H}^{*}(f)
\]

where * stands for the conjugate transpose and Σe is the noise covariance matrix. The following subsections describe some measures to investigate the directed functional connectivity between signals considering either H or A.
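The transform can be sketched directly from the coefficient matrices (illustrative numpy code; freqs and fs, the sampling frequency, are our own parameter names, and H(f) is obtained by matrix inversion of A(f)):

```python
import numpy as np

def mvar_spectral_matrices(A, freqs, fs):
    """Compute A(f) and H(f) = A(f)^-1 from MVAR coefficients.

    A : (p, R, R) coefficient matrices A(1)..A(p); freqs in Hz.
    """
    p, R, _ = A.shape
    Af = np.zeros((len(freqs), R, R), dtype=complex)
    for k, f in enumerate(freqs):
        # A(f) = -sum_{m=0}^{p} A(m) e^{-i 2 pi f m / fs}, with A(0) = -I
        Af[k] = np.eye(R)
        for m in range(1, p + 1):
            Af[k] -= A[m - 1] * np.exp(-2j * np.pi * f * m / fs)
    Hf = np.linalg.inv(Af)   # transfer function matrix H(f), batched inverse
    return Af, Hf
```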

From the AR model parameters, several linear directed functional connectivity measures can be derived, either bivariate or multivariate, in the time domain or in the frequency domain. In the time domain, Granger-causality between two signals can be investigated by estimating the bivariate Granger-causality index. In the frequency domain, Granger-causality can be studied by using H or A to compute the Directed Coherence (DC), the Directed Transfer Function (DTF) or the Partial Directed Coherence (PDC). Connectivity measures in the frequency domain are important since EEG rhythms oscillate at different frequencies that play different roles in information processing.

In this project, we used a Granger-causal modelling measure, the weighted Partial Directed Coherence (wPDC), to investigate direct and directional relationships between EEG source signals.

Time-varying MVAR

If the signals under investigation are highly non-stationary, as in the case of interictal spikes or seizures, MVAR models should be extended to the time-varying case (adaptive MVAR, AMVAR), where the model parameters are time-dependent; H or A then become functions of both frequency and time, H(f,t) and A(f,t), respectively. Several approaches have been described for this purpose, namely the use of short-time windows, which exploits the information of multiple trials and requires stationarity of the signal within each window [114], a time-continuous fit of the MVAR model using the Kalman filter approach [115], and the recursive least squares (RLS) algorithm with a forgetting factor [116, 117].

In the RLS approach, the influence of the signal's past decreases exponentially with the time distance to the current samples. It involves the minimization of the sum of exponentially weighted prediction errors. This weighting (forgetting factor) depends on an adaptation constant that ensures that distant past values are "forgotten", so that the algorithm can follow the statistical variations of a non-stationary signal. This was the approach used in our first study. A detailed description of this approach can be found in [116, 117].
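The idea can be sketched as a generic exponentially weighted RLS regression applied to the MVAR regressors (an illustrative sketch, not the exact algorithm of [116, 117]; the forgetting factor lam and the initialization of P are assumptions):

```python
import numpy as np

def rls_mvar(x, p, lam=0.98):
    """Time-varying MVAR coefficients via exponentially weighted RLS.

    x : (R, N) signals; p : model order; lam : forgetting factor in (0, 1],
    smaller values adapt faster but give noisier estimates.
    Returns A_t of shape (N, p, R, R), the coefficients at each sample.
    """
    R, N = x.shape
    d = R * p
    B = np.zeros((d, R))              # stacked coefficient matrix
    P = np.eye(d) * 100.0             # inverse regressor covariance (large init)
    A_t = np.zeros((N, p, R, R))
    for n in range(p, N):
        z = x[:, n - p:n][:, ::-1].T.reshape(-1)   # past samples, lag 1 first
        y = x[:, n]
        k = P @ z / (lam + z @ P @ z)              # RLS gain
        B += np.outer(k, y - z @ B)                # correct with prediction error
        P = (P - np.outer(k, z @ P)) / lam         # forget old information
        A_t[n] = B.T.reshape(R, p, R).transpose(1, 0, 2)
    return A_t
```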

Kalman filtering is an algorithm to estimate the state of a state-space model defined by a system equation and a measurement equation [115]. It estimates the state of a system by feedback control: the filter predicts the process state at a given time and then obtains feedback in the form of noisy measurements. It involves two computation phases: predict and update. The predict phase uses the state estimate from the previous time step to produce an a priori estimate at the present time step. In the update phase, this a priori prediction is combined with the present observation to refine the estimate.

While the RLS minimizes the prediction errors, the Kalman filter minimizes the error of the estimated parameters by choosing a so-called Kalman gain matrix. For details on the Kalman filter approach we refer the reader to [115, 118].
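For illustration only, one predict/update cycle of a generic linear Kalman filter looks as follows (a textbook sketch, not the specific state-space formulation of [115, 118]; all matrix names are generic):

```python
import numpy as np

def kalman_step(x_est, P, z, F, Hm, Q, Rn):
    """One predict/update cycle of a linear Kalman filter.

    x_est, P : previous state estimate and its covariance
    z        : current noisy measurement
    F, Hm    : state-transition and measurement matrices
    Q, Rn    : process and measurement noise covariances
    """
    # predict: propagate the previous estimate to the present time step
    x_pred = F @ x_est
    P_pred = F @ P @ F.T + Q
    # update: combine the a priori prediction with the present observation
    S = Hm @ P_pred @ Hm.T + Rn              # innovation covariance
    K = P_pred @ Hm.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - Hm @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ Hm) @ P_pred
    return x_new, P_new
```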

Both the Kalman filter and the RLS with forgetting factor include an adaptation constant that controls the compromise between adaptation speed and quality of the estimation: low values lead to a slower adaptation with more stable estimates, but possibly miss important faster phenomena; if the value is too high, the estimator follows all fluctuations of the signal, resulting in noisy and unstable estimates. It is recommended to use the largest possible number of trials in order to increase this value, and thus obtain a higher adaptation speed without losing estimation accuracy. This parameter should therefore be chosen carefully depending on the data. Studies have evaluated the influence of the adaptation constant on the performance of DTF and PDC, and optimal values have been proposed [119].

4.1.3.1 Granger-causality Index

The bivariate Granger-causality from xj to xi can be derived from the variances of the prediction errors of the univariate (var(ei) and var(ej)) and of the bivariate (var(eij) and var(eji)) AR models. The Granger-causality index from j to i can be quantified as:

\[
G_{j \to i} = \ln\!\left( \frac{\operatorname{var}(e_{i})}{\operatorname{var}(e_{ij})} \right) \tag{32}
\]

If var(eij) < var(ei), then j Granger-causes i. If the past values of j do not improve the prediction of i, then var(eij) ≈ var(ei) and G_{j→i} ≈ 0. The Granger-causality index from i to j can be investigated in the same way.
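A direct implementation of equation (32) could look like this (illustrative sketch; the least-squares helper and its name are our own):

```python
import numpy as np

def granger_causality_index(x_i, x_j, p):
    """Time-domain Granger-causality index G_{j->i} of equation (32)."""
    def residual_var(target, sources, p):
        # least-squares AR fit of `target` on the past of `sources`
        N = target.size
        Z = np.hstack([s[p - m:N - m][:, None]
                       for s in sources for m in range(1, p + 1)])
        y = target[p:]
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return np.var(y - Z @ b)

    var_uni = residual_var(x_i, [x_i], p)        # var(e_i), univariate model
    var_biv = residual_var(x_i, [x_i, x_j], p)   # var(e_ij), bivariate model
    return np.log(var_uni / var_biv)             # > 0 suggests j Granger-causes i
```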

4.1.3.2 Directed Coherence

Directed coherence (DC) was developed to overcome the fact that coherence and partial coherence do not indicate the direction of the information flow [120]. It was first proposed by Saito and Harashima in 1981 [120] for the bivariate case, and later extended to the multivariate case by Baccala and Sameshima in 1998 [121, 122], who proposed to estimate the direction of information flow from one signal j to another signal i by:

\[
DC_{ij}(f) = \frac{\sqrt{\sigma_{jj}}\; H_{ij}(f)}{\sqrt{\sum_{m=1}^{R} \sigma_{mm} \left| H_{im}(f) \right|^{2}}}
\]

where σjj are the diagonal elements of the noise covariance matrix Σe. In order to make the information flow comparable between different pairs of signals, the numerator is normalized by the square root of the noise-weighted sum of the squared magnitudes of all the elements of row i of H(f).
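A sketch of the DC computation from the transfer matrix H(f) and the noise covariance (assuming H was obtained as in the frequency-domain sketch above; the array shapes are our own conventions):

```python
import numpy as np

def directed_coherence(H, sigma_e):
    """DC_ij(f) = sqrt(sigma_jj) H_ij(f) / sqrt(sum_m sigma_mm |H_im(f)|^2).

    H : (F, R, R) transfer matrices; sigma_e : (R, R) noise covariance.
    """
    s = np.diag(sigma_e)                                 # noise variances sigma_jj
    num = np.sqrt(s)[None, None, :] * np.abs(H)          # weight source column j
    den = np.sqrt((s[None, None, :] * np.abs(H) ** 2)
                  .sum(axis=2, keepdims=True))           # over all inflows to i
    return num / den
```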

4.1.3.3 Directed Transfer Function

The Directed Transfer Function (DTF) was introduced by Kaminski and Blinowska in 1991 [123], and it corresponds to the definition of DC when σjj = 1:

\[
DTF_{ij}(f) = \frac{\left| H_{ij}(f) \right|}{\sqrt{\sum_{m=1}^{R} \left| H_{im}(f) \right|^{2}}}
\]

Since DTF only includes the transfer function H(f) and not the noise covariance matrix Σe, any possible correlation of the input noises should not influence it.
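In code, DTF reduces to a row-wise normalization of |H(f)| (illustrative sketch, same array conventions as above):

```python
import numpy as np

def dtf(H):
    """DTF_ij(f) = |H_ij(f)| / sqrt(sum_m |H_im(f)|^2) for H of shape (F, R, R)."""
    num = np.abs(H)
    den = np.sqrt((num ** 2).sum(axis=2, keepdims=True))  # inflows to region i
    return num / den
```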

4.1.3.4 Partial Directed Coherence

DTF by itself cannot distinguish between indirect and direct interactions: if there is a direct connection from x1 to x2 and from x2 to x3, DTF will show not only these direct connections but also an indirect one from x1 to x3. However, other formulations of DTF have been proposed to account for this, such as the direct DTF (dDTF) [124].

Partial Directed Coherence (PDC) was introduced by Baccala and Sameshima in 2001 [125]. Besides determining the direction of the information flow, it detects only direct connections between signals: if there is a direct connection from x1 to x2 and from x2 to x3, PDC will show only these direct connections, in contrast to DTF [126]. PDC is computed in a similar way to DTF, but it infers the directed connections directly from the model coefficients (matrix A) rather than from their inverse (matrix H).

Like for DC or DTF, the stability and interpretability of the PDC results are achieved by normalizing each coefficient of the MVAR model, Aij(f), with respect to either the outflows from the source region or the inflows to the receiving region. In the original definition of PDC [125], the outflow from region j to region i, PDCij, is defined as Aij(f) normalized by the sum of all the outflows from region j (column-normalization), so that PDCij represents the influence of region j on region i compared to the influences of j on all the other regions:

\[
PDC_{ij}(f) = \frac{\left| A_{ij}(f) \right|}{\sqrt{\sum_{m=1}^{R} \left| A_{mj}(f) \right|^{2}}}
\]

It has been shown, however, that the row-wise normalization, as used in DTF or DC, gives more accurate and physiologically plausible results than the column-normalized PDC [127]. Indeed, the normalization by the outflows compromises the sensitivity of the connectivity results (outflows), since an increase of the outflow from one region to another will reduce all the other PDC values outgoing from that region. In the row-wise normalization, the opposite holds: an increase in the inflow to one region will make all the other PDC values targeting that region smaller. If one is interested in studying the outflow connections, the row-normalization is therefore the better approach, because it allows more variability of the outgoing values, which are independent of the other outflows from the driving region. Row-wise normalization is part of the original DC and DTF definitions and has also been applied to PDC [102, 123, 126]. Conversely, if one is interested in studying the inflows, the column-normalization is preferred. The row-normalized PDC from region j to region i then represents the influence of j on i with respect to all the influences that i receives from other regions. At a given frequency f it is given by:

\[
PDC_{ij}(f) = \frac{\left| A_{ij}(f) \right|}{\sqrt{\sum_{m=1}^{R} \left| A_{im}(f) \right|^{2}}}
\]
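Both normalizations can be sketched in a few lines from the coefficient matrices (illustrative numpy code; the flag row_normalized is our own naming):

```python
import numpy as np

def pdc(A, freqs, fs, row_normalized=True):
    """Row- or column-normalized |PDC| from MVAR coefficients.

    A : (p, R, R) coefficient matrices; returns shape (F, R, R),
    entry [f, i, j] = PDC_ij(f).
    """
    p, R, _ = A.shape
    out = np.zeros((len(freqs), R, R))
    for k, f in enumerate(freqs):
        Af = np.eye(R, dtype=complex)                 # A(f) as in equation (27)
        for m in range(1, p + 1):
            Af -= A[m - 1] * np.exp(-2j * np.pi * f * m / fs)
        mag = np.abs(Af)
        if row_normalized:
            # normalize over all inflows to region i (row i)
            den = np.sqrt((mag ** 2).sum(axis=1, keepdims=True))
        else:
            # original definition: normalize over all outflows from region j
            den = np.sqrt((mag ** 2).sum(axis=0, keepdims=True))
        out[k] = mag / den
    return out
```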

Therefore, high values of PDCij at a given frequency are indicative of a direct influence of j on i. However, the calculation of PDC is independent of the signal spectral power, and thus there can be high PDC values at frequencies that carry very little spectral power, and vice versa. This can make PDC values physiologically hard to interpret. So, in order to quantify the effective influence that the source region exerts, the (squared) PDC can be weighted by the spectral power of each source signal (weighted PDC, wPDC) [127]:

\[
wPDC_{ij}(f) = sPDC_{ij}(f)\; SP_{j}(f) \tag{40}
\]

where sPDCij(f) is the squared PDC and SPj(f) is the spectral power (SP) of the source region j. Here, the spectral power was computed directly from the signal and not from the model [127]. Therefore, wPDC values will be high only when both sPDC and SP are high. Weighting by the SP is useful to better identify the predominant frequencies of interaction and the largest drivers of activity, because SP represents physiological observations that are taken into account in the outflow values. A validation study has shown that this weighting gives physiologically plausible results in an animal model [127] and in a visual evoked response study [129].
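The weighting step of equation (40) can be sketched with a Welch estimate of the spectral power (illustrative code; scipy's welch is one possible way to estimate SP directly from the signal, and the interpolation onto the PDC frequency grid is our own convenience):

```python
import numpy as np
from scipy.signal import welch

def weighted_pdc(x, sPDC, freqs, fs):
    """wPDC_ij(f) = sPDC_ij(f) * SP_j(f), as in equation (40).

    x : (R, N) source signals; sPDC : (F, R, R) squared PDC values;
    freqs : the F frequencies (in Hz) at which sPDC was evaluated.
    """
    # spectral power estimated directly from the signal, not from the model
    f_w, SP = welch(x, fs=fs, nperseg=min(x.shape[1], 256))   # SP: (R, n_bins)
    SP_grid = np.vstack([np.interp(freqs, f_w, SP[j])
                         for j in range(x.shape[0])])          # (R, F)
    # the spectral power of source j scales every outflow from j
    return sPDC * SP_grid.T[:, None, :]
```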