Thesis
Reference
Time-frequency Granger causality with application to nonstationary brain signals
CEKIC, Sezen
Abstract
This PhD thesis concerns the modelling of time-varying causal relationships between two signals, with a focus on signals measuring neural activities. The ability to compute a dynamic and frequency-specific causality statistic in this context is essential and Granger causality provides a natural statistical tool. In Chapter 1 we review the existing methods for measuring time-varying frequency-specific Granger causality and discuss their advantages and drawbacks. Based on this review, we propose in Chapter 2 an estimator of a linear Gaussian vector autoregressive model with coefficients evolving over time. Estimation is carried out through a variational Bayesian approximation, and the model provides a natural dynamical Granger-causality statistic. We propose an extension of this model based on the à trous Haar decomposition, which allows us to derive the desired dynamical and frequency-specific Granger-causality statistic. In Chapter 3 we propose an application of the model to real experimental data.
CEKIC, Sezen. Time-frequency Granger causality with application to nonstationary brain signals. Thèse de doctorat : Univ. Genève, 2015, no. GSEM 18
URN : urn:nbn:ch:unige-789688
DOI : 10.13097/archive-ouverte/unige:78968
Available at:
http://archive-ouverte.unige.ch/unige:78968
Time-Frequency Granger Causality
with Application to
Nonstationary Brain Signals
by
Sezen Cekic
A thesis submitted to the
Geneva School of Economics and Management, University of Geneva, Switzerland,
in fulfillment of the requirements for the degree of PhD in Statistics
Members of the thesis committee:
Prof. Olivier Renaud, Adviser, University of Geneva
Prof. Didier Grandjean, Adviser, University of Geneva
Prof. Maria-Pia Victoria-Feser, Chair, University of Geneva
Prof. Anthony Davison, École Polytechnique Fédérale de Lausanne
Thesis No. 18
December 2015
Acknowledgements
As this thesis comes to an end, I am of course convinced that the journey was far from a solitary one. I could never have completed this work without the support of a great many people whose generosity, good humour and patience towards me allowed me to move forward and grow in the very curious half-jungle, half-desert environment that is doctoral work.
First of all, I would like to warmly thank my two thesis advisers and academic mentors.
Professor Olivier Renaud, who trusted me and took me under his wing from the very start of my master's work, and who has tirelessly been at my side, benevolent and a pillar of this thesis. His gentle and conciliatory character, his kindness, his profound humility and his immense scientific qualities gave me confidence and allowed me to come this far.
Then Professor Didier Grandjean, who believed in me from the start by offering me an impossible challenge for my master's work and who then proposed that I continue with a thesis. His energy, his inventiveness and his very great intelligence contributed a great deal to setting the pace of this work, and I am extremely grateful to him for having put me on the path to an academic career.
I have had great luck and pleasure working alongside my two thesis advisers, and I hope that this is only the beginning of our scientific collaborations.
I thank the chair of the jury, Professor and Dean Maria-Pia Victoria-Feser. I thank her first of all, and warmly, for having supported this project from its very beginnings by agreeing to be its co-director. I also thank her for having reached out to me at the critical moment when I needed it, by opening the doors of the new GSEM to me.
I sincerely thank Professor Anthony Davison for doing us the honour of serving on the jury of this thesis. I thank him for his extremely careful reading and his valuable comments, which have without any doubt greatly improved the quality of the manuscript.
Finally, I thank my colleague Roberto Molinari for agreeing to proofread my entire thesis and to make the necessary corrections to my barnyard English.
Completing this thesis naturally makes me think of my former master's classmates, William, James and Dravasp, who helped me enormously to fill the many gaps in my knowledge at the start of my career in statistics. Thank you to them for their help, without which I would most certainly not have survived that time.
These years of thesis work and the various teams I spent time with allowed me to meet a good number of people whom I wish to thank here. First, my colleagues from NEAD and CISA, Kim, Alison, Julie, Valérie, Léo, Andy, Nico and all the others, for those first thesis years filled with good humour, conferences and evenings of all kinds. Thank you next to my colleagues at the GSEM, Élise, Rose, Roberto, Samuel, Kustrim, Marc-Olivier and Mattia, for having warmly welcomed me into their team and into the Rob'in Café. Thank you to them for their good humour and their camaraderie, which did a great deal of good at the stage of the doctorate I had reached when I arrived among them. Finally, thank you to my adorable colleagues from MAD, Eda, Émilie, Emma, Nadège, Catherine, Jaromil, Boris, Marc, Julien and Paolo, for welcoming me into their team. I could not have hoped for better company in which to finish my thesis.
I would of course never have reached the end of this work without the help and support of all my loved ones, who are so dear to me and whom I wish to thank here. First, thank you to my friends, particularly Melia, Vanille, Clara, Dragana, Antonin, Léo and of course all the others, for their unfailing friendship, their support and above all for all the vital moments shared with them throughout these years.
Thank you next to my in-laws, Fabienne, Alan, Carla, Léonard, Marine, Thomas, César and James, for their presence, so dear and so important in my life.
A special thank you to my two wonderful friends Aurélie and Malvina for their incredible presence, their love and their support during all these years. Thank you to them for taking such an interest in what I do and for having understood, put up with and lifted all my moods since my very beginnings in this world of statistics.
Thank you next to my family, to my uncle Ariel and my grandmother Gabriele, for having taught me so much, for being at my side and for believing in me for all these years.
Thank you to my sister Sevline, my stepfather Francis and my mother Oria for having always been there for me, for having believed in me from the start, for having let me be myself and make my own choices without ever judging them, trying instead to understand while being present and loving at my side. Thank you to my mother for having been such an example in life and for having taught me very early that when you want something, you give yourself the means, and you can.
Finally, thank you to my love Arthur for being there from the beginning; thank you for having supported, listened, refocused, encouraged, calmed, re-energized and so much more! Thank you for having put up with everything, for having understood what doing a thesis was and what it involved, and above all thank you for having helped me throughout this adventure to keep putting the things that truly matter back in their rightful place. Thank you to him for having lived all of this at my side and for having given me so much strength through his good humour, his energy and his extraordinary joie de vivre.
Abstract
This PhD thesis concerns the modelling of time-varying causal relationships between two signals, with a focus on signals measuring neural activities. In an experimental context, data are recorded while stimuli are presented at fixed times and are expected to induce a reaction. The causal links between recorded signals may therefore vary in time and be frequency-specific. The ability to compute a dynamic and frequency-specific causality statistic is essential. Granger causality provides a natural statistical tool for investigating possible causal relationships between two signals, its properties having been established several decades ago (Granger, 1969).
In Chapter 1 we propose a review of the existing methods allowing one to measure time-varying frequency-specific Granger causality and discuss their advantages and drawbacks.
Based on this review, we propose in Chapter 2 the use of an adaptive Kalman filter type of estimator of a linear Gaussian vector autoregressive (LGVAR) model with coefficients evolving over time. Estimation is carried out through a variational Bayesian approximation (Beal, 2003). This Bayesian state space model (BSS model), proposed in Cassidy (2002), provides a natural dynamical Granger-causality statistic.
The Bayesian nature of the model provides a criterion for model order selection and allows us to include prior knowledge. We propose to extend the BSS model to include the à trous Haar decomposition. We call this new model the multiscale Bayesian state space model (MSBSS model). This wavelet-based forecasting method, which relies on a multiple resolution decomposition of the signal using the redundant à trous wavelet transform, allows us to capture short- and long-range dependencies between signals and further allows us to derive the desired dynamical and frequency-specific Granger-causality statistic.
Finally, in Chapter 3 we propose an application of the MSBSS model to intracranial electroencephalogram data recorded in the amygdala and medial orbitofrontal cortex brain regions during experimental emotional auditory stimuli recognition. We present preliminary results for this work, which is in progress.
Résumé
This doctoral thesis concerns the modelling of dynamic causal relationships between two signals, with a specific application to signals measuring brain activity.
In an experimental context, stimuli are presented at fixed times and are expected to provoke a reaction in the subject. For researchers, being able to describe the causal link between signals recorded in different brain regions is of primary interest. Since the causal links between the recorded signals may vary over time and be specific to certain frequency bands, a dynamic and frequency-specific causality statistic is essential. Granger causality naturally comes to mind as a solid statistical tool for testing causal relationships between two signals (Granger, 1969).
We first propose in Chapter 1 a review of existing methods for obtaining a dynamic and frequency-specific Granger-causality statistic, and discuss their advantages and drawbacks.
Following this review, we propose in Chapter 2 to adopt a methodology based on the state space representation and the Kalman filter in order to estimate a linear Gaussian vector autoregressive model with dynamic coefficients by a variational Bayesian approximation method (Beal, 2003). This Bayesian model (BSS model), first proposed in Cassidy (2002), allows a dynamic causality statistic to be derived while simultaneously estimating all the other necessary quantities. The variational Bayesian approximation naturally offers a model selection criterion and allows prior knowledge to be included in the model, thereby avoiding overparametrization problems.
We propose to extend this model to the Haar à trous methodology. This wavelet-based prediction method relies on a multiple resolution decomposition of the signal using the redundant à trous wavelet transform, and makes it possible to model both short- and long-range dependencies between the signals. This extension of the model to the à trous methodology allows us to derive a Granger-causality statistic that is both dynamic and frequency-specific.
In Chapter 3 we propose an application of the proposed methodology to real intracranial electroencephalography data recorded in the amygdala and medial orbitofrontal cortex during an experimental auditory emotion recognition task. In this chapter we present the preliminary results of this work in progress.
Contents
Acknowledgements
Abstract
Résumé
Introduction
1 Time, Frequency & Time-Varying Causality Measures in Neuroscience
1.1 Introduction
1.2 Stationarity
1.3 Time Domain Causality
1.3.1 Granger-causality criterion based on variances
1.3.2 Granger-causality criterion based on coefficients
1.3.3 Transfer entropy
1.4 Frequency Domain Causality
1.4.1 Geweke–Granger-causality statistic
1.4.2 Directed transfer function and partial directed coherence
1.5 Time-Varying Granger Causality
1.5.1 Non-parametric statistics
1.5.2 Time-varying VAR model
1.6 Existing Toolboxes
1.7 Discussion
1.7.1 Limitations
1.7.2 EEG and fMRI application
1.7.3 Neuroscience data specificities
1.7.4 Asymptotic distributions
1.8 Conclusion
2 Nonstationary Bayesian VAR Model
2.1 Introduction
2.1.1 Existing methods and limits
2.1.2 Neuroscience data specificities
2.2 The Bayesian State Space Model
2.2.1 Elements of variational Bayes
2.2.2 The variational evidence lower bound
2.2.3 Model specification
2.2.4 Update equations
2.2.5 Multiple trials
2.3 The Multiscale Bayesian State Space Model
2.3.1 The à trous Haar wavelets transform
2.3.2 The multiscale Bayesian state space model
2.4 Bayesian Granger-Causality Statistic
2.5 Assessment of Accuracy
2.5.1 Practical implementation
2.5.2 Model order selection
2.5.3 Credible interval coverage
2.5.4 Granger-causality detection
2.5.5 Prediction interval coverage
2.6 Conclusion
3 Application
3.1 Introduction
3.2 Results
3.2.1 Testing the difference of causality between experimental conditions
3.2.2 Testing the scale specific causality for each experimental condition
3.2.3 Residual analysis
3.3 Discussion
3.3.1 Statistical alternatives and outlook
4 Some Aspects on Convergence
Conclusion
A Supplementary Material for Chapter 1
B Supplementary Material for Chapter 2
B.1 Matrix Algebra
B.1.1 The trace operator
B.1.2 The Kronecker operator
B.1.3 The vectorization operator
B.2 Densities and Divergences
B.2.1 Normal distribution
B.2.2 Inverse-gamma distribution
B.2.3 Inverse-Wishart distribution
B.3 Equivalence between Equations (2.12) and (2.13)
B.4 Mean and Fluctuation Theorem
B.5 Unified Inference Theorem
B.6 E-step: Computation of the Distribution of ϕT1
B.7 M-step: Computation of the Distribution of δ
B.8 M-step: Computation of the Distribution of α
B.8.1 First case: α different for each element in A
B.8.2 Second case: α unique
B.9 M-step: Computation of the Distribution of A
B.9.1 First case: A diagonal
B.9.2 Second case: A full
B.10 M-step: Computation of the Distribution of aq
B.11 M-step: Computation of the Distribution of Q
B.11.1 First case: Q proportional to identity
B.11.2 Second case: Q diagonal with k elements
B.11.3 Third case: Q full
B.12 M-step: Computation of the Distribution of ar
B.13 M-step: Computation of the Distribution of R
B.14 Computation of the Free Energy
B.15 Granger-Causality Detection Results
B.15.1 Data generated with slowly-varying parameters and normal errors
B.15.2 Data generated with slowly-varying parameters and non-normal errors
B.15.3 Windowing estimation procedure
B.16 Prediction Interval Coverage Results
B.16.1 Markovian parameters
B.16.2 Slowly-varying parameters
C Supplementary Material for Chapter 3
C.1 Preprocessing
C.2 Contact Details
C.3 Estimation Details for MSBSS Model
C.4 Estimation Details for BSS Model
C.5 Estimation Results for Patients S1, S2 and S4
C.6 Residual Analysis
C.6.1 Residual analysis for patient S0, anger condition and channel AMY
C.6.2 Residual analysis for patient S0, neutral condition and channel AMY
Glossary
Bibliography
To my love, Arthur.
Introduction
The core of this thesis consists of three chapters. The first two are self-contained with their own short introduction and discussion, with supplementary material in the appendix.
The first chapter proposes a systematic methodological review and objective criticism of existing methods enabling the derivation of time-varying Granger-causality statistics in neuroscience. The increasing interest in this topic and the huge number of related publications call for this systematic review, which describes the very complex methodological aspects. The capacity to describe the causal links between signals recorded at different brain locations during a neuroscience experiment is of primary interest for neuroscientists, who often have very precise prior hypotheses about the relationships between recorded brain signals that arise at a specific time and in a specific frequency band. The ability to compute a time-varying frequency-specific causality statistic is therefore essential.
Two steps are necessary to achieve this: the first consists of finding a statistic that can be interpreted and that directly answers the question of interest. The second concerns the model that underlies the causality statistic and that has this time-frequency specific causality interpretation. In this chapter, we will review Granger-causality statistics with their spectral and time-varying extensions.
Based on this review, we present a new estimation and prediction methodology for multivariate nonstationary time series, with the aim of deriving a dynamical causal statistic that captures the nonstationary causal brain dynamics between two recorded signals.
Granger causality is a statistical tool capable of probing possible causal relationships between two signals, its properties having been assessed by Granger (1969). We propose to derive this statistic from a linear Gaussian vector autoregressive model with coefficients evolving in time according to a linear dynamical system. As this model contains many parameters, we propose to make use of variational Bayes methods (Cassidy, 2002; Beal, 2003) in order to estimate this dynamical linear Gaussian vector autoregressive (VAR) model. This methodology allows us to estimate all the necessary quantities and to deal with datasets containing several trials (or epochs). Neuroscience data are recorded during an experiment and therefore several replications of the same experiment must be taken into account in a single estimate. The Bayesian nature of the model offers a natural criterion, the free energy, for model order selection, which is very important in time-series modelling. We propose an extension of this model via the à trous multiscale wavelet transform. This approach has never been used in the context of time-varying VAR coefficient estimation, and we will show that it better captures short- and long-range dependencies between recorded signals while providing a very simple way to deal with time-frequency uncertainty bounds.
The resulting Bayesian dynamical Granger-causality statistic will then be derived from the time-varying estimated VAR coefficients.
The major contribution of this research is to provide a valid statistical methodology that can answer complex hypotheses linked to dynamical causality within a neuroscience framework. We build on several previous works from engineering, statistics and neuroscience.
Chapter 3 contains an application of the proposed methodology to intracranial electroencephalogram data. The seminal experimental framework of the study is the evaluation of the functional principles related to emotional signal processing. For this purpose, researchers investigated local field potentials (LFPs) in epileptic pharmaco-resistant patients with recordings within the amygdala and medial orbito-frontal cortex in order to study the complex dynamics of neuronal processes within and between these regions in response to emotion exposure (Christen and Grandjean, 2010).
Based on the results in Grandjean et al. (2005), we propose to investigate to what extent brain oscillations recorded within the amygdala and the medial orbito-frontal cortex are causally related in the sense of Granger during specific emotional prosody exposures.
Preliminary results are presented for this work.
Finally, in Chapter 4, we give some arguments concerning the consistency of the variational Bayesian estimator proposed in Chapter 2 for a simpler model.
Chapter 1
Time, Frequency & Time-Varying Causality Measures in Neuroscience
1.1 Introduction
The investigation of the dynamical causal relationships between neuronal populations is a very important step towards the overall goal of understanding the links between functional cerebral aspects and their underlying brain mechanisms. This investigation requires statistical methods able to capture not only functional connectivities (e.g., symmetrical relationships), but also, and probably more importantly, effective connectivities (e.g., directional or causal relationships) between brain activities recorded during a specific task or stimuli exposure.
The question of how to formalize and test causality is a fundamental and philosophical problem. A statistical answer, which relies on passing from causality to predictability, was provided in the 1960s by the economist Clive Granger and is known as “Granger causality”. According to Granger (1969), if a signal X is causal for another signal Y in the Granger sense, then the history of X should contain information that helps to predict Y above and beyond the information contained in the history of Y alone. It is the axiomatic imposition of a temporal ordering that allows us to interpret such dependence as causal: “The arrow of time imposes the structure necessary” (Granger, 1980, p. 139).
The presence of this relation between X and Y will be referred to as “Granger causality” throughout the text.
Granger (1969) adapted the definition of causality proposed by Wiener (1956) into a practical form and, since then, Granger causality has been widely used in economics and econometrics. It is however only during the past few years that it has become popular in neuroscience (see Pereda et al. (2005) and Bressler and Seth (2011) for a review of Granger causality applied to neural data).
Since its causal nature relies on prediction, Granger causality does not necessarily mean “true causality”. If the two studied processes are jointly driven by a third one, one might reject the null hypothesis of non-Granger causality between signals although manipulation of one of them would not change the other, which contradicts what “true causality” would have implied.
Granger causality may also produce misleading results when the true causal relationship involves more variables than those that have been selected and so the accuracy of its causal interpretation relies on a suitable preliminary variable selection procedure (Pearl, 2009).
If we concentrate on just two signals, the problem is twofold: the first part is the choice of a suitable causality statistic that can easily be interpreted and that answers the question of interest. This said, the statistic needs to rely on a model which intrinsically includes this prediction or Granger-causality principle, and so the second part of the problem is to define and properly estimate this fundamental statistical model. A wrong statistical model indeed may lead to a wrong causality inference.
The scope of this chapter is to review and describe existing Granger-causality statistics in the time and frequency domains and then to focus on their time-varying extensions. We will describe existing estimation methods for time-varying Granger-causality statistics, in order to give the reader a global overview and some insight on the pertinence of using a given method depending on the research question and the nature of the data.
In Sections 1.3 and 1.4, we will present time and frequency-domain Granger-causality statistics in the stationary case. In Section 1.5, we will discuss their time-varying extensions in terms of time-varying causal model estimation. In Section 1.6, we will outline existing toolboxes allowing us to derive time-varying frequency-specific Granger-causality statistics and then discuss the limitations and the potential application of these statistics in neuroscience in Section 1.7.
To our knowledge, there is no systematic methodological review and objective criticism of existing methods that lead to time-varying Granger-causality statistics. The increasing interest reflected by the number of publications related to this topic in neuroscience justifies this literature review undertaken from a statistical viewpoint.
1.2 Stationarity
Many Granger-causality models rely on the assumption that the system analyzed is covariance stationary. Covariance stationarity (also known as weak- or wide-sense stationarity) requires that the first moment and the covariance of the system do not vary with respect to time.
A random process $Z_t$ is covariance stationary if it satisfies the following restrictions on its mean function:
\[ E[Z(t)] = m_Z, \quad \forall t \in \mathbb{R}, \tag{1.1} \]
and on its autocovariance function:
\[ E[(Z(t_1) - m_Z)(Z(t_2) - m_Z)] = C_Z(t_1, t_2) = C_Z(\tau), \quad \text{where } \tau = t_1 - t_2, \ \forall t_1, t_2 \in \mathbb{R}. \tag{1.2} \]
The first property implies that the mean function $m_Z$ is constant with respect to $t$.
The second property implies that the covariance function depends only on the difference between $t_1$ and $t_2$. The variance is consequently constant as well.
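As an informal illustration of the two properties above, the following Python sketch simulates a stationary Gaussian AR(1) process and compares the sample mean and sample autocovariance computed on two disjoint halves of the series. The process, its coefficient (0.5), the helper names `simulate_ar1` and `sample_autocov` and the sample sizes are assumptions chosen for illustration; they are not taken from the thesis, and comparing two halves is only a rough diagnostic, not a formal test of stationarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, n, rng, burn_in=500):
    """Simulate a zero-mean Gaussian AR(1) process z_t = phi * z_{t-1} + e_t."""
    e = rng.standard_normal(n + burn_in)
    z = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        z[t] = phi * z[t - 1] + e[t]
    return z[burn_in:]  # drop the burn-in so the start-up transient is gone

def sample_autocov(z, lag):
    """Sample autocovariance of z at a given non-negative lag."""
    zc = z - z.mean()
    return float(np.mean(zc[lag:] * zc[:len(z) - lag]))

z = simulate_ar1(phi=0.5, n=20000, rng=rng)
first, second = z[:10000], z[10000:]

# Under covariance stationarity both halves should give similar estimates
# of the mean (1.1) and of the autocovariance at each lag (1.2).
for name, seg in [("first half", first), ("second half", second)]:
    print(name,
          "mean:", round(float(seg.mean()), 3),
          "C(0):", round(sample_autocov(seg, 0), 3),
          "C(1):", round(sample_autocov(seg, 1), 3))
```

For a nonstationary series (for example one whose coefficient changes halfway through), the estimates from the two halves would typically differ noticeably.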
1.3 Time Domain Causality
We will first discuss the simplest case of Granger causality which is defined in the time domain. It is important to note that it requires that the data are stationary.
As mentioned in the introduction, Granger causality is based on prediction and its fundamental axiom is that “the past and present may cause the future but the future cannot cause the past” (Granger, 1969). The underlying notion of causality was stated by Wiener in 1956 and then adapted and defined in a practical form by Granger. As we will see, Granger restates Wiener’s principle in the context of autoregressive models (Granger, 1969). In particular, the main idea is that if a signal X is causal for another signal Y in the Granger sense, then past values of X should contain information that helps to predict Y better than merely using the information contained in past values of Y (Granger, 1969).
This concept of predicting better with an additional variable can be linked to significance tests in multiple linear regression, where an independent variable is declared significant if the full model explains (predicts) the dependent variable better than the model that does not contain this variable. In many fields these tests are called marginal and are linked to the so-called “Type III sum of squares” in ANOVA.
The general criterion of causality is: if the prediction error of a first series given its own past is significantly bigger than its prediction error given its own past plus the past of a second series, then this second series causes the first, in the Granger sense (Granger, 1969, 1980; Ding et al., 2006).
As Chamberlain (1982), Florens (2003) and Chicharro (2011) point out, the most general criterion of Granger non-causality can be defined based on the equivalence of two conditional densities:
\[ f_t(Y_t \mid Y_{t-p}^{t-1}) = f_t(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1}), \tag{1.3} \]
where $X_t$ and $Y_t$ are the two recorded time series, $Y_{t-p}^{t-1}$ and $X_{t-p}^{t-1}$ denote the history from time $t-1$ to $t-p$ of $Y$ and $X$ respectively (i.e. $[Y_{t-1}, \dots, Y_{t-p}]$ and $[X_{t-1}, \dots, X_{t-p}]$), and $p$ is a suitable model order. This general criterion is expressed in terms of the distributions only, so it does not rely on any model assumptions (Kuersteiner, 2008). Note that in this general definition, $f_t(\cdot)$ can be different for each time, and therefore the general criterion in equation (1.3) includes nonstationary models.
Any existing method for assessing Granger causality can be viewed as a restricted estimation procedure allowing us to estimate the two densities in equation (1.3) and to derive a causality statistic in order to test their difference.
For linear Gaussian autoregressive models, the assumptions are Gaussianity, homoscedasticity and linearity, which implies stationarity in most cases. The quantities in equation (1.3) become an autoregressive model of order p (AR(p)) for the left-hand side:
\[ f_t(Y_t \mid Y_{t-p}^{t-1}) = \phi\Big(Y_t;\ \mu = \sum_{j=1}^{p} \vartheta_{1(j)} Y_{t-j},\ \sigma^2 = \Sigma_1\Big), \tag{1.4} \]
and a vector autoregressive model of order p (VAR(p)) for the right-hand side:
\[ f_t(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1}) = \phi\Big(Y_t;\ \mu = \sum_{j=1}^{p} \vartheta_{11(j)} Y_{t-j} + \sum_{j=1}^{p} \vartheta_{12(j)} X_{t-j},\ \sigma^2 = \Sigma_2\Big), \tag{1.5} \]
where $\phi$ stands for the Gaussian probability density function.
In the next sections, we will present the two widely used approaches for testing hypothesis (1.3) in the linear Gaussian context. The first one is based on an F statistic expressed as the ratio of the residual variances of the models in equations (1.4) and (1.5) (Geweke, 1982). The second one is based on a Wald statistic and tests the significance of the causal VAR coefficients (Hamilton, 1994; Lütkepohl, 2005).
1.3.1 Granger-causality criterion based on variances
The original formulation of Granger causality (Granger, 1969) is expressed in terms of comparing the innovation variances of the whole (equation (1.5)) and the restricted (equation (1.4)) linear Gaussian autoregressive models (Geweke, 1982; Ding et al., 2006).
Granger (1969) proposed the following quantity to quantify this variance comparison:
\[ F_{X \to Y} = \ln\!\left(\frac{\Sigma_1}{\Sigma_2}\right). \tag{1.6} \]
In Hesse et al. (2003) and Goebel et al. (2003) this quantity is estimated by replacing the two variances by estimates. A test based on resampling this statistic is used for assessing the significance.
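The plug-in estimation of the quantity in (1.6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited papers: the two models (1.4) and (1.5) are fitted by ordinary least squares without intercept, the innovation variances are estimated by the mean squared residuals, and the toy data (X driving Y with one lag), the coefficients and the helper names `lagged_design`, `residual_variance` and `granger_log_ratio` are all hypothetical. The resampling test mentioned above is not implemented.

```python
import numpy as np

def lagged_design(series_list, p):
    """Stack lags 1..p of each series into a design matrix for times p..T-1."""
    T = len(series_list[0])
    cols = [s[p - j:T - j] for s in series_list for j in range(1, p + 1)]
    return np.column_stack(cols)

def residual_variance(y, X):
    """Least-squares fit of y on X (no intercept); return the mean squared residual."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.mean(resid ** 2))

def granger_log_ratio(x, y, p):
    """Plug-in estimate of F_{X->Y} = ln(Sigma_1 / Sigma_2) from eq. (1.6)."""
    target = y[p:]
    sigma1 = residual_variance(target, lagged_design([y], p))     # AR(p), eq. (1.4)
    sigma2 = residual_variance(target, lagged_design([y, x], p))  # VAR(p) row, eq. (1.5)
    return float(np.log(sigma1 / sigma2))

# Toy data (hypothetical): X drives Y with one lag, Y does not drive X.
rng = np.random.default_rng(1)
T = 2000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.standard_normal()

f_xy = granger_log_ratio(x, y, p=2)
f_yx = granger_log_ratio(y, x, p=2)
print("F_X->Y:", f_xy, "F_Y->X:", f_yx)
```

Because the restricted model is nested in the full one, the estimate is non-negative up to rounding; in this toy setting it should be clearly positive from X to Y and close to zero from Y to X.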
Geweke made several other important statements about (1.6) (Geweke, 1982, 1984a). He showed first that the total interdependence between two variables can be decomposed in terms of their two reciprocal causalities plus an instantaneous feedback term. Secondly, he showed that under fairly general conditions, $F_{X \to Y}$ can be decomposed additively by frequency (see Section 1.4). Lastly, he pointed out that it is possible to extend Granger causality to include other series. Based on the conditional densities, the null hypothesis would read
\[ f_t(Y_t \mid Y_{t-p}^{t-1}, W_{t-p}^{t-1}) = f_t(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1}, W_{t-p}^{t-1}), \tag{1.7} \]
where $W_{t-p}^{t-1}$ represents a set of variables that are controlled for when assessing the causality from X to Y. In the literature, this extension bears the name conditional Granger causality (Ding et al., 2006).
As explained inBressler and Seth(2011) andGeweke(1982), comparing the innovation variances of the whole and restricted linear Gaussian autoregressive models amounts to evaluating the hypothesis
$$H_0: \Sigma_1 = \Sigma_2, \tag{1.8}$$
which can be assessed through the statistic
$$F = \frac{(RSS_r - RSS_{ur})/m}{RSS_{ur}/(T - 2m - 1)}. \tag{1.9}$$
$RSS_r$ and $RSS_{ur}$ are the residual sums of squares of the linear models in equations (1.4) and (1.5), needed to estimate $\Sigma_1$ and $\Sigma_2$ respectively, and $T$ is the total number of observations used to estimate the unrestricted model.
This statistic follows approximately an F distribution with degrees of freedom $m$ and $T - 2m - 1$. A significant F may reasonably be interpreted as an indication that the unrestricted model provides a better prediction than does the restricted one, and so that X causes Y in the Granger sense.
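As a rough illustration of the statistic in equation (1.9), the following Python sketch fits the restricted and unrestricted regressions by ordinary least squares on simulated data. The helper names (`lagged`, `rss`, `granger_f`), the inclusion of an intercept, and the simulated coefficients are assumptions made for this example and are not taken from the cited references.

```python
import numpy as np

def lagged(series, m):
    """Columns j = 1..m hold series[t - j] for t = m, ..., len(series) - 1."""
    n = len(series)
    return np.column_stack([series[m - j:n - j] for j in range(1, m + 1)])

def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, m):
    """Return the F statistic (1.9) and the log variance ratio (1.6) for X -> Y."""
    target = y[m:]
    T = len(target)                                          # observations used in both fits
    restricted = np.hstack([np.ones((T, 1)), lagged(y, m)])  # past of Y only
    unrestricted = np.hstack([restricted, lagged(x, m)])     # past of Y and past of X
    rss_r, rss_ur = rss(restricted, target), rss(unrestricted, target)
    f_stat = ((rss_r - rss_ur) / m) / (rss_ur / (T - 2 * m - 1))
    return f_stat, float(np.log(rss_r / rss_ur))

# Simulated example (assumed coefficients): X drives Y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

f_xy, g_xy = granger_f(x, y, m=2)   # direction X -> Y
f_yx, g_yx = granger_f(y, x, m=2)   # direction Y -> X
print(f"X -> Y: F = {f_xy:.1f}, ln(RSS_r / RSS_ur) = {g_xy:.3f}")
print(f"Y -> X: F = {f_yx:.1f}, ln(RSS_r / RSS_ur) = {g_yx:.3f}")
```

Because both fits use the same T observations, the ratio of residual sums of squares equals the ratio of the two variance estimates, so the second returned value is a plug-in estimate of the quantity in (1.6).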
1.3.2 Granger-causality criterion based on coefficients
Another way to test for causality between two series under the same conditions as in Section 1.3.1 is to estimate model (1.5) only and to directly test the significance of the VAR coefficients of interest (Hamilton, 1994; Lütkepohl, 2005). Let us first define the complementary equation of equation (1.5)
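$$f_t(X_t \mid X_{t-p}^{t-1}, Y_{t-p}^{t-1}) = \phi\Big(X_t;\ \mu = \sum_{j=1}^{p} \vartheta_{22}(j) X_{t-j} + \sum_{j=1}^{p} \vartheta_{21}(j) Y_{t-j},\ \sigma^2 = \Sigma_3\Big), \tag{1.10}$$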
and the variance-covariance matrix of the whole system
$$\Sigma = \begin{pmatrix} \Sigma_2 & \Gamma_{23} \\ \Gamma_{23} & \Sigma_3 \end{pmatrix}, \tag{1.11}$$
where the off-diagonal elements may or may not be equal to zero. Testing whether X causes Y in the Granger sense amounts to testing the hypotheses
$$\vartheta_{12}(1) = \vartheta_{12}(2) = \vartheta_{12}(3) = \cdots = \vartheta_{12}(p) = 0, \tag{1.12}$$
and testing whether Y causes X in the Granger sense amounts to testing
$$\vartheta_{21}(1) = \vartheta_{21}(2) = \vartheta_{21}(3) = \cdots = \vartheta_{21}(p) = 0. \tag{1.13}$$
In the context of linear Gaussian autoregressive models, the two null hypotheses (1.8) and (1.12) are equivalent.
We can observe that the approach using hypothesis (1.8) requires the computation of two models (an AR model and a VAR model), whereas a single VAR model is sufficient for the approach using hypothesis (1.12).
Under joint normality and finite variance-covariance assumptions, the Wald statistic is defined as
$$W = (\hat{\vartheta}_{12})' \, \mathrm{var}(\hat{\vartheta}_{12})^{-1} \, (\hat{\vartheta}_{12}), \tag{1.14}$$
where $\vartheta_{12}$ contains all the parameters $\vartheta_{12}(j)$, for $j = 1, \ldots, p$. As $T$ increases, this statistic asymptotically follows a $\chi^2$ distribution with $p$ degrees of freedom (Lütkepohl, 2005). A significant Wald statistic suggests that at least one of the causal coefficients is different from zero, and, in that sense, that X is causal for Y in the Granger sense. See Sato et al. (2006) for an example of application of this statistic in neuroscience.
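The Wald statistic in equation (1.14) can be sketched in a few lines of Python. The example below estimates only the Y equation of the VAR by ordinary least squares and uses the usual OLS covariance estimate for the coefficients; the function name `wald_granger`, the intercept, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def wald_granger(x, y, p):
    """Wald statistic (1.14) for the p coefficients of lagged X in the Y equation."""
    n = len(y)
    target = y[p:]
    T = len(target)
    lags = lambda s: np.column_stack([s[p - j:n - j] for j in range(1, p + 1)])
    design = np.hstack([np.ones((T, 1)), lags(y), lags(x)])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    sigma2 = resid @ resid / (T - design.shape[1])       # innovation variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)      # OLS covariance of the coefficients
    idx = slice(1 + p, 1 + 2 * p)                        # positions of the lagged-X coefficients
    theta12 = coef[idx]
    return float(theta12 @ np.linalg.solve(cov[idx, idx], theta12))

# Simulated example (assumed coefficients): X drives Y, but Y does not drive X.
rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

CHI2_95_DF2 = 5.991   # 95% quantile of the chi-square distribution with 2 degrees of freedom
w_xy = wald_granger(x, y, p=2)
w_yx = wald_granger(y, x, p=2)
print(f"X -> Y: W = {w_xy:.1f} (reject at 5% if W > {CHI2_95_DF2})")
print(f"Y -> X: W = {w_yx:.1f}")
```

Only one regression is fitted per direction, which mirrors the remark that a single VAR model is sufficient for this approach.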
The time-domain Granger-causality statistics in equations (1.9) and (1.14) are derived from AR and VAR modelling of the data. Their relevance therefore relies on the quality of the fitted models. The first issue is the selection of the model order p. Traditional criteria used in time series are the Akaike information criterion (Akaike, 1974) and the Bayesian information criterion (Schwarz, 1978). For the first statistic, in equation (1.9), it is advisable to select the same p for the two models. The second issue is probably often overlooked but of utmost importance. In practice, and particularly for neuroscience data, the plausibility of the assumptions behind these models must be checked before interpreting the resulting tests. This includes analysis of the residuals from the fitted model.
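Order selection with the two criteria can be sketched as follows. The code compares candidate orders on a common estimation sample and uses the Gaussian forms AIC = T ln(RSS/T) + 2k and BIC = T ln(RSS/T) + k ln T for the Y equation only; this single-equation simplification, the function name `order_criteria`, and the simulated process are assumptions made for the example.

```python
import numpy as np

def order_criteria(x, y, p_max):
    """AIC and BIC of the Y equation of the VAR for p = 1..p_max (common sample)."""
    n = len(y)
    target = y[p_max:]                        # same observations for every candidate p
    T = len(target)
    aic, bic = [], []
    for p in range(1, p_max + 1):
        cols = [np.ones(T)]
        for j in range(1, p + 1):
            cols += [y[p_max - j:n - j], x[p_max - j:n - j]]
        design = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        k = design.shape[1]                   # number of estimated coefficients
        fit = T * np.log(resid @ resid / T)   # -2 log-likelihood up to a constant
        aic.append(fit + 2 * k)
        bic.append(fit + k * np.log(T))
    return np.array(aic), np.array(bic)

# Simulated example (assumed coefficients): the true order is 2.
rng = np.random.default_rng(2)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.4 * y[t - 2] + 0.6 * x[t - 2] + rng.standard_normal()

aic, bic = order_criteria(x, y, p_max=6)
p_aic = int(np.argmin(aic)) + 1
p_bic = int(np.argmin(bic)) + 1
print("order selected by AIC:", p_aic, "| by BIC:", p_bic)
```

Since the BIC penalty per coefficient exceeds the AIC penalty as soon as ln T > 2, the order selected by BIC is never larger than the one selected by AIC.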
1.3.3 Transfer entropy
Transfer entropy (TE) is a functional statistic developed in information theory (Schreiber, 2000). It can be used to test the null hypothesis (1.3) in terms of the distributions themselves, and thus does not rely on the linear Gaussian assumption. It is defined as the Kullback–Leibler distance between the two distributions $f(Y_t \mid Y_{t-p}^{t-1})$ and $f(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$:
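$$T_{X \to Y} = \int \cdots \int f(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1}) \ln \frac{f(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1})}{f(y_t \mid y_{t-p}^{t-1})} \, dy_t \, dy_{t-p}^{t-1} \, dx_{t-p}^{t-1} = KL\left\{ f(y_t \mid y_{t-p}^{t-1}) \,\|\, f(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1}) \right\}, \tag{1.15}$$
where the integrals over $y_{t-p}^{t-1}$ and $x_{t-p}^{t-1}$ are both of dimension $p$, and so the overall integral in equation (1.15) is of dimension $2p + 1$.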
An even more general definition would allow the distributions $f(\cdot)$ to depend on time, letting the transfer-entropy statistic be time-dependent.
It has been shown that for stationary linear Gaussian autoregressive models (1.4) and (1.5), the indices (1.15) and (1.6) are equivalent (Barnett et al., 2009; Chicharro, 2011).
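The constant relating the two indices can be made explicit with a short calculation. The sketch below assumes that the expectation in the transfer entropy is taken over the joint distribution of $(Y_t, Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ and that both conditional densities are exactly Gaussian with variances $\Sigma_1$ and $\Sigma_2$, as in models (1.4) and (1.5). It uses the fact that a Gaussian density with variance $\Sigma$ has differential entropy $\tfrac{1}{2}\ln(2\pi e \Sigma)$; the symbol $H(\cdot \mid \cdot)$ for conditional entropy is introduced here only for this illustration.

```latex
\begin{align*}
T_{X \to Y}
  &= H\big(Y_t \mid Y_{t-p}^{t-1}\big) - H\big(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1}\big) \\
  &= \tfrac{1}{2}\ln(2\pi e\,\Sigma_1) - \tfrac{1}{2}\ln(2\pi e\,\Sigma_2) \\
  &= \tfrac{1}{2}\ln\!\left(\frac{\Sigma_1}{\Sigma_2}\right)
   = \tfrac{1}{2}\, F_{X \to Y}.
\end{align*}
```

Under these assumptions the two indices differ only by the factor 1/2, so one vanishes exactly when the other does.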
In its general form, TE is a functional statistic, free from any parametric assumption on the two densities $f(Y_t \mid Y_{t-p}^{t-1})$ and $f(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$. See for example Chàvez et al. (2003), Garofalo et al. (2009), Vicente et al. (2011), Wibral et al. (2011), Lizier et al. (2011), Besserve and Martinerie (2011) and Besserve et al. (2010) for applications of TE in neuroscience. Difficulties arise when trying to estimate and compute the joint and marginal densities in equation (1.15). In principle, there are several ways to estimate these two quantities non-parametrically, but the performance of each strongly depends on the characteristics of the data. For a general review of non-parametric estimation methods in information theory see Vicente et al. (2011) and Hlaváčková-Schindler et al. (2007). For simple discrete processes, the probabilities can be determined by computing the frequencies of occurrence of different states. For continuous processes, which are those of interest for neuroscience, it is more delicate to find a reliable non-parametric density estimation. Kernel-based estimation is among the most popular methods; see for example Victor (2002), Kaiser and Schreiber (2002), Schreiber (2000) and Vicente et al. (2011). The major limitation of non-parametric estimation is due to the dimension and the related computational cost. In the present case, the estimation of $f(Y_t \mid Y_{t-p}^{t-1})$ and $f(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ presents two major limitations due to the curse of dimensionality induced by the model order $p$: a computational limitation, as it implies integration in dimension $2p + 1$ in equation (1.15), and the huge number of observations required to non-parametrically estimate the densities, as this number grows exponentially with the dimension. Typically, Schreiber (2000) proposes to choose the minimal $p$, meaning $p = 1$, for computational reasons (Schreiber, 2000, p. 462).
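For the simple discrete case mentioned above, the frequency-counting estimate can be sketched as follows with $p = 1$. The function name `transfer_entropy_discrete` and the toy binary sequences are assumptions made for illustration; each log-ratio is weighted by the relative frequency of the joint state $(y_t, y_{t-1}, x_{t-1})$, which is the usual plug-in form and is an assumption of this sketch.

```python
from collections import Counter
from math import log
import random

def transfer_entropy_discrete(x, y):
    """Plug-in estimate of T_{X->Y} with p = 1 for discrete-valued sequences."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))       # (y_t, y_{t-1}, x_{t-1})
    n = len(triples)
    c_full = Counter(triples)
    c_past = Counter((yp, xp) for _, yp, xp in triples)
    c_yy = Counter((yt, yp) for yt, yp, _ in triples)
    c_y = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yt, yp, xp), count in c_full.items():
        p_full = count / c_past[(yp, xp)]            # f(y_t | y_{t-1}, x_{t-1})
        p_rest = c_yy[(yt, yp)] / c_y[yp]            # f(y_t | y_{t-1})
        te += (count / n) * log(p_full / p_rest)
    return te

# Toy example (assumed): Y copies X with a delay of one step.
random.seed(1)
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_xy = transfer_entropy_discrete(x, y)   # close to ln 2, since X fully determines Y
te_yx = transfer_entropy_discrete(y, x)   # close to 0, since Y carries no new information on X
print(f"T(X->Y) = {te_xy:.3f}, T(Y->X) = {te_yx:.3f}")
```

With binary states and $p = 1$ only eight joint states have to be counted; the number of states grows exponentially with $p$, which is the discrete counterpart of the dimensionality problem described above.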
A toolbox named TRENTOOL provides the computation of TE and the estimation of $f(Y_t \mid Y_{t-p}^{t-1})$ and $f(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ through kernel estimation (Lindner et al., 2011). This toolbox enables us to estimate a supplementary parameter, called the embedding delay ($\tau$), which represents the lag in time between each observation of the past values of variables X and Y. Equation (1.15) then becomes
$$T_{X \to Y} = \int \cdots \int f(y_t \mid y_{t-p\tau}^{t-1\tau}, x_{t-p\tau}^{t-1\tau}) \ln \frac{f(y_t \mid y_{t-p\tau}^{t-1\tau}, x_{t-p\tau}^{t-1\tau})}{f(y_t \mid y_{t-p\tau}^{t-1\tau})} \, dy_t \, dy_{t-p\tau}^{t-1\tau} \, dx_{t-p\tau}^{t-1\tau}. \tag{1.16}$$
The model order $p$ (called the embedding dimension in this context) is optimized simultaneously with the embedding delay $\tau$ through two implemented criteria. The first is the “Cao criterion” (Cao, 1997), which selects $\tau$ on an “ad hoc” basis and $p$ through a false neighbour criterion (Lindner et al., 2011). The second is the “Ragwitz criterion” (Schreiber, 2000), which selects $\tau$ and $p$ simultaneously by minimising the prediction error of a local predictor. As discussed in Lindner et al. (2011), the choice of the order $p$ and of the embedding delay $\tau$ is quite important. Indeed, if $p$ is chosen too small, the causal structure may not be captured and thus the TE statistic will be incorrect. On the other hand, using an embedding dimension which is higher than necessary will lead to an increase of variability in the estimation, in addition to a considerable increase in computation time. Typically, Wibral et al. (2011) select the value of $p$ as the maximum determined by the Cao criterion from $p = 1$ to 4, and choose the value of $\tau$ following a popular ad hoc option as the first zero of the autocorrelation function of the signal.
TRENTOOL allows us to compute the distribution of the transfer entropy statistic under the null hypothesis through a permutation method. The data are shuffled in order to break the links between the signals and then the transfer entropy statistic is recomputed on each surrogate dataset (e.g., Wibral et al., 2011, use $1.9 \times 10^5$ permutations for assessing the significance of the TE statistic). Analyses with TRENTOOL are limited so far to bivariate systems.
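The logic of such a permutation test can be sketched generically in Python. This is not the TRENTOOL procedure: as a simplifying assumption the source series is shuffled in time rather than across trials, and the log variance ratio of equation (1.6) with one lag stands in for the transfer entropy statistic. The function names and the simulated data are hypothetical.

```python
import numpy as np

def log_variance_ratio(x, y):
    """ln(RSS_r / RSS_ur) as in (1.6) with one lag, used here as the test statistic."""
    target = y[1:]
    restricted = np.column_stack([np.ones(len(target)), y[:-1]])
    unrestricted = np.column_stack([restricted, x[:-1]])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return resid @ resid

    return float(np.log(rss(restricted) / rss(unrestricted)))

def permutation_pvalue(x, y, n_perm, rng):
    """Share of surrogate statistics at least as large as the observed one."""
    observed = log_variance_ratio(x, y)
    surrogates = np.array([
        log_variance_ratio(rng.permutation(x), y)   # shuffling X breaks its link with Y
        for _ in range(n_perm)
    ])
    return float((1 + np.sum(surrogates >= observed)) / (1 + n_perm))

# Simulated example (assumed coefficients): X drives Y with a one-step delay.
rng = np.random.default_rng(3)
n = 300
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

p_xy = permutation_pvalue(x, y, n_perm=199, rng=rng)
print(f"permutation p-value for X -> Y: {p_xy:.3f}")
```

The smallest attainable p-value is 1/(1 + n_perm), which is one reason why a large number of permutations is needed when small significance levels are targeted.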
The formulation of causality based on the conditional independence in equation (1.3) was later used and theoretically refined in Chamberlain (1982) and Florens (2003). Although less general, the statistics given in equations (1.6) and (1.14) are much easier to implement and are testable. This probably explains why they have received considerably more attention in applied work.
1.4 Frequency Domain Causality
1.4.1 Geweke–Granger-causality statistic
As mentioned in Section 1.3.1, an important advance in developing the Granger-causality methodology was to provide a spectral decomposition of the time-domain statistics (Geweke, 1982, 1984b).
For completeness, we give below the mathematical details of this derivation. The Fourier transform of equations (1.5) and (1.10) for a given frequency ω (expressed as a system of equations) is
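$$\begin{pmatrix} \vartheta_{11}(\omega) & \vartheta_{12}(\omega) \\ \vartheta_{21}(\omega) & \vartheta_{22}(\omega) \end{pmatrix} \begin{pmatrix} Y(\omega) \\ X(\omega) \end{pmatrix} = \begin{pmatrix} \varepsilon_1(\omega) \\ \varepsilon_2(\omega) \end{pmatrix}, \tag{1.17}$$
where $Y(\omega)$ and $X(\omega)$ are the Fourier transforms of $Y_1^T$ and $X_1^T$ at frequency $\omega$, and $\varepsilon_1(\omega)$ and $\varepsilon_2(\omega)$ are the Fourier transforms of the errors of the models (1.5) and (1.10) at frequency $\omega$. The components of the matrix are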
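$$\vartheta_{lm}(\omega) = \delta_{lm} - \sum_{j=1}^{p} \vartheta_{lm}(j) e^{-i 2\pi \omega j}, \quad \text{where} \quad \delta_{lm} = \begin{cases} 1, & l = m, \\ 0, & l \neq m, \end{cases} \quad l, m = 1, 2.$$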
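Rewriting equation (1.17) as
$$\begin{pmatrix} Y(\omega) \\ X(\omega) \end{pmatrix} = \begin{pmatrix} H_{11}(\omega) & H_{12}(\omega) \\ H_{21}(\omega) & H_{22}(\omega) \end{pmatrix} \begin{pmatrix} \varepsilon_1(\omega) \\ \varepsilon_2(\omega) \end{pmatrix}, \tag{1.18}$$
we have
$$\begin{pmatrix} H_{11}(\omega) & H_{12}(\omega) \\ H_{21}(\omega) & H_{22}(\omega) \end{pmatrix} = \begin{pmatrix} \vartheta_{11}(\omega) & \vartheta_{12}(\omega) \\ \vartheta_{21}(\omega) & \vartheta_{22}(\omega) \end{pmatrix}^{-1}, \tag{1.19}$$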
where H is the transfer matrix. The spectral matrix S(ω) can now be derived as
$$S(\omega) = H(\omega) \Sigma H^*(\omega), \tag{1.20}$$
where the asterisk denotes matrix transposition and complex conjugation. $\Sigma$ is the matrix defined in equation (1.11) (Ding et al., 2006). The spectral matrix $S(\omega)$ contains cross-spectra terms ($S_{12}(\omega)$, $S_{21}(\omega)$) and auto-spectra terms ($S_{11}(\omega)$, $S_{22}(\omega)$). If X and Y are independent, the cross-spectra terms are equal to zero.
Let us now write the auto-spectrum of Y as
$$S_{11}(\omega) = H_{11}(\omega) \Sigma_2 H_{11}^*(\omega) + 2\Gamma_{23} \mathrm{Re}\big(H_{11}(\omega) H_{12}^*(\omega)\big) + H_{12}(\omega) \Sigma_3 H_{12}^*(\omega). \tag{1.21}$$
In the following derivation, we will suppose that $\Gamma_{23}$, the off-diagonal element of the $\Sigma$ matrix in equation (1.11), is equal to zero. In the case where this condition is not fulfilled, a more complex derivation is required (see Ding et al., 2006, for further details). If this independence condition is fulfilled, the auto-spectrum reduces to two terms,
$$S_{11}(\omega) = H_{11}(\omega) \Sigma_2 H_{11}^*(\omega) + H_{12}(\omega) \Sigma_3 H_{12}^*(\omega). \tag{1.22}$$
The first term, $H_{11}(\omega) \Sigma_2 H_{11}^*(\omega)$, only involves the variance of the signal of interest and thus can be viewed as the intrinsic part of the auto-spectrum. The second term, $H_{12}(\omega) \Sigma_3 H_{12}^*(\omega)$, only involves the variance of the second signal and thus can be viewed as the causal part of the auto-spectrum.
In Geweke’s spectral formulation, the derivation of the spectral measure $f_{X \to Y}$ requires the fulfillment of several properties. The measures have to be non-negative, and the sum over all frequencies of the spectral Granger-causality components has to equal the time-domain Granger-causality quantity (1.6):
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} f_{X \to Y}(\omega) \, d\omega = F_{X \to Y}. \tag{1.23}$$
The two conditions together imply the desirable property
$$F_{X \to Y} = 0 \Leftrightarrow f_{X \to Y}(\omega) = 0, \quad \forall \omega. \tag{1.24}$$
The third condition is that the spectral statistics have an empirical interpretation. The spectral Granger-causality statistic proposed by Geweke fulfills all three requirements.
For a given frequency $\omega$ and scalar variables X and Y, it is defined as
$$f_{X \to Y}(\omega) = \ln\left(\frac{S_{11}(\omega)}{H_{11}(\omega) \Sigma_2 H_{11}^*(\omega)}\right), \tag{1.25}$$
where $\Sigma_2$ is the variance defined in equation (1.5), $S_{11}(\omega)$ is the autospectrum of Y and $H_{11}(\omega)$ is the (1,1) element of the transfer matrix in equation (1.19). The form of equation (1.25) provides an important interpretation: the causal influence depends on the relative size of the total power $S_{11}(\omega)$ and the intrinsic power $H_{11}(\omega) \Sigma_2 H_{11}^*(\omega)$. Since the total power is the sum of the intrinsic and the causal powers (see equation (1.22)), the spectral Geweke–Granger-causality statistic is zero when the causal power is zero (i.e. when the intrinsic power equals the total power, so that the ratio inside the logarithm equals one). The statistic increases as the causal power increases (Ding et al., 2006). Given the requirements imposed by Geweke, the measure $f_{X \to Y}(\omega)$ has a clear interpretation: it represents the portion of the power spectrum associated with the innovation process of model (1.5). However, this interpretation relies on the VAR model because the innovation process is only well-defined in this context (see Brovelli et al. (2004), Chen et al. (2009), Chen et al. (2006) and Bressler et al. (2007) for examples of application in neuroscience).
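The chain from VAR coefficients to the spectral statistic can be sketched numerically. The code below builds the matrix of (1.17), inverts it to obtain the transfer matrix (1.19), forms the spectral matrix (1.20) and takes the logarithm of the ratio of total to intrinsic power, so that the statistic is zero when the causal power vanishes. The coefficient values, the diagonal innovation covariance, the function name `geweke_spectral`, and the convention that $\omega$ is measured in cycles per sample are assumptions made for this example.

```python
import numpy as np

def geweke_spectral(theta, sigma, omegas):
    """Spectral Granger causality from X to Y for a bivariate VAR ordered as (Y, X).

    theta : array of shape (p, 2, 2); theta[j - 1][l, m] is the lag-j coefficient.
    sigma : 2 x 2 innovation covariance, assumed diagonal (zero off-diagonal term).
    """
    p = theta.shape[0]
    values = []
    for w in omegas:
        A = np.eye(2, dtype=complex)                   # matrix of (1.17)
        for j in range(1, p + 1):
            A -= theta[j - 1] * np.exp(-2j * np.pi * w * j)
        H = np.linalg.inv(A)                           # transfer matrix (1.19)
        S = H @ sigma @ H.conj().T                     # spectral matrix (1.20)
        intrinsic = (H[0, 0] * sigma[0, 0] * np.conj(H[0, 0])).real
        values.append(np.log(S[0, 0].real / intrinsic))
    return np.array(values)

omegas = np.linspace(0.0, 0.5, 51)
sigma = np.eye(2)

# VAR(1) with a causal coefficient from X to Y (assumed values).
theta_causal = np.array([[[0.5, 0.8],
                          [0.0, 0.3]]])
# Same model with the causal coefficient set to zero.
theta_null = np.array([[[0.5, 0.0],
                        [0.0, 0.3]]])

f_causal = geweke_spectral(theta_causal, sigma, omegas)
f_null = geweke_spectral(theta_null, sigma, omegas)
print("largest value with causal coefficient:", f_causal.max())
print("largest absolute value without it:", np.abs(f_null).max())
```

Setting the causal coefficient to zero makes the (1,2) element of the transfer matrix vanish, so the total and intrinsic powers coincide at every frequency.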
The estimation of the parameters and the model order selection procedure is the same as in Section 1.3.2, because the frequency-domain VAR model in equation (1.17) is directly derived from the time-domain VAR model. The model order selection has to be performed within the time-domain model estimation procedure (Brovelli et al., 2004; Lin et al., 2009).
Lin et al. (2009) showed that under the null hypothesis $f_{X \to Y}(\omega) = 0$ and based on (1.25), one can derive a statistic that follows an F distribution with degrees of freedom $(p, T - 2p)$ when the number of observations tends to infinity (it was first derived in Brovelli et al., 2004; Gourévitch et al., 2006).
1.4.2 Directed transfer function and partial directed coherence
The directed transfer function (DTF) and the partial directed coherence (PDC) are alternative measures also derived from VAR estimated quantities that are closely related to the Geweke–Granger-causality statistic.
The DTF is a frequency-domain measure of causal influence based on the elements of the transfer matrix $H(\omega)$ in equation (1.19). It has both normalized (Kamiński et al., 2001) and non-normalized (Kamiński, 2007) forms. The PDC (Baccalà and Sameshima, 2001) is derived from the matrix of the Fourier transform of the estimated VAR coefficients in equation (1.17). It provides a test for non-zero coefficients of this matrix. See Schelter et al. (2009) for a renormalized version of PDC and Schelter et al. (2006) for an example of application in neuroscience.
The DTF is expressed as
$$\mathrm{DTF}_{X \to Y}(\omega) = \sqrt{\frac{|H_{12}(\omega)|^2}{|H_{11}(\omega)|^2 + |H_{12}(\omega)|^2}}, \tag{1.26}$$
where $H_{12}(\omega)$ is the element (1,2) of the transfer matrix in equation (1.19). The PDC is defined as
$$\mathrm{PDC}_{X \to Y}(\omega) = \frac{\vartheta_{12}(\omega)}{\vartheta_2^*(\omega) \vartheta_2(\omega)}, \tag{1.27}$$
where $\vartheta_{12}(\omega)$ represents the Fourier transformed VAR coefficient (i.e. the causal influence from X to Y at frequency $\omega$), and $\vartheta_2(\omega)$ represents all outflows from X.
The PDC is normalized, but in a different way from the DTF. Indeed, the PDC represents the outflow from X to Y, normalized by the total amount of outflows from X. The normalized DTF however represents the inflow from X to Y, normalized by the total amount of inflows to Y.
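The two normalizations can be compared on a small numerical example. In the sketch below the DTF follows equation (1.26); for the PDC the code takes the modulus of the numerator and the square root of the squared norm of the second column of the matrix in (1.17), which is the usual normalized form of Baccalà and Sameshima (2001) and is stated here as an assumption relative to equation (1.27). The coefficient values and the function name `dtf_pdc` are hypothetical.

```python
import numpy as np

def dtf_pdc(theta, omega):
    """DTF and PDC from X to Y at frequency omega for a bivariate VAR ordered as (Y, X)."""
    p = theta.shape[0]
    A = np.eye(2, dtype=complex)                       # matrix of (1.17)
    for j in range(1, p + 1):
        A -= theta[j - 1] * np.exp(-2j * np.pi * omega * j)
    H = np.linalg.inv(A)                               # transfer matrix (1.19)
    # DTF: inflow from X to Y relative to all inflows to Y (first row of H).
    dtf = abs(H[0, 1]) / np.sqrt(abs(H[0, 0]) ** 2 + abs(H[0, 1]) ** 2)
    # PDC: outflow from X to Y relative to all outflows from X (second column of A).
    col_x = A[:, 1]
    pdc = abs(A[0, 1]) / np.sqrt(np.vdot(col_x, col_x).real)
    return float(dtf), float(pdc)

theta_causal = np.array([[[0.5, 0.8],
                          [0.0, 0.3]]])   # X influences Y (assumed values)
theta_null = np.array([[[0.5, 0.0],
                        [0.0, 0.3]]])     # no influence from X to Y

dtf_c, pdc_c = dtf_pdc(theta_causal, omega=0.1)
dtf_0, pdc_0 = dtf_pdc(theta_null, omega=0.1)
print(f"with causal coefficient:    DTF = {dtf_c:.3f}, PDC = {pdc_c:.3f}")
print(f"without causal coefficient: DTF = {dtf_0:.3f}, PDC = {pdc_0:.3f}")
```

With this normalization both measures lie between 0 and 1, and both vanish when the causal coefficients are zero, although they are scaled by different quantities.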
Comparisons between the Geweke–Granger-causality statistic, the DTF and the PDC are discussed in Eichler (2006), Baccalà and Sameshima (2001), Gourévitch et al. (2006), Pereda et al. (2005), Winterhalder et al. (2005), Winterhalder et al. (2006) and more recently in the context of information theory in Chicharro (2011). As shown in Chicharro (2011), the causal interpretation of the DTF and the Geweke–Granger-causality (GGC) statistic, at least in the bivariate case, relies on Granger’s definition of causality. For the PDC, the causal interpretation is different, as it relies on Sims’ definition of causality (Sims, 1972). See Chamberlain (1982) and Kuersteiner (2008) for a global overview and comparison of these two definitions of causality. Finally, Winterhalder et al. (2005) conducted a simulation-based comparison of the DTF and the PDC (and other statistics) in a neuroscience context.
Unlike the original time-domain formulation of Granger causality, statistical properties of these spectral measures have yet to be fully elucidated. For instance, the influence of signal pre-processing (e.g., smoothing, filtering) is not well established.
Assessment of significance
Theoretical distributions for DTF and PDC have been derived and are listed below. They are all based on the asymptotic normality of the estimated VAR coefficients. Therefore, they can be used and interpreted only if the assumptions behind this model hold. Schelter et al. (2006) showed that the PDC statistic asymptotically follows a $\chi^2$ distribution with 1 degree of freedom. Furthermore, Schelter et al. (2009) showed that a renormalized form of PDC can be related to a $\chi^2$ distribution with 2 degrees of freedom. Finally, Winterhalder et al. (2005) provide simulations that suggest that this $\chi^2$ distribution even works well if the true model order is strongly overestimated.
Eichler (2006) showed that the DTF quantity can be compared to a χ2 distribution with 1 degree of freedom. This property is also based on the asymptotic normality of estimated VAR coefficients and its accuracy is evaluated through simulations.
For the PDC as well as for the DTF asymptotic distributions, Schelter (2005) and Eichler (2006) state that a major drawback is that there are a lot of tests – one for each frequency. It is well known that when many tests are produced, caution has to be taken in interpreting those that are significant. For example, even under the null hypothesis of no information flow, there is a high probability that for a few frequencies the test will be significant.
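A short calculation illustrates the size of the problem. The numbers are purely illustrative: $K$ denotes the number of frequencies tested, $\alpha$ the level of each test, and the tests are assumed independent, which is a simplification since statistics at neighbouring frequencies are typically correlated.

```latex
\[
P(\text{at least one rejection under the null}) = 1 - (1 - \alpha)^{K},
\qquad
1 - (1 - 0.05)^{100} \approx 0.994 .
\]
```

A Bonferroni correction, which does not require independence, would instead test each frequency at level $\alpha / K = 0.05 / 100 = 0.0005$.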
1.5 Time-Varying Granger Causality
Neuroscience data are nonstationary in most cases. The specificity (task or stimulus related) of the increase or decrease and/or local field potential implies this nonstationarity which is of primary interest. A Granger-causality statistic in the time or the frequency domain is desirable as it would capture the evolution of Granger causality through time.
Since the original statistics are based on AR and VAR models, and therefore on the assumption that the autocorrelation does not vary over time, these models have to be extended to allow for a changing autocorrelation structure in order to suitably extract a Granger-causality statistic.
Practically, getting a statistic to assess the causality between two series for each time requires the estimation of the densities $f_t(Y_t \mid Y_{t-p}^{t-1})$ and $f_t(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ separately for each time $t$. There are two additional difficulties to keep in mind. The first is the necessity of an objective criterion for time-varying model order selection and the second is the difficulty of incorporating all the recorded data (meaning all the trials) in the estimation procedure.
1.5.1 Non-parametric statistics
Wavelet-based statistic
In the context of neuroscience, Dhamala et al. (2008) proposed to bypass the nonstationarity problem by non-parametrically estimating the quantities which allow us to derive the spectral Geweke–Granger-causality (GGC) statistic (1.25). They derived an evolutionary spectral density through the continuous wavelet transform of the data, and then derived a quantity related to the transfer function (by spectral matrix factorization). Based on this quantity, they obtain a GGC statistic that can be interpreted as a time-varying version of the GGC statistic defined in (1.25).
This approach bypassed the delicate step of estimating $f_t(Y_t \mid Y_{t-p}^{t-1})$ and $f_t(Y_t \mid Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ separately for each time. However, this method presents several drawbacks in terms of interpretation of the resulting quantity. The GGC statistic is indeed derived from a VAR model and its interpretation directly follows from the causal nature of the VAR coefficients. The non-parametric wavelet spectral density however does not have this Granger-causality interpretation. Therefore attention must be paid when interpreting this proposed evolutionary causal GGC statistic derived from spectral quantities which are not based on a VAR model.
Local transfer entropy
Lizier et al. (2008, 2011) and Prokopenko et al. (2013) proposed a time-varying version of the transfer entropy (1.15), in order to detect dynamical causal structure in a functional magnetic resonance imaging (fMRI) study context. The “global” transfer entropy defined in equation (1.15) can be expressed as a sum of “local transfer entropies” at each time:
$$T_{X \to Y} = \frac{1}{T} \sum_{t=1}^{T} f_t(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1}) \ln \frac{f_t(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1})}{f_t(y_t \mid y_{t-p}^{t-1})}, \tag{1.28}$$
where each summed quantity can be interpreted as a single “local transfer entropy”:
$$t_{x \to y}(t) = \ln \frac{f_t(y_t \mid y_{t-p}^{t-1}, x_{t-p}^{t-1})}{f_t(y_t \mid y_{t-p}^{t-1})}. \tag{1.29}$$
The step from equation (1.15) to equation (1.28) is obtained by replacing the joint density $f(Y_t, Y_{t-p}^{t-1}, X_{t-p}^{t-1})$ with its empirical version. This simplification seems difficult to justify in a neuroscience context, considering the continuous nature of the data. In fact, the sampling rate used in neuroscience data acquisition is often very high. As such, this local transfer entropy does not seem to be a suitable time-varying causality statistic for an application in neuroscience. Moreover, even if the overall quantity in equation (1.15) can be suitably expressed as a sum of orthogonal parts as in equation (1.29), its causal nature does not necessarily remain in each part. As such, we cannot directly interpret these parts as causal, even if the sum of them gives an overall quantity that has an intrinsic causal meaning. Finally, Prokopenko et al. (2013) or Lizier et al. (2008, 2011) do not provide an objective criterion for model order selection.
1.5.2 Time-varying VAR model
As seen before in equations (1.9), (1.14), (1.25), (1.26) and (1.27), parametric Granger-causality statistics in the time and frequency domains are derived from AR and VAR modelling of the data (equations (1.4) and (1.5) respectively). One way to extend these statistics to the nonstationary case amounts to allowing the AR and VAR parameters to evolve in time. In addition to the difficulties related to model order selection and the fact that we have to deal with several trials, time-varying AR and VAR models are difficult to estimate since the number of parameters is most of the time considerable compared to the available number of observations. To overcome the dimensionality of this problem, Chen (2005) proposes to make one of the following three assumptions: local stationarity of the process (Dahlhaus, 1997), slowly-varying nonstationary characteristics (Priestley, 1965), or slowly varying parameters for nonstationary models (Ledolter, 1980). In practice, it