
Analysis of complex neural circuits with

nonlinear multidimensional hidden state models

The MIT Faculty has made this article openly available.

Please share

how this access benefits you. Your story matters.

Citation

Friedman, Alexander et al. “Analysis of Complex Neural Circuits with

Nonlinear Multidimensional Hidden State Models.” Proceedings of

the National Academy of Sciences 113.23 (2016): 6538–6543. © 2016

National Academy of Sciences

As Published

http://dx.doi.org/10.1073/pnas.1606280113

Publisher

National Academy of Sciences (U.S.)

Version

Final published version

Citable link

http://hdl.handle.net/1721.1/106154

Terms of Use

Article is made available in accordance with the publisher's

policy and may be subject to US copyright law. Please refer to the

publisher's site for terms of use.


Analysis of complex neural circuits with nonlinear

multidimensional hidden state models

Alexander Friedmana,b,1, Joshua F. Slocuma,b,1, Danil Tyulmankova,b,1, Leif G. Gibba,b, Alex Altshulerc,d,e, Suthee Ruangwisesa,b, Qinru Shia,b, Sebastian E. Toro Aranaa,b, Dirk W. Becka,b, Jacquelyn E. C. Sholesf, and Ann M. Graybiela,b,2

aMcGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139; bDepartment of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; cProgram on Crisis Leadership, Ash Center for Democratic Governance & Innovation, Kennedy School of Government, Harvard University, Cambridge, MA 02138; dDepartment of Management, Faculty of Social Sciences, Bar-Ilan University, Ramat Gan, 5290002, Israel; eHomeland Security Program, The Institute for National Security Studies, Tel Aviv, 6997556, Israel; and fDepartment of Musicology and Ethnomusicology, Boston University, Boston, MA 02215

Contributed by Ann M. Graybiel, April 21, 2016 (sent for review February 10, 2016; reviewed by Larry Abbott, Peter Dayan, and Terrence J. Sejnowski)

A universal need in understanding complex networks is the identification of individual information channels and their mutual interactions under different conditions. In neuroscience, our premier example, networks made up of billions of nodes dynamically interact to bring about thought and action. Granger causality is a powerful tool for identifying linear interactions, but handling nonlinear interactions remains an unmet challenge. We present a nonlinear multidimensional hidden state (NMHS) approach that achieves interaction strength analysis and decoding of networks with nonlinear interactions by including latent state variables for each node in the network. We compare NMHS to Granger causality in analyzing neural circuit recordings and simulations, improvised music, and sociodemographic data. We conclude that NMHS significantly extends the scope of analyses of multidimensional, nonlinear networks, notably in coping with the complexity of the brain.

causal analysis | functional connectivity | decoding | hidden Markov models | machine learning

In analyzing complex networks, there is a crucial need for tools to analyze interaction strength among nodes of the networks, extending from genetics to economics (1), demographics (2), and ecology (3). This need is newly pressing in neuroscience (4–6) (Fig. 1A). State-of-the-art recording techniques now allow simultaneous measurement of the activity of hundreds of neurons (7, 8). Interpreting these recorded data requires the identification of groups of neurons that strongly interact with each other, known as neural microcircuits, as well as the ability to link microcircuit activity to behavior (7, 9). Currently, Granger causality (GC) (1) and cross-correlation (7) are leading methods for interaction strength analysis (5, 6, 10). Neither method can effectively analyze the nonlinear interactions that are common in neural circuits and systems in other fields (1, 3–5).

Network decoding is an important need in many fields as well, and again, there is need for improved tools. Hidden Markov model (HMM)-based approaches are a leading method in the decoding field (11–15), but these can only decode a single element at a time. Linear dynamical systems (LDSs) are another approach based on latent variables and are an effective way to model and characterize the behavior of populations of neurons (16). However, LDS models are designed for analyzing data from large populations of similar neurons: they are less appropriate for analyzing circuits made up of heterogeneous neurons. Moreover, LDS cannot use known interactions between populations of neurons to improve decoding and prediction for both populations.

A further motivating point is that most analytic methods now used are either strictly tools for interaction strength analysis or tools for decoding (5, 7, 17). There is a significant advantage to combining the two processes, because identifying nodes that strongly interact with each other enables robust network decoding (13). Here, we introduce the nonlinear multidimensional hidden state (NMHS) approach, which achieves both interaction strength analysis and decoding in networks with nonlinear interactions.

Results

The NMHS model is a generalization of the HMM (11) to systems encompassing two or more processes. For the purpose of analysis, a process may be any physical object or mathematical construct that produces a stream of data. Like the HMM, we model the activity of each process using hidden states. Each state describes a distribution of observed behaviors that account for the recorded activity (Fig. 1B). The model tracks the evolution of states over time. At each time-step, the next state of a process is determined stochastically. Our model diverges from a classical HMM in that the transition probability of a process can be affected by other processes. The state transitions of each process are governed by a multidimensional transition matrix (Fig. 1C and Fig. S1 A–C) that encodes the probability of transitioning to each possible state. Importantly, the probability of state transitions depends on both the current state of the process and the states of other processes in the network (Fig. 1D and Fig. S1D).
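As a concrete sketch of this generative process, the following simulates two coupled two-state processes. All probabilities and the three-symbol emission alphabet are invented for illustration; the published code at https://github.com/jfslocum/NMHS may organize its parameters differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# T_A[i, j, k] = P(A moves to state k | A in state i, B in state j); the extra
# dimension j is what makes the transition matrix "multidimensional".
T_A = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.2, 0.8], [0.6, 0.4]]])
T_B = np.array([[[0.7, 0.3], [0.8, 0.2]],
                [[0.4, 0.6], [0.1, 0.9]]])
# E_A[s, e] = P(emission e | state s); emissions depend only on the own state.
E_A = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
E_B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])

def sample(steps, a=0, b=0):
    """Generate parallel observation streams from the two coupled processes."""
    obs_a, obs_b = [], []
    for _ in range(steps):
        obs_a.append(int(rng.choice(3, p=E_A[a])))
        obs_b.append(int(rng.choice(3, p=E_B[b])))
        # each next state depends on the process's own state AND its neighbor's
        a, b = rng.choice(2, p=T_A[a, b]), rng.choice(2, p=T_B[b, a])
    return obs_a, obs_b

obs_a, obs_b = sample(100)
```

Setting all slices of T_A along its second axis equal would make process A independent of B, recovering an ordinary HMM for A.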

To detect interactions, we construct an NMHS model that incorporates each process in a network or putative microcircuit, and we fit the model parameters to optimize P(Data | Model). We

Significance

In analyzing complex networks, we are commonly interested in quantifying the influence that the network nodes exert on each other and in decoding the behavior of the network. We present the nonlinear multidimensional hidden state (NMHS) model, which addresses both of these unmet challenges by simultaneously decoding activity from parallel data streams and calculating the interaction strength among them. In NMHS models, each node in a network acts as a stochastic process that can influence the progression of other nodes in the network. We show that our procedure matches or outperforms state-of-the-art techniques in a multitude of scenarios, notably in systems with nonlinear interactions.

Author contributions: A.F. and A.M.G. designed research; A.F., J.F.S., D.T., and A.M.G. performed research; A.F., J.F.S., D.T., L.G.G., A.A., S.R., Q.S., S.E.T.A., D.W.B., J.E.C.S., and A.M.G. contributed new reagents/analytic tools; A.F., J.F.S., D.T., L.G.G., A.A., S.R., Q.S., S.E.T.A., D.W.B., J.E.C.S., and A.M.G. analyzed data; A.F., J.F.S., and A.M.G. wrote the paper; and A.M.G. oversaw the project.

Reviewers: L.A., Columbia University; P.D., University College London; and T.J.S., Salk Institute for Biological Studies.

The authors declare no conflict of interest.

Data deposition: The data reported in this paper have been deposited in the GitHub repository at https://github.com/jfslocum/NMHS.

1A.F., J.F.S., and D.T. contributed equally to this work.

2To whom correspondence should be addressed. Email: graybiel@mit.edu.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1606280113/-/DCSupplemental.

(3)

then fit parameters to an HMM for each modeled process (Fig. 1E). If the NMHS model explains the data better than the ensemble of HMMs, we consider the processes in the network to be interacting with one another.

NMHS can estimate the directional interaction strength from one process to another (Fig. 1F). Given an NMHS model fitted to the data, the strength of a directional interaction is quantified by the degree to which state transitions of one process are found to be statistically dependent upon the state of the other process. Given two processes A and B, a change in A’s state may cause a change in B’s transition matrix: the interaction strength from A to B is the mean change in B’s transition matrix when A changes its state.
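Read literally, that definition can be computed straight from a fitted transition tensor. The tensor values below are hypothetical, and the exact averaging used in the published implementation may differ.

```python
import numpy as np

# T_B[j, i, l] = P(B moves to state l | B in state j, A in state i).
# Hypothetical fitted values for a pair of two-state processes.
T_B = np.array([[[0.9, 0.1], [0.3, 0.7]],
                [[0.6, 0.4], [0.2, 0.8]]])

def interaction_strength_a_to_b(T_B):
    """Mean absolute change in B's transition rows when A switches state."""
    return float(np.mean(np.abs(T_B[:, 0, :] - T_B[:, 1, :])))

strength = interaction_strength_a_to_b(T_B)  # 0.5 for the values above
```

If A had no influence on B, the two slices T_B[:, 0, :] and T_B[:, 1, :] would be identical and the strength would be zero.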

Network activity can be decoded by using the observations and the acquired model to infer the hidden state of each process at each point in time. Decoding provides insight into the behavior of the system by relating each process’s hidden state to recorded behavior. Such decoding is particularly valuable for applications in biological systems, given that a wide range of evidence has shown that hidden states underlie observed activity (12, 13, 18–20).

We designed NMHS for analyzing complex nonlinear interactions in neural circuits that cannot be analyzed by current methods. NMHS models neurons as independent entities that may change their behavior based on the state of the network. Neurons are assigned their own emissions and states, but state transitions depend on the states of other neurons in the network. This interdependence matches the duality of neuronal interactions in the nervous system, where neurons act as individuals but modulate each other's behavior (9, 21–23). We demonstrate the power of NMHS by analyzing electrophysiological recording data, simulated models, simulated neural circuits, and static data.

First, we applied NMHS to analyze neural microcircuits in three regions of the rat brain: the prefrontal cortex (PFC), the dorsomedial striatum (DMS), and the pars compacta of the dopamine-containing substantia nigra (SNpc) (Fig. 2). Microcircuits were identified experimentally by applying antidromic and orthodromic microstimulation during microelectrode recordings (Fig. 2 A and B). We categorized putative two- and three-neuron microcircuits as bidirectional or unidirectional based on responses to the microstimulation (24). Recordings of the identified circuits were made as rats performed a decision-making task (Fig. 2C). We applied NMHS, GC (25), and cross-correlation analyses to the recording data. We used two control datasets. In one, each neuron's recorded activity was selected from a different session, and all recordings were aligned to task events. This procedure yielded sets of recordings that were synchronized to task structure, but that necessarily represented activity without interaction. In the second control dataset, time bins were randomly permuted (shuffled). For microcircuits consisting of two neurons, NMHS correctly detected all anatomically identified microcircuits and found no spurious interactions in control data. GC and cross-correlation performed comparably (Fig. 2D and Fig. S2D). In microcircuits of three neurons, NMHS again successfully identified all microcircuits. GC and cross-correlation spuriously identified a large number of interactions in the control group of neurons from different recording sessions, suggesting that they are not suitable for detecting interactions in spike-train data from circuits of three neurons (Fig. 2E and Fig. S2E).

We then compared the decoding capabilities of HMMs (11) and NMHS. HMMs have been shown to be a powerful tool for decoding neural states (12, 13). Recordings were made from neurons in rodent microcircuits in the PFC, DMS, and SNpc as the rats ran in a T-maze to receive one of two chocolate milk rewards at the end-arms (Fig. 2C). The activity of these three regions in decision-making tasks has been documented (24). SNpc neurons fire in anticipation of a reward (26). DMS neurons are active in the decision-making period. PFC neurons are activated after the turn and reach maximal activity when the rat consumes the reward (24). Recordings for each behavioral trial were divided into intervals based on behavioral events: the opening of the start gate, the turn, and the consumption of the reward. We selected circuits consisting of three neurons that we verified to be anatomically connected using microstimulation.

We used NMHS decoding to analyze the spike activity of the selected neural circuits. In the triplet shown in Fig. 2F, the SNpc neuron responded immediately to the opening of the start gate,

Fig. 1. Description of the NMHS model. (A) Schematic diagram of a neuronal microcircuit with an extraneous neuron. An ideal interaction detection algorithm would identify and characterize the interactions among all neurons in the microcircuit, and identify the extraneous neuron as unconnected. (B) Each state describes a probability distribution of observed outputs. (C) The transitions of a process depend on the state of its neighbors, such that process B will undergo transitions (light blue or light red) corresponding to the current state of process A (dark blue or dark red). (D) At a given time, t, the output of a process depends solely on the state of that process, and the transitions of that process depend on both its current state and the states of its neighbors. (E) Flowchart of the NMHS method. We search for a model that optimizes the likelihood of the data. We conclude that there is an interaction between A and B if the likelihood of our optimized NMHS model is greater than the likelihood of an unconnected model. By analyzing the transition and emission matrices, we can characterize the strength of the interactions and decode the behavior of A and B. (F) When the state of one process affects the transition behavior of another process, the two processes are interacting. The more the transitions of the process are affected, the stronger the interaction.

Friedman et al. PNAS | June 7, 2016 | vol. 113 | no. 23 | 6539


whereas the DMS neuron was in an active state from gate opening to turn and the PFC neuron was active in relation to reward delivery. The decoded states of this triplet of neurons match the well-established relationship among the SNpc, DMS, and PFC in decision-making (24, 26). The hidden states decoded by the NMHS model matched the behavioral states of rats far better than those decoded by the HMM (Fig. 2F).

We further applied NMHS decoding to local field potential (LFP) activity recorded in the putamen, caudate nucleus, and motor cortex of monkeys performing reaching tasks (27). NMHS decoding demonstrated region-specific LFP synchronization between the putamen and motor cortex, which are known to be anatomically connected (Fig. S2F). This match demonstrates NMHS's ability to decode continuous neuronal data.

As further validation of NMHS, we applied the algorithm to simulated models with known interaction strengths. We compared the simulated and estimated models using the NMHS interaction strength for sequences of data from 28 two-process models with varying interaction structures and strengths. NMHS successfully estimated the strength of almost all pairwise interactions in two-process models (Fig. 3 A and B and Table S1). Additionally, NMHS successfully found the total interactivity (calculated as the maximum interaction strength of all pairwise interactions within the model) in a three-node model (Fig. S3A).

To validate that our method successfully identifies interactions in neural microcircuits, we simulated five motifs of neuronal networks using the Hodgkin–Huxley model (28–30) (Table S2). We found a strong correlation between simulated synaptic strength and estimated interaction strength in nonlinear scenarios (Fig. 3 E, F, I, and J and Fig. S3 B–F). As we expected (4, 5, 10), given that GC and cross-correlation are designed for linear interactions (1, 3), neither GC nor cross-correlation detected interactions with comparable accuracy (Fig. 3 C, D, G, H, K, and L and Fig. S3). Additionally, to demonstrate NMHS performance in a situation when there are unobserved nodes in the network, resulting in incomplete information, we simulated two scenarios: one with a varying number of unobserved neurons (Fig. S4A) and another with a single unobserved neuron, varying the synaptic strength (Fig. S4B). As expected, with an increasing amount of unobserved information, the estimated interaction strength of the observed nodes decreases.

To expand our validation effort, we applied the NMHS algorithm to other networks of interacting processes. We chose another application in which independent agents interact and change behavior based on each other's activity. We used recordings of coimprovising guitarists (Fig. 4A and Fig. S5A) and applied NMHS to analyze guitar voices in a single band. Voices from separate bands were used as a control (Fig. 4B). The algorithm determined significantly greater interaction strength between voices in a single band than between unrelated voices (Fig. 4B and Fig. S5 C and D). The NMHS interactivity algorithm successfully identified the fact that all of the guitarists in a band listen to each other to varying degrees, which GC failed to detect (Fig. 4C). NMHS's decoding ability also provided insight into the guitarists' behavior; by estimating the hidden states of the guitar recordings, we were able to evaluate the improvisation strategy (mimicking or contrasting) of the guitarists (Fig. S5E). Furthermore, to validate NMHS performance in scenarios with incomplete information, we orchestrated recordings of three to six guitarists, analyzing the activity of only three musicians in the group. We found that NMHS performed equally well when up to half of the nodes were unobserved (Fig. 4D).

Slightly modified, the NMHS algorithm also becomes a valuable tool for analysis of static data, such as those found in genomics, proteomics, and demographics (2). With static data, the algorithm measures the degree of interaction between observed variables. Because our algorithm looks at the dependence of the current state of the node on the previous state of itself as well as the neighbor, we modified it to instead consider the dependence of the current state of the node on the current state of the neighbor (but still the previous state of itself) (Fig. 4E). Applied to three sociodemographic datasets, NMHS determined interaction between income and education level (Fig. 4F) that aligns closely with research evidence (2). Additionally, we simulated another model (Fig. 4G), where the output is produced by applying an exclusive

Fig. 2. Verification of interactivity and decoding on neuronal microcircuits. (A) We first identified putative microcircuits: sets of two or three neurons such that the neurons are responders to stimulation of counterpart regions. (B) An example of a responding neuron, recorded in the DMS. In the raster plot (Left) and histogram (Right), spikes of the DMS neuron are aligned to stimulation of the PFC at time 0. (C) We recorded the activity of these putative microcircuits of neurons as a rat performed a decision-making task (Supporting Information). (D) We evaluated the ability of NMHS (Left) and GC (Right) to detect interactions in putative microcircuits. Microcircuits are categorized as bidirectional responders (both neurons respond to stimulus), unidirectional responders (one neuron responds to stimulus), unconnected (two recordings taken from different sessions, aligned to task events), or shuffled (recordings from connected circuits with observations shuffled over time). Cross-correlation analysis is shown in Fig. S2D. (E) The same evaluation, on microcircuits of three neurons (see also Fig. S2E). (F) NMHS is also a valuable tool for decoding neuronal microcircuits. In an illustration with task recordings, NMHS and HMM find neuron hidden states with high (red), middle (yellow), or low (blue) activity in the SNpc (Left), DMS (Center), and PFC (Right) during the task. In hidden state plots, trials (horizontally stacked) are aligned to trial start, and black squares indicate the turn onset.


or (XOR) rule to only a fraction of the sample bins. The sample bins to which the XOR rule was not applied were chosen at random. We demonstrate that NMHS successfully detects strong total interactivity when the XOR rule is applied to a large fraction of sample bins (Fig. 4 H and I).

Discussion

We show here that NMHS compares very favorably to GC, cross-correlation, and conventional HMM solutions in multiple analysis and decoding challenges. NMHS demonstrates great promise for analyzing neural microcircuits in particular. We emphasize that NMHS carries the important advantage of being a tool for both analysis and decoding of complex networks.

The NMHS model that we introduce here is closely related to the hierarchical HMM (HHMM) (14) and the factorial HMM (FHMM) (15). All three are variants of the HMM that impose constraints on the structure of the HMM to improve its performance at a specific task. HHMMs are designed for analyzing and decoding a stream of data that exhibits long-term structure, such as language: the model accomplishes this analysis by organizing the states of the model into a tree in which state transitions between the leaves of a node are common, but in which transitions up and down the levels of the tree are comparatively rare. By contrast, FHMMs are designed for analyzing processes that may have multiple independent hidden states. The FHMM is similar to the NMHS model, except that where the NMHS has a multidimensional transition matrix, the FHMM has a multidimensional emission matrix: the likelihood of an emission is conditioned on the value of each hidden state in the FHMM (SI Materials and Methods, Comparison with Other Models). In contrast to the FHMM and HHMM, NMHS is designed to analyze multiple streams of data, each representing a process that interacts with another. This design allows interaction strength to be inferred from the parameters of the model, and concurrently using information from multiple interacting processes makes it possible to decode network activity much more reliably than with other methods.

There are a number of limitations to consider when applying NMHS. First, a dynamic system analyzed must be describable in terms of states that make transitions over time. These states must be correlated across time; the recent past of a process should be an indicator of current activity.

The NMHS model is most suitable for analyzing interactions involving a small number of processes: the number of parameters in the model increases exponentially with the number of processes included in the model (SI Materials and Methods, Scalability of the NMHS Model), and thus estimating model parameters rapidly becomes computationally intractable as the number of processes increases. There are two potential approaches for increasing the maximum number of processes that can be modeled at once. First, we have successfully trained two- and three-process NMHS models using simulated annealing, an approach that can be applied to models with several processes. Second, approximation techniques such as variational inference can be used, as proposed by Ghahramani and Jordan (15). To analyze interactions within a collection of many processes, it is generally neither necessary nor desirable to use a single very large NMHS model. Larger NMHS models are only required for detecting complex interactions that cannot be characterized as a combination of multiple independent interactions involving fewer processes. For extremely large networks, we can group the network into smaller subnetworks within which nonlinear interactions are evident, and analyze each subnetwork using either the two- or three-process NMHS approach presented here, or the extended NMHS approach described above. By comparison with NMHS, cross-correlation is only defined for two signals and is not applicable to analyzing interactions involving more than two processes. GC can be scaled to larger interactions (31) but can only detect linear interactions. We have shown here that using GC analysis of three-neuron recordings leads to a high rate of false positives (Fig. 2E).

Finally, when fitting parameters to the NMHS model, there may be multiple models that equally well describe the data (Fig. S1F–H). To determine whether this is the case, one must run parameter estimation with several different instantiations. If different models are found that account for the data equally well, interaction strength estimation and decoding cannot proceed. However, we have found

Fig. 3. Validation on simulated microcircuits. (A–D) Simulation data were generated from two-node, two-state, three-emission NMHS matrix models (A). Interactivity estimated using NMHS (B), GC (C), and cross-correlation (D) was plotted against the simulated interactivity. (Blue points represent how much node 2 is affected by node 1; red vice versa.) (E–H) A Hodgkin–Huxley model simulates a triplet of neurons with two excitatory synapses (E). The strength of each interaction in the triplet is estimated using NMHS (F), GC (G), and cross-correlation (H) and plotted against the simulated synaptic strength (gsyn). The remaining interactivity directions are shown in Fig. S3E. (I–L) Interactivity of a similar Hodgkin–Huxley model with one excitatory and one inhibitory synapse (I), estimated with NMHS (J), GC (K), and cross-correlation (L) (see also Fig. S3F).


that total interactivity detection is robust to this issue (Fig. S1F and H). Although the numerical value may be slightly higher or lower than that of the true value, it is unlikely that a noninteracting model can account for the data from an interacting pair or triplet of nodes as well as an interacting model can (Fig. S1F and H).

Materials and Methods

All animal procedures were approved by the Committee on Animal Care at the Massachusetts Institute of Technology.

Model. The NMHS model is an extension of the HMM (5, 11) that is capable of learning the behavior of multiple interacting processes simultaneously. The NMHS model is related to a number of other HMM-based models that have been proposed for solving other problems (14, 15). When modeling a single process, NMHS is identical to an HMM. Like HMMs, an NMHS model can be trained through unsupervised learning: it requires only a sufficiently long stream of data for each process to be modeled. NMHS assumes that the processes being modeled are stateful: that is, the behavior at a time t can be predicted using the behavior at time t − 1. States are represented as distributions of possible behaviors: an emission matrix, E, describes the likelihood of observing a given emission e given that the model is currently in state s: E_{e,s} = P(e | s). The model automatically learns the best states to account for the data. Transitions between states are treated as stochastic events: the probability of a process making a transition to a new state is based on its current state and may be affected by other processes' states. If i and j are the states of two interacting processes X, Y at time t, then the probability of process X making a transition to state k at t + 1 is given by T_{ikj} = P(X_{t+1} = k | X_t = i, Y_t = j). When modeling multiple processes, each process has its own transition and emission matrix.

To account for the idea that the transition probabilities of a process can be affected by other processes, we add dimensions to the transition matrix. Consider the case of modeling the interactions between two processes: the additional dimension in the transition matrix can be thought of as containing a "repertoire of behaviors": each slice of the matrix along this dimension is an individual transition matrix that corresponds to one state of the neighboring process. Each of these transition matrices can describe a completely different behavior for the process. To illustrate this point, consider Fig. S1A. We can think of two processes: A with states 1 and 2 and B with states I and II. If process A exerts a strong influence on the behavior of process B, then layers 1 and 2 in B's transition matrix will be very different, as is the case in Fig. S1A. Conversely, if B does not much influence A's behavior, then layers I and II in A's transition matrix will be quite similar to each other. Adding process C to our system also adds an additional dimension: now the behavior of each process is dependent on the states of both other processes. For each added process, an additional dimension and corresponding sets of layers are added. For example, the two processes in Fig. S1A are modeled with two states: the total size of the T matrix therefore is 2 × 2 × 2 = 8 elements. Were a third process added to the system, the number of elements in each process's T-matrix would be 2 × 2 × 2 × 2 = 16.

The NMHS model can be expressed as a single HMM whose states and emissions are drawn from the product space of all of the processes' states and the product space of all of the processes' emissions. Compared with this gestalt HMM, the NMHS model enforces that a process's emissions are independent of other processes' states and that a process's transitions are independent of other processes' transitions. The NMHS model is also more interpretable and requires fewer parameters (SI Materials and Methods, Comparison with Other Models). It is easy to understand directly from the model that a given layer of a transition matrix corresponds to the transition behavior given the state of the neighbor. Furthermore, each process has its own emission matrix that describes its behavior independently from other processes.
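The 2 × 2 × 2 = 8 and 2 × 2 × 2 × 2 = 16 counts above generalize directly; this small helper (our own illustration, not from the paper's code) makes the exponential growth explicit.

```python
def transition_params(n_states, n_processes):
    """Elements in one process's transition tensor: one dimension for its own
    current state, one per neighboring process, plus the next-state dimension."""
    return n_states ** (n_processes + 1)

# Two 2-state processes: 2 x 2 x 2 = 8 elements per process; a third process
# adds a dimension and doubles the count to 16.
assert transition_params(2, 2) == 8
assert transition_params(2, 3) == 16
```

This is why parameter estimation rapidly becomes intractable as processes are added, as the Discussion notes.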

Goal Function and Interactivity Detection. We evaluate the fit of a model by calculating the likelihood function of the observations given the model parameters. For a two-process NMHS model with observation sequences $\pi$ and $\phi$, transition matrices $T_{ikj} \triangleq P(X_{t+1}=k \mid X_t=i,\, Y_t=j)$ and $R_{jli} \triangleq P(Y_{t+1}=l \mid Y_t=j,\, X_t=i)$, and emission matrices $E_i(p) \triangleq P(\pi_t=p \mid X_t=i)$ and $G_j(q) \triangleq P(\phi_t=q \mid Y_t=j)$, where $M = \{T, R, E, G\}$, we have

\[ f_{ij}(t) \triangleq P(\pi_{1:t}, \phi_{1:t}, X_t=i, Y_t=j \mid M) = E_i(\pi_t)\, G_j(\phi_t) \sum_k \sum_l T_{kil}\, R_{ljk}\, f_{kl}(t-1), \]

\[ P_{\mathrm{model}} = P(\pi, \phi \mid M) = \sum_i \sum_j f_{ij}(\tau), \]

and Bayes' theorem gives us

\[ P(M \mid \pi, \phi) = \frac{P_{\mathrm{model}} \cdot P(M)}{P(\pi, \phi)}. \]
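The forward recursion above can be sketched directly in Python (a minimal sketch; the function name, argument layout, and index ordering are my assumptions, not the NMHS Toolbox API):

```python
import numpy as np

def nmhs_forward(pi_obs, phi_obs, T, R, E, G, f0):
    """Forward likelihood for a two-process NMHS model.

    T[i, k, j] = P(X_{t+1}=k | X_t=i, Y_t=j)
    R[j, l, i] = P(Y_{t+1}=l | Y_t=j, X_t=i)
    E[i, p]    = P(pi_t  = p | X_t=i)
    G[j, q]    = P(phi_t = q | Y_t=j)
    f0[i, j]   = initial joint state probabilities
    Returns P(pi, phi | M) = sum_{i,j} f_{ij}(tau).
    """
    f = f0 * np.outer(E[:, pi_obs[0]], G[:, phi_obs[0]])
    for p, q in zip(pi_obs[1:], phi_obs[1:]):
        # f_new[i, j] = E[i, p] * G[j, q] * sum_{k,l} T[k, i, l] R[l, j, k] f[k, l]
        f = np.einsum('kil,ljk,kl->ij', T, R, f) * np.outer(E[:, p], G[:, q])
    return f.sum()
```

For long sequences a practical implementation would work in log space or rescale `f` at each step to avoid underflow; that detail is omitted here for clarity.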

For different parameters under the same model, we assume $P(M)$ is the same, and we thus can compare the posterior probabilities of different parameterizations to determine which is better. We treat a set of one-process HMMs, one for each process being modeled, as the null hypothesis. For the two-process case, this gives us two HMMs with parameters $\mathcal{N}$ and forward probabilities $f^1(t)$ and $f^2(t)$. We then have

\[ P_\varnothing = P(\pi, \phi \mid \mathcal{N}) = \sum_i f^1_i(\tau) + \sum_j f^2_j(\tau). \]

Because $P(M) \neq P(\mathcal{N})$, we cannot directly compare $P_{\mathrm{model}}$ and $P_\varnothing$. However, if we make the assumption that $P(\mathcal{N})$ is the same for all models, then we can estimate a scaling factor $\sigma = P(\mathcal{N})/P(M)$ and then compare $P_{\mathrm{model}} > \sigma \cdot P_\varnothing$ to determine how good a fit the model is versus the null hypothesis.
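The comparison against the one-process null can be sketched as follows (a hedged sketch, not the toolbox implementation; `hmm_forward` is a standard single-process forward pass, the additive combination of the two null likelihoods follows the formula above, and σ = 1.1 is the default value suggested later in the text):

```python
import numpy as np

def hmm_forward(obs, A, E, f0):
    """Standard one-process HMM forward likelihood (the null-hypothesis model)."""
    f = f0 * E[:, obs[0]]
    for o in obs[1:]:
        f = (f @ A) * E[:, o]           # A[i, k] = P(X_{t+1}=k | X_t=i)
    return f.sum()

def beats_null(p_model, pi_obs, phi_obs, null_params, sigma=1.1):
    """Accept the interacting NMHS model only if it beats the scaled null score.

    null_params holds (A, E, f0) for each of the two independent one-process
    HMMs; sigma estimates the prior ratio P(N)/P(M).
    """
    (A1, E1, f1), (A2, E2, f2) = null_params
    p_null = hmm_forward(pi_obs, A1, E1, f1) + hmm_forward(phi_obs, A2, E2, f2)
    return p_model > sigma * p_null
```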

To estimate σ, we train a number of interacting and noninteracting models on data known to represent an interaction, and take the mean difference between the interacting and noninteracting likelihoods. In practice, we have found that σ = 1.1 works well in most instances. We have also achieved good results by using the Bayesian information criterion (BIC) (32), which has the advantage that it does not require estimation of an additional parameter.

Fig. 4. Application to musical improvisation recordings and static data. (A) We collected recordings, in separate channels, of three guitarists improvising classic rock music. (B) NMHS determined interaction between three voices of a single band (coimprovised), three voices from separate bands (separately improvised), and randomly permuted recordings of three voices from a single band (shuffled). (C) NMHS attributed different interactivity from improvisation leaders to followers, from followers to leaders, and between followers. (D) Interaction strength detected by NMHS in the analysis of the activity of only three guitarists out of a larger group of three to six musicians. (E) In an application on a static network, NMHS analyzed sociodemographic data. In this static model, individuals in strata (boxes) are assigned states (blue and red ovals) that describe their wealth and education. (F) NMHS found interactivity between sociodemographic strata. (G) A probabilistic XOR model, where the output is produced by applying an XOR rule to a bin in the input streams with a set probability P. If the XOR rule is not applied, the output value is selected randomly. (H) NMHS performance in detecting interactivity strength between the input streams as a function of increasing P. (I) Cross-correlation performance in detecting the same interaction.

An alternative method of choosing an appropriate σ is by plotting a receiver operating characteristic curve (Fig. S1E). Using a simulated model that is similar in size and structure to the dataset of interest, we select the σ that results in the highest probability of detection for a given probability of false positives. This method will work whether the goal function is the log-likelihood of the model, the BIC, or another appropriate goal function. An important note, however, is that the threshold value depends on the model itself, so it must be calculated for each new model. For example, the optimal σ when comparing the BIC values of one- and three-node models was σ = 1.023 (Fig. S1F).

Confidence in the Model. In some instances, there can be multiple models that all account for the data well and meet the $P_{\mathrm{model}} > \sigma \cdot P_\varnothing$ criterion. There are two possible issues that can lead to this situation. The first is insufficient data: with a small dataset, the NMHS model can be overfit to the observations, and several distinct models might be able to account for the data. The second possible issue is that distinct models sometimes produce equivalent behavior and thus are in practice indistinguishable. An example is the "common cause" scenario, in which process A affects both process B and process C: a model wherein process A affects process B, which in turn affects process C, may produce equivalent behavior. In either instance, when multiple interacting models are found (Fig. S1 G and H), it generally is not possible to quantify the interaction strength or to decode the behavior of the processes. However, the total interactivity, calculated as the maximum of all of the pairwise interaction strengths in the model, is relatively robust to this issue. This phenomenon occurs because, given a dataset that comes from a network with a high total interaction strength (e.g., > 0.5 in Fig. S1 F–H), it is unlikely for three independent one-node models to explain it better than a single three-node model, even if that three-node model does not find the specific pairwise interactions correctly. Conversely, it is unlikely for a three-node model to explain a dataset that comes from three independent data streams better than three one-node models. Thus, with an appropriate estimate for σ, the models with low total interaction are better explained by three one-node models and are therefore labeled as noninteracting; the models with high total interaction are better explained by a single three-node model and are therefore labeled as interacting (Fig. S1G).
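The ROC-based choice of σ might be sketched as follows (an illustrative sketch under stated assumptions: the function name is hypothetical, and the scores are assumed to be model-to-null goal-function ratios computed on simulated interacting and noninteracting datasets, as described above):

```python
import numpy as np

def choose_sigma(scores_interacting, scores_null, max_false_pos=0.05):
    """Pick the threshold (sigma) that maximizes the probability of detection
    subject to a cap on the probability of false positives.

    scores_interacting / scores_null: goal-function ratios (e.g., likelihood or
    BIC ratios) from simulated datasets with and without interaction.
    """
    candidates = np.sort(np.concatenate([scores_interacting, scores_null]))
    best_sigma, best_tpr = None, -1.0
    for s in candidates:
        fpr = np.mean(scores_null > s)         # false-positive rate at this threshold
        tpr = np.mean(scores_interacting > s)  # probability of detection
        if fpr <= max_false_pos and tpr > best_tpr:
            best_sigma, best_tpr = s, tpr
    return best_sigma
```

As the text notes, the resulting threshold is model-specific, so this procedure would be rerun for each new model size and structure.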

Determining the Multiple-Explanation Models. To determine whether the multiple-explanation issue is present for a given dataset, we train several different models on that dataset, initializing the training algorithm with a different set of parameters each time. We perform enough initializations to sample the search space uniformly and densely. If a global-optimum approach is used to fit the model parameters, such as genetic algorithms or simulated annealing, it is necessary to keep track of all of the local optima encountered in the search and to compare the models that have log-likelihood above a predefined threshold. If we find that all of the trained models are similar to each other, it is likely that there is only one model that explains the data.
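The multiple-restart check could be sketched like this (a hypothetical helper: `train_fn`, the parameter-distance merging rule, and the log-likelihood threshold are all illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def distinct_optima(train_fn, n_inits=50, ll_threshold=-1e3, tol=1e-2, seed=0):
    """Train from many random initializations and collect distinct good fits.

    train_fn(rng) is assumed to return (params_vector, log_likelihood) for one
    randomly initialized training run. Fits whose parameter vectors differ by
    less than `tol` (Euclidean distance) are treated as the same model.
    """
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_inits):
        params, ll = train_fn(rng)
        if ll < ll_threshold:
            continue                       # discard poor local optima
        if not any(np.linalg.norm(params - p) < tol for p, _ in kept):
            kept.append((params, ll))
    return kept  # a single entry suggests one model explains the data
```

If `distinct_optima` returns more than one entry, the multiple-explanation issue is present and, per the text, only the total interactivity should be trusted.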

Quantifying Interaction Strength. Once NMHS establishes that $P_{\mathrm{model}} > \sigma \cdot P_\varnothing$ and that only one model explains the data well, we can quantify the interaction strength by analyzing the transition matrices. We calculate an interactivity index that signifies how strongly one process affects another.

Given processes A and B, with N states for process A and M states for process B, we calculate the interaction strength from B to A (i.e., how strongly A is affected by B) as follows. Let A have the transition matrix $T_{ikj} \triangleq P(X_{t+1}=k \mid X_t=i,\, Y_t=j)$. We first calculate the "specific interaction index" for each state z in A, given each pair of states x, y in B:

\[ C^z_{xy} = \frac{1}{2} \sum_{i=1}^{N} \left| T_{zix} - T_{ziy} \right|. \]

Given that T satisfies the Markov property, these values are bounded by the interval [0,1]. To measure the pairwise interaction strength from B to A, we take the mean of all specific interaction indices:

\[ C_{AB} = \frac{\sum_{x=1}^{M} \sum_{y=x+1}^{M} \sum_{z=1}^{N} C^z_{xy}}{N \binom{M}{2}}. \]

This value is on the interval [0,1], where 0 corresponds to no interaction (A and B are entirely independent of each other) and 1 corresponds to maximal interaction. In the NMHS Toolbox, this calculation can be performed for all pairs of nodes simultaneously in two- or three-process models using the interactivity() function. To calculate the total interactivity, simply take the maximum interaction strength among all pairs of nodes.
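The two formulas above can be combined into one short function (a sketch, not the NMHS Toolbox interactivity() function; the axis layout `T[z, k, x]` is an assumption):

```python
import numpy as np
from itertools import combinations

def interaction_strength(T):
    """Pairwise interaction strength C_AB from process B to process A.

    T[z, k, x] = P(X_{t+1}=k | X_t=z, Y_t=x), shape (N, N, M): N states for A,
    M states for B. C^z_xy is the total-variation distance between the
    transition rows of state z under neighbor states x and y; C_AB averages
    these over all states z and all unordered neighbor-state pairs (x, y).
    """
    N, _, M = T.shape
    total = 0.0
    for x, y in combinations(range(M), 2):
        # sums 0.5 * |T[z, k, x] - T[z, k, y]| over both z and k,
        # i.e., the sum of C^z_xy over z for this pair (x, y)
        total += 0.5 * np.abs(T[:, :, x] - T[:, :, y]).sum()
    n_pairs = M * (M - 1) // 2          # binomial coefficient C(M, 2)
    return total / (N * n_pairs)
```

By construction the result lies in [0,1]: identical layers for every neighbor state give 0, and disjointly supported layers give 1.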

ACKNOWLEDGMENTS. We thank Daniel Gibson and Yasuo Kubota for their help in many aspects of this work. This work was supported by National Institutes of Health Grant R01 MH060379, the Defense Advanced Research Projects Agency and US Army Research Office Grant W911NF-10-1-0059, and the Saks Kavanaugh Foundation.

1. Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424–438.
2. Putnam RD (2015) Our Kids: The American Dream in Crisis (Simon and Schuster, New York).
3. Sugihara G, et al. (2012) Detecting causality in complex ecosystems. Science 338(6106):496–500.
4. Kispersky T, Gutierrez GJ, Marder E (2011) Functional connectivity in a rhythmic inhibitory circuit using Granger causality. Neural Syst Circuits 1(1):9.
5. Oweiss KG (2010) Statistical Signal Processing for Neuroscience and Neurotechnology (Academic, Cambridge, MA).
6. Schneidman E, Berry MJ, 2nd, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440(7087):1007–1012.
7. Brown EN, Kass RE, Mitra PP (2004) Multiple neural spike train data analysis: State-of-the-art and future challenges. Nat Neurosci 7(5):456–461.
8. Stevenson IH, Kording KP (2011) How advances in neural recording affect data analysis. Nat Neurosci 14(2):139–142.
9. Schneidman E, Bialek W, Berry MJ, 2nd (2003) Synergy, redundancy, and independence in population codes. J Neurosci 23(37):11539–11553.
10. Cadotte AJ, DeMarse TB, He P, Ding M (2008) Causal measures of structure and plasticity in simulated and living neural networks. PLoS One 3(10):e3355.
11. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286.
12. Abeles M, et al. (2013) Compositionality in neural control: An interdisciplinary study of scribbling movements in primates. Front Comput Neurosci 7:103.
13. Gat I, Tishby N, Abeles M (1997) Hidden Markov modelling of simultaneously recorded cells in the associative cortex of behaving monkeys. Network-Comp Neural 8(3):297–322.
14. Fine S, Singer Y, Tishby N (1998) The hierarchical hidden Markov model: Analysis and applications. Mach Learn 32(1):41–62.
15. Ghahramani Z, Jordan MI (1997) Factorial hidden Markov models. Mach Learn 29(2-3):245–273.
16. Buesing L, Macke JH, Sahani M (2012) Learning stable, regularised latent models of neural population dynamics. Network 23(1-2):24–47.
17. Quian Quiroga R, Panzeri S (2009) Extracting information from neuronal populations: Information theory and decoding approaches. Nat Rev Neurosci 10(3):173–185.
18. Burak Y, Rokni U, Meister M, Sompolinsky H (2010) Bayesian model of dynamic image stabilization in the visual system. Proc Natl Acad Sci USA 107(45):19525–19530.
19. Pinto DJ, Jones SR, Kaper TJ, Kopell N (2003) Analysis of state-dependent transitions in frequency and long-distance coordination in a model oscillatory cortical circuit. J Comput Neurosci 15(2):283–298.
20. Gläscher J, Daw N, Dayan P, O'Doherty JP (2010) States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66(4):585–595.
21. Stern M, Sompolinsky H, Abbott LF (2014) Dynamics of random neural networks with bistable units. Phys Rev E Stat Nonlin Soft Matter Phys 90(6):062710.
22. Sompolinsky H (2014) Computational neuroscience: Beyond the local circuit. Curr Opin Neurobiol 25:xiii–xviii.
23. Gutierrez GJ, Marder E (2014) Modulation of a single neuron has state-dependent actions on circuit dynamics. eNeuro 1(1):ENEURO.0009-14.2014.
24. Friedman A, et al. (2015) A corticostriatal path targeting striosomes controls decision-making under conflict. Cell 161(6):1320–1333.
25. Seth AK (2010) A MATLAB toolbox for Granger causal connectivity analysis. J Neurosci Methods 186(2):262–273.
26. Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80(1):1–27.
27. Feingold J, Gibson DJ, DePasquale B, Graybiel AM (2015) Bursts of beta oscillation differentiate postperformance activity in the striatum and motor cortex of monkeys performing movement tasks. Proc Natl Acad Sci USA 112(44):13687–13692.
28. Destexhe A, Mainen ZF, Sejnowski TJ (1994) Synthesis of models for excitable membranes, synaptic transmission and neuromodulation using a common kinetic formalism. J Comput Neurosci 1(3):195–230.
29. Destexhe A, Sejnowski TJ (2001) Thalamocortical Assemblies: How Ion Channels, Single Neurons and Large-Scale Networks Organize Sleep Oscillations (Oxford Univ Press, New York).
30. Hodgkin AL, Huxley AF (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117(4):500–544.
31. Ding M, Chen Y, Bressler SL (2006) Granger causality: Basic theory and application to neuroscience. Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, eds Schelter B, Winterhalder M, Timmer J (Wiley, Weinheim, Germany), pp 437–495.
32. Siddiqi SM, Gordon GJ, Moore AW (2007) Fast state discovery for HMM model selection and learning. JMLR Workshop Conf Proc 2:492–499.
33. Friedman A, Keselman MD, Gibb LG, Graybiel AM (2015) A multistage mathematical approach to automated clustering of high-dimensional noisy data. Proc Natl Acad Sci USA 112(14):4477–4482.

Friedman et al. PNAS | June 7, 2016 | vol. 113 | no. 23 | 6543

