
FRANÇOIS RHÉAUME

UNE MÉTHODE DE MACHINE À ÉTAT LIQUIDE POUR LA CLASSIFICATION DE SÉRIES TEMPORELLES

A new liquid state machine method for temporal classification

Thèse présentée à la Faculté des études supérieures et postdoctorales de l'Université Laval dans le cadre du programme de doctorat en génie électrique pour l'obtention du grade de Philosophiæ doctor (Ph.D.)

DÉPARTEMENT DE GÉNIE ÉLECTRIQUE ET DE GÉNIE INFORMATIQUE
FACULTÉ DES SCIENCES ET DE GÉNIE
UNIVERSITÉ LAVAL
QUÉBEC

2012

© François Rhéaume, 2012

Résumé

L'intérêt envers la neuroscience informatique pour les applications d'intelligence artificielle est motivé par plusieurs raisons. Parmi elles se retrouve la rapidité avec laquelle le domaine évolue, promettant de nouvelles capacités pour l'ingénieur. Dans cette thèse, une méthode exploitant les récents avancements en neuroscience informatique est présentée : la machine à état liquide (« liquid state machine »). Une machine à état liquide est un modèle de calcul de données inspiré de la biologie qui permet l'apprentissage sur des flux de données. Le modèle représente un outil prometteur de reconnaissance de formes temporelles. Déjà, il a démontré de bons résultats dans plusieurs applications. En particulier, la reconnaissance de formes temporelles est un problème d'intérêt dans les applications militaires de surveillance telle que la reconnaissance automatique de cibles. Jusqu'à maintenant, la plupart des machines à état liquide créées pour des problèmes de reconnaissance de formes sont demeurées semblables au modèle original. D'un point de vue ingénierie, une question se dégage : comment les machines à état liquide peuvent-elles être adaptées pour améliorer leur aptitude à solutionner des problèmes de reconnaissance de formes temporelles ? Des solutions sont proposées. La première solution suggérée se concentre sur l'échantillonnage de l'état du liquide. À ce sujet, une méthode qui exploite les composantes fréquentielles du potentiel sur les neurones est définie. La combinaison de différents types de vecteurs d'état du liquide est aussi discutée. Deuxièmement, une méthode pour entraîner le liquide est développée. La méthode utilise la plasticité synaptique à modulation temporelle relative pour modeler le liquide. Une nouvelle approche conditionnée par classe de données est proposée, où différents réseaux de neurones sont entraînés exclusivement sur des classes particulières de données. Concernant cette nouvelle approche ainsi que celle concernant l'échantillonnage du liquide, des tests comparatifs ont été effectués avec l'aide de jeux de données simulées et réelles. Les tests permettent de constater que les méthodes présentées surpassent les méthodes conventionnelles de machine à état liquide en termes de taux de reconnaissance. Les résultats sont encore plus encourageants par le fait qu'ils ont été obtenus sans l'optimisation de plusieurs paramètres internes pour les différents jeux de données testés. Finalement, des métriques de l'état du liquide ont été investiguées pour la prédiction de la performance d'une machine à état liquide.


Abstract

There are a number of reasons that motivate the interest in computational neuroscience for engineering applications of artificial intelligence. Among them is the speed at which the domain is growing and evolving, promising further capabilities for artificial intelligence systems. In this thesis, a method that exploits the recent advances in computational neuroscience is presented: the liquid state machine. A liquid state machine is a biologically inspired computational model that aims at learning on input stimuli. The model constitutes a promising temporal pattern recognition tool and has been shown to perform very well in many applications. In particular, temporal pattern recognition is a problem of interest in military surveillance applications such as automatic target recognition. Until now, most of the liquid state machine implementations for spatiotemporal pattern recognition have remained fairly similar to the original model. From an engineering perspective, a challenge is to adapt liquid state machines to increase their ability for solving practical temporal pattern recognition problems. Solutions are proposed. The first one concentrates on the sampling of the liquid state. On this subject, a method that exploits frequency features of neurons is defined. The combination of different liquid state vectors is also discussed. Secondly, a method for training the liquid is developed. The method implements synaptic spike-timing dependent plasticity to shape the liquid. A new class-conditional approach is proposed, where different networks of neurons are trained exclusively on particular classes of input data. For the suggested liquid sampling methods and the liquid training method, comparative tests were conducted with both simulated and real data sets from different application areas. The tests reveal that the methods outperform the conventional liquid state machine approach. The methods are even more promising in that the results are obtained without optimization of many internal parameters for the different data sets. Finally, measures of the liquid state are investigated for predicting the performance of the liquid state machine.

Avant-propos

In 1997, I began a bachelor's degree in computer engineering at Université Laval. In the summer of 1999, I obtained an internship at the Laboratoire de Radiocommunications et de Traitement du Signal. I was then under the supervision of Professor Dominic Grenier who, like many professors in the Department of Electrical and Computer Engineering, represented for me a model of excellence in his profession. In the winter of 2001, I began a master's degree in electrical engineering. In 2002, I was hired at Defence Research and Development Canada - Valcartier as a defence scientist. In 2007, my employer offered me the opportunity to undertake doctoral studies to further my professional development. The studies supported by the employer had to be carried out on the RDDC-Valcartier site itself and within the research program of the section to which I belonged at the time, the decision support systems and C2 section. The offer seemed opportune for two reasons: first, the possibility of immersing myself again in concrete scientific and technical work while being fully responsible for the various tasks involved; second, the occasion to deepen my knowledge of artificial intelligence and to explore new domains such as computational neuroscience.

I therefore undertook the doctorate under the direction of Mr. Grenier. Despite the distance between our respective workplaces, he always followed my progress closely. I will always remember his work ethic, his quickness of mind, his generosity and availability towards students, and his great modesty. It is perhaps this last quality that makes him a professor so appreciated by his students. Professors play a great role in the development of students, not only through the knowledge and skills they transmit, but also through the example of professionalism they set as professors and engineers. This is one of the respects in which I was fortunate to study in the Department of Electrical and Computer Engineering at Université Laval.

I thank RDDC-Valcartier for supporting my doctoral studies within its research program. I must note the support of Mr. Christian Carrier, chief scientist at RDDC-Valcartier, who followed my progress. I wish to acknowledge the intervention of Mr. Stéphane Paradis who, having become my section head shortly before the end of my doctorate, worked to establish healthier working conditions in his new section. Indirectly, his fair and conscientious approach greatly facilitated the completion of my doctoral studies.

I take this opportunity to salute a few colleagues and friends from my section at RDDC-Valcartier: Abder Rezak Benaskeur, Jean Berger, Éric Dorion and Hengameh Irandoust. Their sense of values and their integrity marked my years spent in the doctorate. The impact that Éric has had on my career is inestimable. In tumultuous situations, trust in someone is the best support one can hope for. With Éric, the trust was total. How lucky I was to know him. Despite the obstacles he encountered, he always showed uprightness and remarkable resilience. He played a great role in my journey and in the common battle we were waging. I wish him much success in his new position and, above all, that he finds there the values that are dear to him. I also thank Abder for his judicious advice, for the model of a researcher that he represents, and for the new opportunities he offered me. I likewise thank Jacques Bédard, who introduced me to an entirely new perspective on the world of research. I hope to have retained something of his vast experience and his countless abilities, notably in the art of scientific writing. I would not want to forget Claude Roy and Robert Charpentier, of whom I will always keep fond memories.

Finally, I thank my parents, Michel and Denise. They have always guided me with their good advice in the pursuit of my studies and my career.

Science sans conscience n'est que ruine de l'âme ...
- François Rabelais

Contents

Résumé
Abstract
Avant-propos
Contents
List of Tables
List of Figures
List of Symbols
List of Acronyms

1 Introduction
   1.1 Context of the Research
      1.1.1 Liquid state machines: From machine learning to neuroscience
      1.1.2 Contribution to Defence Research and Development Canada - Valcartier's program
   1.2 Motivation
   1.3 Goals and Assumptions
   1.4 Organization of the Dissertation

2 Supervised temporal pattern recognition
   2.1 Static versus temporal classification
   2.2 Classification of streams
   2.3 Sequential classification
      2.3.1 Non-restrictive
      2.3.2 Real-time
      2.3.3 Delayed
   2.4 Sequential approach for the classification of streams
   2.5 Performance measures
   2.6 Conclusion

3 Spiking neural networks
   3.1 Basic concepts of computational neuroscience
   3.2 Neural network models
   3.3 Spiking neuron model
      3.3.1 Leaky-integrate-and-fire (LIF) model
   3.4 Connectivity
   3.5 Synapses
   3.6 Synaptic plasticity
      3.6.1 Dynamic synapses
      3.6.2 Spike-timing dependent plasticity
   3.7 Biological realism
   3.8 Conclusion

4 Liquid state machine
   4.1 Concept
   4.2 The liquid
   4.3 The readout
   4.4 Input conversion
      4.4.1 Injection current
      4.4.2 Gaussian receptive fields
   4.5 Separation and approximation properties
   4.6 Review of existing methods
      4.6.1 Liquid state sampling and classification
      4.6.2 Unsupervised training of the synapses
   4.7 Conclusion

5 New liquid sampling methods
   5.1 Liquid state vector based on the frequency components of short-time membrane potential signals
      5.1.1 Lower limit on the liquid sampling interval
   5.2 Combination of multiple liquid state features
      5.2.1 Liquid state features
      5.2.2 Multistate combination
   5.3 Experimental details
      5.3.1 Datasets
      5.3.2 Setup and parameters
      5.3.3 Determination of the parameters for computing the frequency components
   5.4 Results
   5.5 Conclusion

6 Class conditional STDP training of the liquid
   6.1 Description of the method
   6.2 Experimental details
      6.2.1 Datasets
      6.2.2 Setup and parameters
   6.3 Results
   6.4 Conclusion

7 Liquid state measures and their relation to the approximation property
   7.1 Liquid state measures
   7.2 Relation to the approximation property
   7.3 Experiments
   7.4 Conclusion

8 Conclusion
   8.1 Contributions
   8.2 Other considerations and suggestions for future works

A Pulse coding versus rate coding
B Average number of connections in the liquid
C Percentile-based bar graphs
D Software
   D.1 LSMmaker
   D.2 Pattern recognition over liquid states

List of Tables

5.1 Regrouping of the original frequency components as a result of the grouping factor (GF = 4). Column 1 shows the original components resulting from the 16-point DFT and column 2 shows labels that illustrate which original components are grouped together.
5.2 Average recognition rates (%) for the Synthetic Control Charts (SCC) time series.
5.3 Average recognition rates (%) for the 3-class (λ = 2, 8 and 16) binomial-Poisson data.
5.4 Average recognition rates (%) for the seismic data of military vehicles.
5.5 Number of dimensions of the liquid state vectors.
5.6 Comparison of average recognition rates (%) obtained on different datasets and with a sampling interval of 6.25 ms instead of 100 ms, for the liquid state vectors made of firing rates and membrane potentials, respectively. The liquid size is 7×7×3.
6.1 Separation measures for different databases.
6.2 Average recognition rates (%) and standard deviations obtained with the class-independent (CI) and class-conditional (CC) STDP training methods for different datasets. The liquid state vector is made of the firing rates of the neurons.
6.3 Average recognition rates (%) and standard deviations obtained with the class-independent (CI) and class-conditional (CC) STDP training methods for different datasets. The liquid state vector is made of the frequency components of the membrane potentials (Ω). For comparison, the first column also shows the results of CI-STDP training when firing rates (F) are used (see Table 6.2).

List of Figures

1.1 Neuroscience and machine learning
2.1 Illustration of a stream
2.2 Different spatiotemporal classification problems
3.1 Anatomy of a neuron
3.2 Action potential
3.3 Spiking neuron
3.4 Leaky-integrate-and-fire neuron
3.5 Spike-timing dependent plasticity (STDP) modification
4.1 Ripples at the surface of a liquid
4.2 Liquid State Machine layout
4.3 Gaussian receptive fields
4.4 Pathological synchrony and over-stratification
4.5 Example of a desirable spiking behavior of the liquid
4.6 Unsupervised STDP training of a SNN
5.1 Frequency-based readout technique
5.2 Samples of synthetic control chart (SCC) time series
5.3 Repartition of data for readout training and testing
5.4 Spectrogram of the membrane potential of a neuron in the liquid
6.1 STDP training methods
6.2 Repartition of data for STDP training, readout testing and readout training
6.3 Class-specific recognition rates obtained with the class-independent (CI) and class-conditional (CC) STDP training methods, for the SCC (a) and LEM (b) databases. Firing rates are used as the liquid state vector.
6.4 Example of a two-class separation problem. (a) The two classes are not linearly separable by a perceptron, although the samples are separated by relatively large distances. (b) Even though the samples from the two classes are separated by smaller distances, they are indeed linearly separable.
7.1 ... input and a 1-dimensional measure
7.2 Distribution of the variance of firing rates
7.3 Variance of firing rates
7.4 Repeated experiments for measuring the recognition rate in terms of the variance of firing rates
7.5 Mean of firing rates
7.6 Entropy of firing rates
7.7 Correlation between mean, variance and entropy of liquid states
A.1 Binary representation of M input values using rate coding
A.2 Binary representation of M input values using pulse coding
C.1 Percentile-based histogram
D.1 Programs created for testing and evaluating different Liquid State Machine methods presented in the thesis. The created programs (blue boxes) interact with the Circuit-Tool and CSIM toolboxes [1, 2] (black boxes), which are available online at www.lsm.tugraz.at [3].
D.2 Graphical user interface for the creation and configuration of liquid state machines
D.3 Classification interface that includes a linear perceptron
D.4 Graphical user interface to perform time fusion of classifiers' outputs

List of Symbols

Symbol  Definition

Γi  set of all incoming synaptic connections to a neuron i
∆abs  absolute refractory period
∆w  synaptic weight update in the spike-timing-dependent plasticity model
Θ  firing threshold (spiking neuron model) or activation threshold (non-spiking neuron model)
Λ  set of class indexes
Υ  set of potential firing times
Ψ  1-dimension measure of the liquid
Ω  identifier for the frequency components of membrane potentials
α  spatiotemporal pattern
β  parameter that controls the width of the gaussian receptive fields
γ  parameter of the connectivity model
δ(t)  pulse emitted at time t by a neuron
ε  small time increment
ε̄  internal variable of the spike-timing-dependent plasticity model
ζ  membrane potential function
η  time interval multiplication factor
κ  1/(RC)
λ  Poisson distribution: the expected number of occurrences during the given interval
µ  response function of a synapse
ξ+, ξ−  scaling factors for long-term potentiation (+) and depression (−) in the spike-timing-dependent plasticity model
ρ(c)  amount of variance within a class c of state vectors
σ  standard deviation of the gaussian receptive fields
τm  membrane time constant of a neuron
τfacil  facilitation time constant of the dynamic synapse model
τrec  recovery time constant of the dynamic synapse model
τ+, τ−  time constants for long-term potentiation (+) and depression (−) in the spike-timing-dependent plasticity model
φ  function that embodies the spiking activity of a spiking neural network
χi  set of liquid state vectors associated to input streams in class i
ω  frequency index in the discrete Fourier transform
Aiω  magnitude of the ωth discrete Fourier coefficient for a neuron i (Xiω)
Ai  vector of the discrete Fourier coefficients for a neuron i
B  numerical input value
C  capacitor in the model of a leaky-integrate-and-fire neuron
D(i, j)  Euclidean distance between a neuron i and a neuron j
E  set of synapses
F  firing frequency
Fv  set of firing times for neuron v
G  pool of neurons
H  entropy
I  electric current
L  stream length
M  number of classes
MX  matrix of liquid state vectors
N  number of points of the discrete window in the discrete Fourier transform
Na  number of inputs of a neuron (non-spiking model)
Ni  number of state vectors associated to input streams in class i
Ncn  number of input conversion neurons for the gaussian receptive fields conversion method
NY  number of spiking neurons in a spiking neural network (or in a liquid)
Ns  number of equally spaced spike times in the gaussian receptive fields
Nu  number of dimensions of an input stream u
Nx  number of elements in vector x
P  probability
Pv  potential equation for neuron v
S  scaling parameter of the connectivity model
R  resistance in the model of a leaky-integrate-and-fire neuron
V  membrane potential
W  weighting factor
T  time window for calculating the firing rate
Tb  time interval associated to a numerical value B
TD  total duration of an input stream
Te  sampling interval of the liquid
Tu  time interval between time samples in a data stream u
Tĉ  time interval between each output of a classifier
Tĉmin  minimum output interval of a classifier when frequency components are used as inputs
T(f)  interval between two spikes for a leaky-integrate-and-fire neuron
Tth  time a neuron needs to reach the firing threshold Θ
U  set of input streams
Xiω  ωth discrete Fourier coefficient for a neuron i
Y  set of spiking neurons
GF  grouping factor of the frequency components
LQ  symbol for designating a liquid
MX  matrix in which each row corresponds to a state vector
SC  output score of a classifier
Sep  measure of the separation property
Āi  vector of averaged frequency components for a neuron i
R̄  internal variable of the dynamic synapse model
Ū  constant in the dynamic synapse model
R  set of real numbers
R+  set of real positive numbers
Z3  three-dimensional space in which neurons of a spiking neural network are positioned
Nx × Ny × Nz  dimensions of a cuboid of neurons
a  input of a neuron (non-spiking model)
b  input of a neuron's activation function (non-spiking model)
c  class label
c  set of classes
ĉ  estimated class
e  sampling function applied to the liquid
f  activation function of a neuron
fe  sampling frequency
g  function that calculates the firing rate of a neuron
h  classifier/classification function
j  index or label number
k  discrete time index
l  index number
n  binomial distribution: number of independent experiments
nneg  number of neglected outputs of a classifier
o  output of a neuron (non-spiking model)
p  binomial distribution: probability of success for an experiment
r  degree of confidence in the output of a classifier
s  time index
t  time index
t(f)  firing time
u  data stream
v  spiking neuron
v−  presynaptic neuron
v+  postsynaptic neuron
w  synaptic weight or weighting factor of the input of a neuron
x  liquid state vector and feature vector of a classifier
xi  state of a neuron i
xΩ  frequency-based liquid state vector
y  spiking neuron
cn  conversion neuron
rvi  response value of the ith gaussian receptive field
ctri  center of the gaussian for the ith conversion neuron
x̄i  center of mass for χi
ū  internal variable in the dynamic synapse model
(x̄, ȳ, z̄)  position of a neuron in space

List of Acronyms

Acronym  Definition

BCI  brain-computer interface
BP  binomial-Poisson
C2I  command, control and intelligence
CI  class independent
CC  class conditional
CSIM  circuit simulator
DFT  discrete Fourier transform
DRDC  Defence Research and Development Canada
ESN  echo state network
EPSP  excitatory postsynaptic potential
FFT  fast Fourier transform
FPGA  field-programmable gate array
HMM  hidden Markov model
IPSP  inhibitory postsynaptic potential
LEM  lightning electromagnetic
LIF  leaky-integrate-and-fire
LSM  liquid state machine
LTD  long-term depression
LTP  long-term potentiation
SASNet  self-healing autonomous sensor network
SCC  synthetic control chart
SensIT  sensor information technology
SMV  seismic military vehicle
SNN  spiking neural network
STDP  spike-timing dependent plasticity
TDNN  time delay neural network

Chapter 1

Introduction

1.1 Context of the Research

With the advent of electronics and microprocessors, modern society has seen the creation of systems capable of replicating many intelligent tasks that humans do. As computer power increased over the years, the complexity and capabilities of these systems multiplied. Beyond simple devices such as arithmetic calculators and thermostats, today's computers can perform speech recognition, gesture recognition, computer-assisted medical intervention and driverless driving, to name just a few tasks.

Although much has been achieved in recent years, artificial intelligence systems can only replicate a small fraction of human intelligence. The mechanisms of learning, in particular, are still not completely understood. Moreover, the human brain is much more computationally powerful than any computer that exists to this day. After a recent study on brain connections [4], researchers at the Stanford University School of Medicine declared that "A single human brain has more switches than all the computers and routers and Internet connections on Earth" [5].

Hence, there is much to discover and a long way to go in the development of artificial intelligence systems. In this effort, many believe that the path to more progress lies in the study of how biological neural systems, or neural microcircuits [6], work. More precisely, the interest lies in the understanding and modeling of computation and information processing in the brain. This is the field of computational neuroscience. As shown in Figure 1.1, computational neuroscience provides models of learning which can be exploited in artificial intelligence applications.

Among the wide range of research areas behind artificial intelligence applications, one of the most prominent is machine learning. Machine learning is concerned with the problem where a computer has to learn from experience. When the problem involves identifying patterns from raw data, it is called a pattern recognition problem. As defined in [7], pattern recognition is "the act of taking in raw data and taking an action based on the category of the pattern". For humans, pattern recognition is prevalent in daily life. Often, the recognition of patterns has a temporal nature: the time at which data are observed must be taken into account. Such temporal reasoning is at the basis of human intelligence. Temporal pattern recognition is also essential to our daily tasks. Understanding sounds or words that we hear, identifying someone's gestures, as well as reading a text are examples of human activities that require the capacity for integrating information and associating a meaning or signification to it. The capacity for temporal pattern recognition improves as new observations or experiences are acquired along with their particular meaning. This is the process of learning. The temporal aspect is very significant in various learning tasks. Except for certain tasks, like the visual recognition of fixed objects, it is only in rare cases that phenomena are recognized based on characteristics that are fixed in time. Hearing, for example, is almost entirely based on the variation of sounds through time. Its only exception is sound intensity, which is a characteristic that can be measured anywhere in time without requiring any past measurements. Hence, the human brain is a very skilled system for the recognition of temporal patterns.

1.1.1 Liquid state machines: From machine learning to neuroscience

Throughout the years, the field of machine learning has seen the development of many techniques for the problem of pattern recognition. Classification is a particular problem of pattern recognition. It consists in assigning classes to sets of data. Since learning relies on training data labelled with their correct class, classification is a supervised learning problem. Static classification considers data that are fixed in time. Many static classification algorithms exist. Bayes classifiers, k-nearest neighbors, decision trees and artificial neural networks such as multi-layer perceptrons are among the most popular static classification methods. They have been applied to various problems such as handwritten character recognition [8, 9, 10], for example. Temporal classification, which deals with data sequences, is more complex. Generally, the methods used for static classification are no longer appropriate when the data under study change in time. Methods capable of assimilating the temporal structure of input data are therefore needed. The ability to learn the relations between past and present data is required.

This is in contrast to static classification methods, whose decisions are based only on the processing of data at the current time.

On a different note, in the field of neuroscience, the understanding of the biological mechanisms behind learning and memory is still at an early stage. Many questions related to the various learning processes undergone in the brain remain unanswered or are under debate. In fact, a lot of fundamental questions about how the brain works remain wide open [11, 12]. Meanwhile, research in neuroscience has accelerated in recent years. Supported by the development of a variety of technological instruments, new findings have come out that help explain better how real neural systems work. New models and concepts have emerged while some have been refined. While biology and medicine benefit from these new models, the latter also represent an inspiration for engineering applications. The appeal of biologically inspired models for engineering applications is obvious: so far no artificial intelligence technique can compete against real neural systems. Consider the case of the fly, an insect. With only a few thousand neurons, it is able to perform full flight control by processing data received from its thousands of eye facets. Under the same conditions, it is impossible to replicate the same capabilities with current artificial neural network models.

One of the main properties of recent models of neural systems is the presence of electrical pulses that neurons transmit to each other. According to these models, a neuron communicates by emitting pulses through time. The reaction of the receiving neuron depends on the synchronism of the pulses emitted on the different connections. Time therefore plays a primary role in the neuron's response. It has been demonstrated that pulsed or spiking neuron models, because of this consideration of time, allow a higher computational power than the conventional non-spiking artificial neuron models [13, 14]. The conventional non-spiking artificial neuron models refer back to the work of McCulloch and Pitts [15]. McCulloch and Pitts defined a model made of a weighted sum of inputs onto which an activation function is applied. The activation function usually has a sigmoidal form.
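To make the contrast concrete, the following minimal sketch implements the conventional non-spiking neuron just described: a weighted sum of inputs passed through a sigmoidal activation function. The weight and threshold values are arbitrary choices made here for illustration; this is not code from the thesis software.

    import math

    def sigmoid(b):
        # Sigmoidal activation function applied to the summed input b.
        return 1.0 / (1.0 + math.exp(-b))

    def nonspiking_neuron(inputs, weights, threshold):
        # Conventional non-spiking (McCulloch-Pitts style) neuron: weighted sum of
        # inputs, shifted by an activation threshold, then squashed by the sigmoid.
        b = sum(w * a for w, a in zip(weights, inputs)) - threshold
        return sigmoid(b)

    # Example with arbitrary values (illustrative only).
    print(nonspiking_neuron([0.5, 1.0, 0.2], [0.8, -0.4, 1.5], threshold=0.1))

Note that nothing in this computation depends on when the inputs arrive, which is precisely what distinguishes it from the spiking models discussed above.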

Another fundamental property of biologically inspired neural network models is the presence of recurrence in the connections [16, 17]. Although non-recurrent artificial neural networks, also called feed-forward networks, have been used extensively and successfully in static classification, they are poorly adapted to tasks with a temporal aspect: they lack a memory. Recurrent connections, together with delays in communications, enable some sort of memory effect. The consequence is a better representation of the temporal structure of input data. Artificial neural networks such as Time Delay Neural Networks (TDNN) [18] and Elman networks [19], for instance, have been developed to tackle the temporal problem. However, they are somewhat limited to specific temporal problems and have little or no biological basis.

Figure 1.1: Relation between neuroscience and machine learning. Neuroscience, the theoretical study of neural systems, produces neural system models (cellular, molecular, computational, ...) that are exploited on the engineering side, where neural system functions are reproduced artificially in applications such as machine learning, perception, reasoning, motion and planning, including pattern recognition.

Furthermore, learning as a biological mechanism still remains largely unknown. From an engineering perspective, conventional non-spiking neural network models have been provided with learning techniques such as error backpropagation [20, 21]. Because of their relative simplicity, these models are easily trainable. However, when it comes to the more complex biologically inspired networks, where spikes and recurrence are considered, no complete and efficient supervised learning method has been published to date.

A simple way to avoid the problem of training spiking recurrent neural networks is not to train them at all. Under this proposition, the neural network is instead created according to biologically relevant distribution laws and according to general assumptions and initial conditions. For different temporal inputs, the neural network should acquire different internal states through time. The approach then consists in sampling the internal state of the network. A separate learning algorithm can then be used, where the relation between the internal state of the network and the membership of the temporal input data within a particular class is learned.

In the case of recurrent spiking neural networks, the approach just described has been developed under the name "liquid state machine" [22]. The term liquid is used to represent the recurrent spiking neural network which, like a liquid, reacts dynamically to perturbations or inputs. The liquid state machine is a biologically inspired approach that is well suited to supervised temporal pattern recognition. This technique will be studied thoroughly in this thesis.
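The division of labour described above, an untrained, randomly created liquid followed by a separately trained readout, can be summarized by the conceptual sketch below. The liquid is stood in for by a placeholder random projection, and the readout by a nearest-centroid rule; these are assumptions made only for illustration, not the spiking network, state sampling or readout actually used in this thesis (those are defined in Chapters 3 to 5).

    import numpy as np

    def liquid_state(stream, n_neurons=135, seed=0):
        # Placeholder for the liquid: a fixed (untrained) random projection of the
        # input stream (features x time) into a higher-dimensional state vector.
        # In a real liquid state machine this would be the sampled state of a
        # recurrent spiking neural network.
        rng = np.random.default_rng(seed)
        projection = rng.standard_normal((n_neurons, stream.shape[0]))
        return np.tanh(projection @ stream).mean(axis=1)

    def train_readout(streams, labels):
        # The readout is the only trained component: here a nearest-centroid
        # classifier over liquid state vectors (illustrative choice).
        states = np.array([liquid_state(s) for s in streams])
        classes = sorted(set(labels))
        return {c: states[[l == c for l in labels]].mean(axis=0) for c in classes}

    def classify(stream, centroids):
        # Map an unseen stream to the class whose centroid is closest in state space.
        x = liquid_state(stream)
        return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

The key design point is that only train_readout involves learning; liquid_state stays fixed once created, which is exactly the property that sidesteps the supervised training problem of recurrent spiking networks.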

1.1.2 Contribution to Defence Research and Development Canada - Valcartier's program

This Ph.D. research project was funded by Defence Research and Development Canada - Valcartier (DRDC-Valcartier). Through the Command, Control and Intelligence (C2I) section, DRDC-Valcartier is engaged in the research and development of situation analysis tools to support the Canadian Forces in their operations. Furthermore, today's military operations deal with vast and increasing amounts of information received from sensory data and other sources. This large amount of information may overload the analysts' attention. For that reason, algorithms that support situation analysis in many activities are being investigated. The goal is to provide automated data processing and reasoning for specific tasks of the operations. For example, a technology under investigation at DRDC-Valcartier is that of sensor networks for surveillance applications. In the project "self-healing autonomous sensor network" (SASNet), unattended ground sensors are used to detect and identify the transit of different sources. To a lesser extent, this project shares similarities with the Defence Advanced Research Projects Agency's Sensor Information Technology (SensIT) program [23], where one of the primary goals is the classification of targets based on the sensed signals. Liquid state machines, as demonstrated in this thesis, are well suited to this kind of task.

1.2 Motivation

There are a number of reasons that motivate the interest in liquid state machines and, from a more general perspective, in computational neuroscience. First, computational neuroscience has already produced complex neural models. These models are applicable to different engineering problems such as pattern recognition. Second, the field of neuroscience is gaining popularity around the world. More and more universities and private industries are joining the enterprise. As a result, progress in neuroscience is accelerating. For instance, scientists are currently working on the creation of hardware that would imitate biological neural systems. Such hardware is referred to as neuromorphic systems [24]. Neuromorphic systems are believed to be much more computationally powerful than the conventional machines that are currently used around the world today [25]. Their advantage lies mostly in their ability for parallel computing. Although engineering applications have not yet exploited this growing technology, we can expect neural-based computing devices to become available in the coming years. This could represent the start of a new era in artificial intelligence, with computational neuroscience having a prominent role.

From DRDC-Valcartier’s outlook, keeping the edge with the latest findings in ar-tificial intelligence systems should be a minimum. At this time, biologically inspired systems, like Liquid state machine, look to be very promising in terms of real-time computing and learning capabilities. The recent surge of interest for neuroscience in artificial intelligence applications makes it a good reason to believe that significant progress could be achieved in the coming years. At this respect, through funding of different projects, the United States have already recognized that breakthroughs in neu-roscience could have a significant impact on defence technology development [26,27,28]. At DRDC-Valcartier, the presented work in this thesis represents a first investigation of biologically inspired neural systems as a tool for pattern recognition tasks. At this time, the aim is not to replicate real neural systems physically, but to implement, simulate and exploit biologically inspired models on conventional computing devices.1 To begin

with, it is believed that these systems are particularly relevant in sensing/surveillance applications. For instance, they could be applied to automatic target recognition using ground-based or ship-based radar signatures. More generally, object, behavior, situ-ation and threat recognition from different sensing modalities that provide complex signatures are all good candidate applications.

Finally, the study of liquid state machines requires notions of both neuroscience and electronics. This makes it an interesting topic for electrical engineering. Personally, it represents a good opportunity to learn concepts of neuroscience while at the same time applying notions of electronics. Moreover, the problem of designing liquid state machines for temporal pattern recognition represents an additional challenge.

¹The term "conventional computing device" refers to the current generation of computing devices based on the Von Neumann architecture. This architecture is characterized by a bus connecting the processor and main memory. It is found in almost all computers around the world.

1.3 Goals and Assumptions

As a first goal, the presented work aims at gaining familiarity with the latest biologically inspired neural network models that are applicable to engineering problems. Liquid state machines are defined on these models. Spiking neural network models, in which information transmission and representation are explained by the synchrony of the spikes that are emitted through time, have recently become established as the most realistic representation for modeling real living neurons [29]. There is a growing interest around them [30, 31, 32]. Therefore, it is highly probable that they will be used in the artificial intelligence systems of the future.

On a different note, although there exist a few works guiding the design of a liquid state machine in temporal pattern recognition applications [33, 34, 35, 36, 37, 38, 39], there is no exhaustive procedure specifying how to create one [40, 41]. The creation and configuration of a liquid state machine allows many possibilities, both in the structure and in the parameters. Generally, the design should aim at exploiting the liquid in such a way that the ability for the recognition of input patterns is maximized. To achieve this, a study of the liquid state machines described in the literature is necessary. More precisely, the exercise should concentrate on the sampling process of the liquid state, i.e. on the creation of a state vector. The related goal is to obtain distinct liquid state samples for different classes of inputs, hence improving the ability for recognizing different classes of patterns. Concerning this issue, a question arises: given a liquid formed of a spiking recurrent neural network and based on recent biologically inspired models, how should the liquid be read such that the ability for the recognition of patterns is maximized? Also, even though liquid state machines use untrained spiking recurrent neural networks, there may exist ways to train them to improve pattern recognition. To this end, a goal pursued in this thesis is to find a method for training the liquid.

In contrast to neuroscientists, who focus mainly on learning how real neural systems work, engineers are concerned with the application of neuroscience concepts for solving real-life problems. In this work, the main objectives related to the study of liquid state machines are:

1. to demonstrate that liquid state machines are well adapted for temporal pattern recognition,

2. to show how liquid state machines can compete with other techniques in different temporal pattern recognition problems,


3. to explore the relations between the performance of a liquid state machine and different measures of the liquid state, and

4. to test the suggested liquid state sampling and liquid training methods on different temporal data sets. The obtained results shall be compared with the ones obtained with the conventional liquid state machine approaches. Increased recognition rates are expected.

1.4 Organization of the Dissertation

Including this introduction, the thesis is divided into eight chapters. The problem of supervised temporal pattern recognition, in which liquid state machines are applied, is presented in Chapter 2. As mentioned previously, the liquid state machine is a concept based on recent spiking neural network models. Spiking neural networks are discussed in Chapter 3, where the roles and functions of neurons and synapses are described. Chapter 4 defines the concept of liquid state machines. The different components are described along with the main properties that a liquid state machine should hold. The chapter includes a review of the state of the art in liquid state machines, focusing on the liquid state sampling methods and the liquid training methods. In Chapter 5, new liquid sampling methods are proposed. The novelty includes the use of frequency components of the liquid dynamical state and the combination of different types of liquid state features. The suggested methods are tested on both synthetic and real data. They are compared to existing liquid sampling methods. A new method for training the liquid is defined in Chapter 6. The method, called class conditional spike-timing dependent plasticity training, is compared to the most recent spike-timing dependent plasticity training method found in the literature. Both the new and the existing methods are tested on different data sets. In Chapter 7, three different measures of the liquid states are studied: the mean, the variance and the entropy. The relation between the measures and the recognition rate of the liquid state machine is explored. The thesis ends with concluding remarks in Chapter 8. A review of the innovations presented in the thesis is made along with suggestions for future work.

Chapter 2

Supervised temporal pattern recognition

Pattern recognition has to do with the assignment of an output value to a given input value [7]. Classification is a specific case of pattern recognition, where the problem is to associate an input value to one of a given set of classes. The classification can be supervised or unsupervised. In the former, training data are available to help the classification algorithm at the first step of learning. Supervised classification is the problem of interest in this thesis.

A classification problem can be either static or temporal. Both cases are compared in Section 2.1 of this chapter. In this thesis, the study concentrates on temporal classification. In particular, the classification of finite streams is studied. The goal is to assign streams to classes of pattern. A formal definition of the problem of classification of streams is given in Section 2.2. In Section 2.3, the problem of sequential classification is presented. In contrast to the classification of streams, it consists in assigning a sequence of classes to a stream. Section 2.4 discusses the similarities between classification of streams and sequential classification. It is shown how classification of time sequences can be achieved by a sequential classification approach. Finally, classification performance measures are presented in Section 2.5.


2.1 Static versus temporal classification

In static pattern recognition problems, input data are assumed to be fixed in time, such that the time at which the data were observed is not relevant to the recognition task. In that case, the input of a classifier is a feature vector x = [x1, ..., xN]T, in which each of the N elements represents a feature of the observed phenomenon. Vector x has a related class c among the set of M classes, c = {c1, ..., cM}; c is also known as the decision domain. The task of a classifier is to determine the class of its input vector x given a set of training vectors. A training vector is a vector whose class is known; in other words, a class has already been assigned to it. A static classifier can be represented by a function h such that

ĉ = h(x),     (2.1)

where ĉ is the estimated class of vector x.

A typical example of a static pattern recognition problem is the one of handwritten character recognition, where the feature vector coincides with a specific image of a handwritten character and the class corresponds to a letter of the alphabet [10].
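As a concrete illustration of equation (2.1), the sketch below implements a static classifier h as a nearest-centroid rule over fixed-length feature vectors. The choice of classifier and the toy data are assumptions made here for illustration; any static classifier (Bayes, k-nearest neighbors, decision tree, perceptron) fits the same signature.

    import numpy as np

    def train_static_classifier(train_vectors, train_classes):
        # Build one centroid per class from training vectors of fixed dimension N.
        classes = sorted(set(train_classes))
        centroids = {c: np.mean([x for x, y in zip(train_vectors, train_classes) if y == c],
                                axis=0)
                     for c in classes}

        # h(x): return the class whose centroid is closest to x (equation 2.1).
        def h(x):
            return min(centroids, key=lambda c: np.linalg.norm(np.asarray(x) - centroids[c]))
        return h

    # Toy example with N = 2 features and M = 2 classes (illustrative values).
    h = train_static_classifier([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]],
                                ["c1", "c1", "c2", "c2"])
    print(h([0.1, 0.05]))   # -> "c1"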

A classification problem has a spatiotemporal nature when the features of the observed phenomena change in time and when the class related to the observed phenomena depends on the time evolution of the features rather than on a fixed representation. The corresponding feature vector is then expressed as a function of time:

u(t) = [u1(t), ..., uN(t)]T, (2.2)

where t is the time index.

In some works, the series in time of the different features are called “data flows” [42], while in other works they are called “signals” [43, 34], “input sequences” [43], “time-series” [44] or “streams” [43]. In this thesis, “streams” will be mostly used. Otherwise, “signals” will sometimes be employed in conformity with the underlying domain under discussion (e.g. Fourier transforms).

A data stream delimited by an initial time tinit and a final time tend is expressed by

u(tinit, tend) = [u1(tinit, tend), ..., uN(tinit, tend)]T, (2.3)

where ui(tinit, tend) corresponds to the series of observations of feature i between times tinit and tend. Suppose that observations are obtained at a regular time interval Tu; the stream can then be indexed by a discrete time step k. The sequence of feature i between discrete times kinit and kend is given by ui(kinit, kend) = [ui(kinit), ui(kinit + 1), ..., ui(kinit + ku)], where kinit + ku = kend. A stream example is illustrated in Figure 2.1.

Figure 2.1: Illustration of a stream. Each point represents a value of the stream.

The most extensive definition of spatiotemporal classification uses the notion of spatiotemporal pattern [34]. A spatiotemporal pattern α is represented by a set of elements {αinit, αend, u(αinit, αend)}, where αinit and αend correspond to the start time and end time of the spatiotemporal pattern, respectively, and where u(αinit, αend) corresponds to the stream between times αinit and αend. Each spatiotemporal pattern α has a related class c ∈ {c1, ..., cM}. Given a stream u, the task of a classifier is to produce sets {tinit, tend, c}, where tinit, tend and c are the start time, the end time and the class of the underlying spatiotemporal pattern, respectively. This formalism uses a continuous time scale. Another type of formalism could use a discrete time scale, where each discrete point in time is related to a class of spatiotemporal pattern. This discrete formalism is the one used in this work. According to the discrete formalism, two spatiotemporal pattern recognition problems should be distinguished: classification of streams and sequential classification.
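In code, a stream sampled at a regular interval Tu is naturally stored as an N × (L + 1) array, one row per feature and one column per discrete time step; a segment u(kinit, kend) is then a column slice. The sketch below shows this representation; the array values and parameter choices are arbitrary illustrations, not data from the thesis.

    import numpy as np

    T_u = 0.01                       # sampling interval (s), illustrative value
    N, L = 3, 99                     # N features, L + 1 = 100 discrete time steps
    u = np.random.randn(N, L + 1)    # stream u(0, L): rows = features, columns = time steps

    def segment(u, k_init, k_end):
        # u(k_init, k_end): observations of every feature between the two discrete times.
        return u[:, k_init:k_end + 1]

    # A spatiotemporal pattern {alpha_init, alpha_end, u(alpha_init, alpha_end)} can be
    # kept as a (start, end, segment) triple, stored together with its class label.
    alpha = (20, 59, segment(u, 20, 59))
    print(alpha[2].shape)            # -> (3, 40)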

2.2 Classification of streams

Consider a data stream that lasts L + 1 discrete time steps, represented by u(0, L). The sequence classification problem is simple: associate a class to the whole stream. The task of a classifier is then to return an estimation ĉ of the most representative class of u(0, L), that is

ĉ = h(u(0, L)),     (2.4)

where classifier h considers the L + 1 discrete time steps of input u in order to determine a class. Such a problem is shown in Figure 2.2(a). It is the same as the sequence classification problem defined in [44, 45] and the weak temporal classification problem defined in [42].
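A minimal way to realize equation (2.4) is to summarize the whole stream into a fixed-length vector and reuse a static classifier on that summary. The summary features chosen below (per-feature mean and standard deviation) are an assumption made for illustration only; they are not the liquid-state representation developed later in the thesis.

    import numpy as np

    def stream_to_features(u):
        # Collapse the N x (L + 1) stream into a fixed-length vector so that a
        # static classifier can be applied to the whole stream at once.
        return np.concatenate([u.mean(axis=1), u.std(axis=1)])

    def classify_stream(u, static_h):
        # Equation (2.4): c_hat = h(u(0, L)), here through a summary feature vector.
        return static_h(stream_to_features(u))

For instance, classify_stream(u, h) could reuse the nearest-centroid h from the sketch in Section 2.1, trained on summary vectors instead of raw feature vectors.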

2.3 Sequential classification

The sequential classification problem consists in producing a sequence of classes associated to a stream. A particular case of this problem has a class associated to each of the discrete times of the stream, as shown in Figure 2.2(b). Thus, given a stream u(0, L) that lasts L + 1 discrete times, a sequence of L + 1 class estimations is returned:

ĉ = [ĉ(0), ĉ(1), ..., ĉ(L−1), ĉ(L)] = h(u(0, L)).     (2.5)

In this case, the time interval Tĉ between each step of class estimation is identical to the time interval Tu between two consecutive time steps in the stream u. Such a representation is similar to the one of sequential supervised learning defined in [44]. More generally, instead of having a class estimation at an interval Tĉ = Tu, as shown previously, classes could be estimated at larger intervals Tĉ > Tu. Suppose that the larger intervals are limited to multiples of Tu, such that Tĉ = ηTu, η ∈ N. Figure 2.2(c) shows an example with η = 2. In this example, a sequence of ⌊L/2⌋ class estimations is returned for the L + 1 discrete time steps. Generally, a sequence of Lη = ⌊L/η⌋ class estimations is returned for an input stream u(0, L):

ĉ = [ĉ(0), ĉ(1), ..., ĉ(⌊L/η⌋)] = [ĉ(0), ĉ(1), ..., ĉ(Lη)] = h(u(0, L)).     (2.6)

This problem description is similar to the strong temporal classification problem defined in [42].

Figure 2.2: Different spatiotemporal classification problems. A stream of a single spatial dimension (one feature) is shown. (a) A class is associated to a pre-determined segment of the stream. (b) A sequence of classes is associated to the stream; each discrete time step of the stream has an associated class. (c) A class is estimated for each time interval Tĉ.

[42], constant class estimation intervals are used; each successive class estimation is separated by the same amount of time. This yields a simple discrete representation of the sequential classification problem.

Concerning the class labels associated to the stream, the simplest case has a class associated to each time step of the stream. The class vector then has the same dimension as the one presented in equation (2.5). Otherwise, consider the case where class labels are separated by a constant time interval Tc. This time interval of the true class labels is not necessarily the same as the time interval of the estimated class labels (Tĉ). Finally, there is also the more complex case where the true class labels are separated by variable time intervals.

In short, a stream u has an associated vector c containing the true class label sequence in time and, as a result of the spatiotemporal classification operation, a vector ĉ containing the estimated class label sequence. The sizes of u, c and ĉ are not necessarily equal.

Furthermore, the decisions returned by a sequential classification algorithm are conditioned by the values in the stream at different times. In this work, three particular decision cases are suggested: non-restrictive, real-time and delayed.

2.3.1 Non-restrictive

In order to take a decision, a classifier relies on some input data sampled over a finite time duration. In non-restrictive sequential classification, there is no limitation on the time of the information that a classifier can consider to take a decision. A decision about the underlying class at time k can be based on any time steps of the input stream, such that

ĉ(k) = h(u(k1, k2)),   k2 > k1, k ∈ {0, ..., Lη}, and k1, k2 ∈ {0, ..., L}.     (2.7)

When all time steps of the stream are considered, the classification for time k is given by

ĉ(k) = h(u(0, L)),   0 ≤ k ≤ Lη.     (2.8)

Additionally, a classifier may or may not be restricted as to the times at which its decisions must be returned. In the non-restrictive case, there is no restriction on the decision time. Hence, the decision related to time k can be returned at any time kd. Under this condition, the whole sequence of decisions ĉ = [ĉ(0), ĉ(1), ..., ĉ(Lη)] associated to a stream u(0, L) could be returned at a single time.

2.3.2 Real-time

The real-time case supposes that the classifier does not have access to stream values that are produced after the decision time. Hence, for each time step k ∈ {0, 1, ..., Lη}, classifier h returns a decision ĉ(k) defined as

ĉ(k) = h(u(k1, k)),   k ≥ k1 and k1 ∈ {0, ..., L}.     (2.9)

Following this definition, the classifier can only use data from the past and the present relative to the decision time. Such a constraint defines the real-time sequential classification problem.

2.3.3 Delayed

The delayed sequential classification problem is similar to the real-time one, except that it allows a delay between the time at which the decision is output and the time associated to the decision. More precisely, the decision associated to time k is delayed up to a time k + kr:

ĉ(k) = h(u(k1, k + kr)),   k + kr ≥ k1 and k1 ∈ {0, ..., L},     (2.10)

where the classifier h uses the first k + kr time steps of the stream before estimating the class ĉ(k) associated to the time k.
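The three decision cases differ only in which columns of the stream the classifier is allowed to see when producing ĉ(k). The sketch below makes the windowing explicit; the classifier g and the window start k1 are placeholders, i.e. assumptions made here for illustration.

    def decide(u, k, g, k1=0, mode="real-time", k_r=0):
        # u: N x (L + 1) array, k: decision time index, g: classifier over a stream segment.
        L = u.shape[1] - 1
        if mode == "non-restrictive":
            k2 = L                  # any time steps may be used (equations 2.7 / 2.8)
        elif mode == "real-time":
            k2 = k                  # only past and present data (equation 2.9)
        elif mode == "delayed":
            k2 = min(k + k_r, L)    # up to k_r additional time steps allowed (equation 2.10)
        else:
            raise ValueError(mode)
        return g(u[:, k1:k2 + 1])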

2.4 Sequential approach for the classification of streams

Suppose the problem of the classification of a stream u(0, L), of discrete length L + 1, into one of a set of M classes {c1, ..., cM}, as defined in (2.4). Suppose again that, for the stream u(0, L), a classifier h is able to return a sequence of Lη = ⌊L/η⌋ class estimations, as in (2.6). Since a single class is associated to the full time sequence u(0, L), the Lη class estimations should be the same. The sequential classification approach can then be used for the classification of the whole stream u(0, L) by combining the Lη class estimations into a single one. For example, let the sequential classification of the stream u(0, L) result in five consecutive class outputs: [1 1 2 1 2]. The class sequence contains three class-1 labels and two class-2 labels. Suppose that the combination uses a majority vote. Since the output class sequence contains more class-1 elements than class-2 elements, the returned class estimate for the stream u(0, L) is class 1.
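The majority-vote combination in the example above can be written in a few lines; Counter.most_common returns the class with the highest count, and ties are broken by the first class encountered, an arbitrary choice made in this sketch.

    from collections import Counter

    def combine_by_majority_vote(class_sequence):
        # Combine the sequence of sequential class estimates into a single
        # stream-level class estimate.
        return Counter(class_sequence).most_common(1)[0][0]

    print(combine_by_majority_vote([1, 1, 2, 1, 2]))   # -> 1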

2.5 Performance measures

Depending on whether the problem is one of classification of streams or of sequential classification, the measures of performance are not the same. In the case of sequential classification, the performance measures are usually more complex, as they must compare sequences of class labels. Performance measures for sequential classification often rely on edit distances between strings of characters. Given two sequences of class labels c1 and c2, edit distances are obtained by calculating the minimum number of operations required to transform c1 into c2. The available operations are deletion, insertion and substitution [46]. The most commonly known edit distances are the Levenshtein distance [47] and the Hamming distance [48]. The former allows deletions, insertions and substitutions of characters (class labels), while the latter allows only substitutions. Based on edit distances, error rates over complete test sets can be calculated [49].

For the classification of streams, which is the problem studied in this thesis, a single class estimate is returned for each tested data stream. Hence, performance measures found in static classification are applicable. The standard measure is the recognition rate. The recognition rate is simply the ratio of the number of samples correctly classified over the number of tested samples. In the classification of streams, a sample represents a data stream. Another performance measure is the error rate or substitution rate. It is the opposite of the recognition rate, being the ratio of the number of misclassified streams over the number of tested streams. The recognition rate will be used in the remainder of this thesis.
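For completeness, both kinds of measure can be sketched as follows. The recognition rate used throughout the thesis is a simple ratio, while the Levenshtein distance below is a standard dynamic-programming implementation (not code from the thesis) for comparing class label sequences in the sequential case.

    def recognition_rate(true_classes, estimated_classes):
        # Ratio of correctly classified streams over the number of tested streams.
        correct = sum(1 for c, c_hat in zip(true_classes, estimated_classes) if c == c_hat)
        return correct / len(true_classes)

    def levenshtein(c1, c2):
        # Minimum number of deletions, insertions and substitutions turning c1 into c2.
        prev = list(range(len(c2) + 1))
        for i, a in enumerate(c1, start=1):
            curr = [i]
            for j, b in enumerate(c2, start=1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (a != b)))      # substitution
            prev = curr
        return prev[len(c2)]

    print(recognition_rate([1, 2, 1, 1], [1, 2, 2, 1]))   # -> 0.75
    print(levenshtein([1, 1, 2, 1], [1, 2, 2, 2]))        # -> 2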


Supervised temporal pattern recognition represents the framework of this thesis. Precisely, the problem of supervised classification of streams is studied. In this chapter, static and temporal pattern recognition were compared. For the temporal case, sequential classification and classification of streams were defined as two distinct problems. It was also shown how these two different problems require different measures of performance.


Chapter 3

Spiking neural networks

A spiking neural network (SNN) is a model used to represent the processing of information much like real neural systems. Liquid state machines use spiking neural networks to act as the liquid, where the network processes input data into a high-dimensional representation. This chapter provides an introduction to spiking neural networks. It is divided into seven sections. Section 3.1 presents basic concepts of computational neuroscience, from which spiking neural networks are inspired. Section 3.2 reviews neural network models with a particular focus on spiking neural networks. A spiking neuron model is presented in Section 3.3. Section 3.4 discusses the connectivity of the network. Synapses are introduced in Section 3.5, where a simple model is presented. Section 3.6 concentrates on the plasticity of synapses. Dynamic synapses and spike-timing dependent plasticity (STDP) are introduced. Finally, Section 3.7 discusses the biological realism of the spiking neural networks studied in this thesis.

3.1 Basic concepts of computational neuroscience

Computational neuroscience is the study of information processing and representation in the brain [50]. Research in computational neuroscience is driven by the study of biological neural systems. Models of spiking neural networks for engineering applications rest on commonly accepted principles of computational neuroscience. To better understand spiking neural network models, this section introduces basic concepts of biological neural systems.



Figure 3.1: Anatomical model of a neuron and its connection to another neuron. A neuron is made of a cell body, the soma, from which dendrites arise. A neuron also has an axon, which forms a long filament that extends from the soma. Electrical pulses generated inside the soma are transmitted to other neurons by the axon. The electrical signal is transferred from the axon of the emitting neuron to the dendrite of the receiving neuron by way of the synapse, which maintains the connection between the axon and the dendrite.

While a single neuron performs simple operations, the interconnection of a large number of neurons allows high computational capabilities. A single neuron of the brain can be connected to thousands of neurons [51], enabling complex interactions. As illustrated in Figure 3.1, a neuron is made of a cell body from which arise thin ramifications, the dendrites, and a long extension called the axon. Neurons communicate with each other by emitting electrical pulses, also called action potentials or spikes. Each neuron holds an electric charge across the membrane of its cell body. This charge is called the membrane potential. The membrane potential is the difference in electrical potential across the cell membrane, between the inside and the outside of the cell body. The membrane forms the contour of the cell body.

When a neuron has not received any electrical pulses for some period of time, it is in a resting state and its electrical charge remains at the resting value. When a neuron does receive electrical pulses from other neurons, its membrane potential changes. When the membrane potential reaches a certain threshold, the neuron emits an electrical pulse. The threshold on the membrane potential at which a neuron emits an electrical pulse is often referred to as the firing threshold. The electrical pulses fired by the neuron travel through its axon. The axon represents the output path that conducts the electrical signal to other neurons. Each of the receiving neurons is connected to the axon by its dendrites, which receive the electrical pulses that then charge the membrane of the receiving neuron. At the junction between the dendrite and the axon,



there is the synapse. The synapse modulates and transmits the signal received from the axon up to the dendrite. From the perspective of the synapse that connects the axon of the sending neuron to the dendrite of the receiving neuron, the sending neuron is called presynaptic and the receiving neuron is called postsynaptic. When a pulse reaches the synapse, the potential of the dendrite of the receiving neuron changes. The change in potential, which occurs after some small amount of time and in conformity with each synapse's properties, may be either an increase or a decrease. A synapse that engenders a decrease is inhibitory, while a synapse that produces an increase is excitatory. The resulting variation in potential is called an "inhibitory postsynaptic potential" (IPSP) or an "excitatory postsynaptic potential" (EPSP) depending on whether the synapse is inhibitory or excitatory, respectively. Although the intensity of the change in the modulation of potential may vary in time, the type of the synapse (excitatory or inhibitory) always stays the same. The intensity of the change in the modulation of potential is referred to as the synaptic strength or synaptic efficacy. The ability of a synapse to change its synaptic strength is called synaptic plasticity. It depends on the past activity of the synapse. Various types of plasticity exist, and they differ mainly with respect to the time duration for which the synaptic strength is altered [52].

Synaptic plasticity is believed to play a role in learning and memory. Interestingly enough, the relation between synaptic plasticity and learning in the brain is still poorly understood [53, 54].

For a neuron to emit a spike, it must have received a given quantity of spikes from other neurons. The impact of a spike on the membrane potential is only temporary. For that reason, it usually takes many spikes that are closely spaced in time in order to reach the firing threshold. Figure 3.2 plots the membrane potential as a function of time when an action potential, or spike, occurs. As mentioned before, when a neuron receives no spike for a certain time, its membrane potential tends to its resting value. After the production of a spike, a neuron needs a recovery time, called the refractory period, before it can emit another spike. The recovery time divides into two phases: the absolute refractory period and the relative refractory period. During the absolute refractory period, the membrane voltage is very high and the neuron is unable to generate another electrical pulse. During the relative refractory period, the membrane voltage falls below its resting value. As a consequence, only a large increase of the membrane potential could cause the neuron to emit a spike.

As shown in Figure 3.2, a neuron operates in a nonlinear fashion. Nonlinearities are an important aspect of neural systems, since they are in part responsible for their high computational capability.



Figure 3.2: Membrane potential as a function of time when a neuron generates an action potential.

3.2 Neural network models

According to [30], neural network models have evolved in three different generations. The first generation is defined by the McCulloch-Pitts model [15]. This model is restricted to the application of boolean operations on binary inputs. The second generation differs from the first in two ways: adjustable scaling factors (weights) are added on the inputs, and activation functions produce continuous values instead of boolean outputs. The first two generations have evolved through many innovations. The model started with Hebb's learning rule for a simple neuron [55] and the invention of the perceptron. It then matured with the back-propagation algorithm for multi-layered perceptrons [56, 20]. At this stage, the general model adopted for a neuron with $N_a$ inputs, $a_1, a_2, \ldots, a_{N_a}$, was defined as:

$$b = \sum_{j=1}^{N_a} w_j a_j - \Theta, \tag{3.1}$$


where $w_j$ corresponds to the weighting factor on input $a_j$ and where $\Theta$ is the activation threshold. The output $o$ of the neuron depends on the activation function $f$:

$$o = f(b). \tag{3.2}$$

The most commonly used activation function $f$ is the sigmoid function:

$$f(b) = \frac{1}{1 + e^{-b}}. \tag{3.3}$$
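A second-generation neuron of this form can be sketched in a few lines; the weights, threshold and inputs below are arbitrary values chosen only to illustrate equations (3.1)–(3.3).

```python
import numpy as np

def sigmoid(b):
    """Activation function of Eq. (3.3)."""
    return 1.0 / (1.0 + np.exp(-b))

def second_generation_neuron(a, w, theta):
    """Output o = f(b) with b = sum_j w_j * a_j - theta, Eqs. (3.1)-(3.2)."""
    b = np.dot(w, a) - theta
    return sigmoid(b)

# arbitrary example values
a = np.array([0.2, 0.8, 0.5])    # inputs a_1 .. a_Na
w = np.array([1.0, -0.5, 2.0])   # weights w_1 .. w_Na
print(second_generation_neuron(a, w, theta=0.3))
```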

As mentioned in [30], neural network models of the second generation include feed-forward and recurrent sigmoidal neural networks as well as radial basis function units [57, 58]. Such models can approximate any continuous function on a bounded domain arbitrarily well, given a sufficient number of units. For that reason they are said to be "universal for analog computations" [30].

It has long been accepted that most real neurons interact through the transmission of electric pulses, or spikes. From that angle, the output of a neuron model from the second generation represents the average rate at which the neuron emits spikes during some period of time. The average rate of spike emission is also called the firing rate. The firing rate had once been the only explanation for how information is processed and encoded in the brain. However, this idea has been challenged for many years [59, 29, 60, 61, 62]. In fact, studies from the neuroscience community revealed that information processing in the brain might highly depend on the time structure of the spikes that are emitted from neuron to neuron [63, 64]. The studies suggest that the precise timing of spikes plays an important role in encoding information. From an engineering perspective, these findings motivate the use of more biologically plausible neural network models, where information in the network is carried through spikes. Spiking neural networks [61, 65, 32] correspond to the third generation of neural network models.

Depending on the application at hand, spiking neural networks may not necessarily represent a better solution than the models of the second generation. However, there is both theoretical [13] and experimental [66] evidence that spiking models provide more efficient computing, i.e., fewer computations are required with spiking models than with rate-based models.

In [67], a simple spiking neural network model is defined. It describes the basic components and functions that form a spiking neural network. The model is presented as a directed graph $(Y, E)$ consisting of a set $Y$ of neurons and a set $E$ of synapses. The set $Y$ includes a subset of input neurons, $Y_{in} \subseteq Y$, and a subset of output neurons, $Y_{out} \subseteq Y$. For each neuron $v \in Y - Y_{in}$, a firing threshold function $\Theta_v : \mathbb{R}^+ \to \mathbb{R} \cup \{\infty\}$ is defined. For each synapse $(v^-, v) \in E$, where $v^-$ and $v$ denote the presynaptic and postsynaptic neurons, a weight $w_{v^-,v}$ and a response function $\mu_{v^-,v}$ are defined.

The model suggested in [67] also includes a notation for describing the firing times. First, it is assumed that the firing of the input neurons $v \in Y_{in}$ is determined from an external source. For a neuron $v \in Y - Y_{in}$, a set of firing times $F_v$ is defined recursively in terms of the potential equation $P_v$:

$$P_v(t) = 0 + \sum_{v^- : (v^-, v) \in E} \;\; \sum_{s \in F_{v^-} : \, s < t} w_{v^-,v}(s) \cdot \mu_{v^-,v}(t - s), \tag{3.4}$$

where $w_{v^-,v}$ and $\mu_{v^-,v}$ represent the weight and the response function for synapse $(v^-, v)$, respectively. The first element of $F_v$ is the first firing time and is given by $\inf\{t \in \Upsilon, P_v(t) \geq \Theta_v(0)\}$, where $t$ is the firing time and $\inf$ represents the infimum$^1$. For any $s \in F_v$, the next firing time in $F_v$ is $\inf\{t \in \Upsilon, t > s \text{ and } P_v(t - s) \geq \Theta_v(t - s)\}$. $\Upsilon$ represents the set of potential firing times, with $\Upsilon \subseteq \mathbb{R}^+$.
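A minimal discrete-time sketch of this firing-time computation is given below. The exponentially decaying response function, the constant weights and threshold, the fixed time grid and the crude refractory gap are assumptions made only for illustration; the model above leaves them unspecified, and the sketch simplifies the time-dependent threshold $\Theta_v$ of the exact recursion to a constant.

```python
import numpy as np

def response(delta, tau=5.0):
    """Assumed response function mu(t - s): exponential decay after a presynaptic spike."""
    return np.exp(-delta / tau) if delta >= 0 else 0.0

def potential(t, pre_firing_times, weights):
    """Discrete-time evaluation of Eq. (3.4) for a single neuron v."""
    p = 0.0
    for f_times, w in zip(pre_firing_times, weights):
        p += sum(w * response(t - s) for s in f_times if s < t)
    return p

def firing_times(pre_firing_times, weights, theta=1.0, t_max=50, dt=1.0):
    """Simplified firing-time recursion: the neuron fires whenever its
    potential reaches a constant threshold theta."""
    F_v, t = [], 0.0
    while t <= t_max:
        if potential(t, pre_firing_times, weights) >= theta:
            F_v.append(t)
            t += 3 * dt          # crude refractory gap, an illustrative assumption
        else:
            t += dt
    return F_v

# two presynaptic neurons with assumed firing times and weights
print(firing_times([[2, 4, 6], [5, 10]], weights=[0.6, 0.8]))
```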

3.3 Spiking neuron model

By examining real neurons, cellular neuroscience has produced detailed models that describe the physiology and behavior of single neurons. In fact, these models reproduce the electrophysiological behavior of neurons with high accuracy. One of the best-known models of this type is the Hodgkin-Huxley model [68], which led the way for the study of new and more detailed models.

From a computational perspective, simpler models are needed to explain how neurons react to stimuli. To this end, models that summarize the essential characteristics of real neurons have been created. Under these models, a neuron is characterized by its spike times and its membrane potential or voltage, expressed in seconds [s] and millivolts [mV], respectively [32, 69]. Figure 3.3 illustrates the spikes and the membrane potential of a spiking neuron over time. Among the simplified neuron models are the spike response model [70], the resonate-and-fire model [71, 69] and the leaky integrate-and-fire (LIF) model [72, 32]. Other models have also been compared in [73] as candidate models for the liquid state machine. Nevertheless, the LIF model is by far the most popular of them. It is used in most implementations of spiking neural networks and LSMs. Therefore, it is discussed in the next paragraphs.
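As a preview, a generic leaky integrate-and-fire update in its common textbook form is sketched below. The time constant, resistance, threshold and reset values are placeholder numbers; the exact formulation and parameters used in this thesis are those given in the discussion that follows.

```python
import numpy as np

def lif_simulate(I, dt=1e-3, tau_m=20e-3, R=1.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Euler integration of a generic LIF neuron:
    tau_m * dv/dt = -(v - v_rest) + R * I(t); spike and reset at v_thresh.
    All parameter values are illustrative placeholders."""
    v = v_rest
    spikes, trace = [], []
    for k, i_k in enumerate(I):
        v += (-(v - v_rest) + R * i_k) * dt / tau_m
        if v >= v_thresh:
            spikes.append(k * dt)   # record spike time
            v = v_reset             # reset membrane potential
        trace.append(v)
    return spikes, trace

# constant input current, 200 ms of simulation
spikes, _ = lif_simulate(I=np.full(200, 1.5))
print(spikes[:5])
```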

$^1$The infimum $\inf\{t \in \Upsilon, P_v(t) \geq \Theta_v(0)\}$ is the largest real number that is smaller than or equal to every number in $\{t \in \Upsilon, P_v(t) \geq \Theta_v(0)\}$.

