Décodage neuronal dans le système auditif central à l'aide d'un modèle bilinéaire généralisé et de représentations spectro-temporelles bio-inspirées


UNIVERSITÉ DE SHERBROOKE

Faculté de génie

Département de génie électrique et de génie informatique

DÉCODAGE NEURONAL DANS LE

SYSTÈME AUDITIF CENTRAL

À L'AIDE D'UN MODÈLE BILINÉAIRE GÉNÉRALISÉ ET

DE REPRÉSENTATIONS SPECTRO-TEMPORELLES

BIO-INSPIRÉES

Master's thesis

Specialty: electrical engineering

Shadi SIAHPOUSH

Jury: Eric PLOURDE (supervisor)

Jean ROUAT (co-supervisor)

Denis GINGRAS (examiner)

Philippe GOURNAY (examiner)


To my dear family and my love, Yousof


RÉSUMÉ

In this project, Bayesian neural decoding is performed on the inferior colliculus of the guinea pig. First, the evoked potentials are read from the electrodes, and the action potentials are then derived from them using spike sorting techniques. Next, a generalized linear model (GLM) is trained by associating an acoustic stimulus with the potential measurements taken at the same time. Finally, the neural activity is decoded using maximum a posteriori statistical estimation in order to reconstruct the spectro-temporal representation of the acoustic signal that corresponds to the acoustic stimulus. We study the impact of different neural encoding models, as well as of different spectro-temporal representations (which are assumed to represent the equivalent acoustic stimulus), on the accuracy of Bayesian decoding of the neural activity recorded from the central auditory system. In effect, the model associates a spectro-temporal representation equivalent to the acoustic stimulus with the measurements made in the brain. Two encoding models are compared, a GLM and a generalized bilinear model (GBM), each with three different spectro-temporal representations of the input stimuli: a spectrogram and two bio-inspired representations, namely a gammatone filter bank and a spikegram. The parameters of the GLM and GBM, that is, the spectro-temporal receptive field, the post-spike filter and the input nonlinearity (for the GBM only), are fitted using a maximum likelihood (ML) optimization algorithm. The signal-to-noise ratio between the reconstructed representation and the original representation is used to evaluate the decoding, that is, the accuracy of the reconstruction.

We show experimentally that the reconstruction accuracy is better with a spikegram representation than with a spectrogram representation and, moreover, that using a GBM instead of a GLM increases the reconstruction accuracy. In fact, our results show that the signal-to-noise ratio of the reconstruction of a spikegram with the GBM is 3.3 dB higher than the signal-to-noise ratio of the reconstruction of a spectrogram with the GLM.

Keywords: Bayesian neural decoding, neural encoding, inferior colliculus, maximum likelihood learning, generalized linear model, generalized bilinear model, spikegram, gammatone filter bank


ABSTRACT

In this project, Bayesian neural decoding is performed on the neural activity recorded from the inferior colliculus of the guinea pig following the presentation of a vocalization. In particular, we study the impact of different encoding models on the accuracy of reconstruction of different spectro-temporal representations of the input stimulus. First, voltages recorded from the inferior colliculus of the guinea pig are read and the spike trains are obtained. Then, we fit an encoding model to the stimulus and associated spike trains. Finally, we perform neural decoding on the pairs of stimuli and neural activities using the maximum a posteriori optimization method to obtain the reconstructed spectro-temporal representation of the signal. Two encoding models, a generalized linear model (GLM) and a generalized bilinear model (GBM), are compared along with three different spectro-temporal representations of the input stimuli: a spectrogram and two bio-inspired representations, i.e. a gammatone filter bank (GFB) and a spikegram. The parameters of the GLM and GBM, including the spectro-temporal receptive field, the post-spike filter and the input nonlinearity (only for the GBM), are fitted using the maximum likelihood (ML) optimization algorithm. Signal-to-noise ratios between the reconstructed and original representations are used to evaluate the decoding, or reconstruction, accuracy. We experimentally show that the reconstruction accuracy is better with the spikegram representation than with the spectrogram and GFB representations. Furthermore, using a GBM instead of a GLM significantly increases the reconstruction accuracy. In fact, our results show that the spikegram reconstruction accuracy with a GBM fitting yields an SNR that is 3.3 dB better than when using the standard decoding approach of reconstructing a spectrogram with GLM fitting.

Keywords: Bayesian neural decoding, Neural encoding, Inferior colliculus, Maximum likelihood fitting, Generalized linear model, Generalized bilinear model, Spikegram representation, Gammatone filter bank


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my supervisor, Eric Plourde and my co-supervisor Jean Rouat for all the support and encouragement they gave me. I appreciate all their contributions of time, ideas, lectures and funding during the course of this thesis.

I would also like to thank Denis Gingras and Philippe Gournay for agreeing to serve as committee members for my thesis and for their constructive comments.

Many thanks to Thilo Rode from the SONIC lab of the University of Minnesota for providing the neural recordings used in this project.

Special thanks to Hossein Najaf-Zadeh and Ramin Pichevar for providing perceptual matching pursuit code as well as Jonathan Pillow from Princeton University for the GLM toolbox and neural decoding code.

I also thank the members of the NECOTIS laboratory, who have been a source of good advice and collaboration. I have had the pleasure of working with or alongside them.

I thank my amazing family, who have cherished with me every great moment and supported me whenever I needed it. Last but not least, I am grateful to my loving, supportive, encouraging, and patient husband, Yousof, for his faithful support throughout my studies. Thank you!


LIST OF FIGURES

2.1 The peripheral auditory system including outer, middle and inner ears. The outer ear comprises the pinna and the auditory canal. The middle ear comprises the tympanic membrane and a series of bones called the malleus, incus and stapes. The inner ear comprises the cochlea, which has three parts: scala tympani, scala media and scala vestibuli [Reinerio, 2013].

2.2 The organ of Corti sits on the basilar membrane. It includes the outer and inner hair cells [Wikimedia, 2015].

2.3 Auditory pathway [Heeger, 2013].

2.4 The position of the IC in the brain [Commons, 2015]. It is located in the auditory midbrain.

2.5 The IC structure consists of three parts: a central nucleus (ICC), a dorsal cortex and an external cortex (paracentral nuclei) [Heeger, 2013].

2.6 The connections between IC and MGB [Heeger, 2013].

2.7 A sample spectro-temporal receptive field (STRF) learned for an auditory neuron in the inferior colliculus of the guinea pig using the maximum likelihood fitting approach used in this thesis. Each frequency bin is 1 kHz and each time bin is 1 ms.

2.8 A sample spectro-temporal receptive field (STRF) learned for an auditory neuron in the inferior colliculus of the guinea pig. Each frequency bin is 1 kHz and each time bin is 1 ms.

2.9 Neural encoding and decoding of spike activities. In neural encoding, the stimulus is available and the neural activity is estimated, while in neural decoding, the neural activity is available and the stimulus is estimated.

2.10 A GLM for a single neuron in the auditory system. It comprises four main blocks, from left to right: 1. a linear block, 2. a nonlinear block, 3. a point process block and 4. a post-spike filter block [Calabrese et al., 2011]. In K*s, the "*" denotes convolution over time between the STRF and the spectro-temporal representation.

2.11 A GBM for a single neuron. It comprises, from left to right: a spectro-temporal transform of the input signal, an input nonlinearity, a linear STRF, an output nonlinearity, a point process and a post-spike filter block [Ahrens et al., 2008].

2.12 A group of piecewise linear bases, g_k, k = 1, .., 10. Each piecewise linear basis g_k is a triangle function centered at the input value m_i [Vidaurre et al., 2011].

2.13 Example of input nonlinearities of 3 neurons (recording sites) using the maximum likelihood fitting method for the dataset of this project. By increasing the number of piecewise linear bases, these input nonlinearity functions become smoother [Ahrens et al., 2008].

2.14 A sample spectrogram (lower plot) for a sample sound (upper plot).


2.15 A sample gammatone represented in the time domain. It is the result of the multiplication of a gamma function and a sinusoid [Slaney, 1993]. A set of these gammatones with different center frequencies can be used as the gammatone filter bank for the spikegram representation.

2.16 A sample gammatone filter bank represented in the frequency domain. The gammatone center frequencies cover the range between 20 Hz and 20 kHz [Slaney, 1993]. This set of gammatone kernels can be used for the spikegram representation of audio stimuli.

2.17 The masking effect performed in the perceptual matching pursuit. If the sensation level SL(k, n) of a gammatone kernel with amplitude A(n) in the critical band k is higher than its associated masking threshold M_k(n), then the gammatone is called audible [Pichevar et al., 2011]. Note that based on (2.14), for higher A(n) the sensation level SL(k, n) is higher.

3.1 a) A 2D table including the spectrogram values of a sound stimulus in the dataset, with T time bins and F frequency bins; s_ij indicates the value of the spectro-temporal bin. b) A 2D table in which each element includes a histogram (pdf) p(s_ij) associated with each time-frequency bin. The histogram p(s_ij) is computed using the spectrograms of all sound stimuli in the dataset, with the same number of time-frequency bins. A, B, C denote three columns of spectrally correlated time-frequency bins. c) Fitting the histogram associated with one time-frequency bin, p(s_ij), to the nearest Gaussian distribution using maximum likelihood fitting.

3.2 Electrodes number 1 to 16 are installed in one shank parallel to electrodes number 17 to 32 in another shank.

3.3 Different stimuli used in this project. In each figure, the upper plot is the time domain stimulus and the lower plot is the spectrogram of the stimulus. The stimuli include "voiced" vocalisations from guinea pigs as well as transients.

3.4 Best frequencies for the different electrodes of the two shanks numbered 1 to 32. As is seen, the best frequencies for recording sites 1-16, positioned on the first shank, are close to the best frequencies for recording sites 17-32, positioned on the second shank, the two shanks being parallel.

4.1 Spiking and limit thresholds applied to neural activity recordings. By low pass filtering, the low frequency noise is removed from neural voltages and the voltages have zero mean. For a peak to be considered as a spike, the absolute value of the peak should be between the spiking threshold (3δ) and the limit threshold (10δ), where δ is the standard deviation of all neural activities in the dataset. All these operations are done on the neural recordings before extracting neural spikes.


4.2 (a): a time domain stimulus, (b): superposed spike responses when the stimulus has a sound pressure level (SPL) of 40 dB, (c): the post-stimulus time histogram (PSTH) (the histogram of times at which spikes occurred) for channel number 21 and (d): the raster plot of channel number 21. A dot in the raster plot shows that for a specific trial number (vertical axis) and time (horizontal axis), a spike was registered from channel 21.

4.3 (a): a time domain stimulus, (b): superposed spike responses to 40 dB SPL, (c): the post-stimulus time histogram (the histogram of times at which spikes occurred) for channel number 3 and (d): the raster plot of channel number 3. Compared to Fig. 4.2, the spike times here are more synchronized with the spectro-temporal content and the silent parts of the stimulus.

4.4 Initialization of the GLM parameters. (a) is the stimulus filter (temporal receptive field) of a neuron, (b) is the post-spike kernel obtained from a linear combination of the bases in (d), (c) is the exponential post-spike filter, and (d) shows the bases for creating the post-spike filter.

4.5 (a): 1D Gaussian white noise stimulus, (b): the blue wave indicates the neural activity obtained using a Gaussian white noise input stimulus along with the initial GLM parameters, and the red points indicate the detected spikes, (c): the difference between the output of the STRF block and the post-spike current in the GLM block diagram, (d): the neural encoding results when repeating the input stimulus in (a) several times.

4.6 The results of the GLM fitting for a 1D stimulus with a length of 500 samples. The curves indicate the stimulus filter in (a), the post-spike filter in (b), the normalized stimulus filter in (c) and the post-spike filter in (d). In all parts, the blue curve shows the initial GLM parameters and the red curve shows the fitted GLM parameters.

4.7 The results of the GLM fitting for a 1D stimulus with a length of 5000 samples. The curves indicate the stimulus filter in (a), the post-spike filter in (b), the normalized stimulus filter in (c) and the post-spike filter in (d). In all parts, the blue curve shows the initial GLM parameters and the red curve shows the fitted GLM parameters. Compared to Fig. 4.6, when increasing the length of the stimulus, the initial and fitted GLM parameters become closer to each other.

4.8 Part (b) is an arbitrary initial STRF which covers a specific spectro-temporal range. This STRF can be imagined as the multiplication of the temporal filter of part (a) and the spectral filter of part (c). Part (d) is the initial post-spike filter and part (e) is a temporal stimulus from which we calculate the 2D spectro-temporal representation.

4.9 (a): Spectrogram of the real stimulus, (b): the green points are spike times and the blue ones are net voltages, (c): the stimulus (blue), the spike times (green dots) and the spike-induced current (red), which is the difference between the output of the STRF block and the post-spike current in the GLM block diagram, (d): membrane voltages when the same stimulus in (a) is repeated several times.


4.10 (a): a default (true) STRF used for generating spike trains, (b): the estimated post-spike filters obtained using the GLM (blue) and GBM (red) encoding models, (c): the STRF obtained using the maximum likelihood (ML) estimation approach and the GLM, (d): the STRF obtained using the GBM. The STRF and post-spike filter estimated using the GBM are much closer to the default ones than those estimated using the GLM.

4.11 The results of the fitting using the GLM and GBM encoding models on real spike trains and stimulus for recording sites number 21 and 31.

4.12 Spectrogram reconstruction for one stimulus where (a) is the input spectrogram and (b) and (c) are respectively the decoded stimulus representation with a GLM and a GBM when using the neural activity from all 32 recording sites. Time and frequency bins equal respectively 1 ms and 1.6 kHz.

4.13 GFB reconstruction for one stimulus: (a): the input GFB. (b), (c): respectively the decoded stimulus representation with a GLM and a GBM when using the neural activity from all 32 recording sites. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

4.14 Spikegram reconstruction for one stimulus: (a): the input spikegram. (b), (c): respectively the decoded stimulus representation with a GLM and a GBM when using the neural activity from all 32 recording sites. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

4.15 Mean SNR vs. the number of electrodes (recording sites or neurons) considered in the reconstruction for different spectro-temporal representations (spectrogram, GFB and spikegram) using GLM (star line) and GBM (solid line). In all experiments, 6 stimuli, with 20 trials for each stimulus, are used. The i-th electrode on the horizontal axis denotes that all the neural activities recorded from electrodes 1 to i are used to calculate the SNR.

A.1 Time rescaling method with renewal process. Upper plot: time axis of the Poisson process before time rescaling. Lower plot: time axis of the Poisson process after time rescaling [Pillow, 2009]. Photo courtesy Pillow Lab.

A.2 Time rescaling method [Pillow, 2009]. A sample renewal density q is shown in the left part. In the right part, the CIF of the firing rate Λ(t) is obtained from the average spiking rate λ(t) and is sampled using the renewal process to generate the timing of the firings (neuron spikes). Photo courtesy Pillow Lab.

A.3 A sample for the time rescaling method with renewal process [Pillow, 2009]. When the average firing rate is high, the number of firings per unit time increases. Photo courtesy Pillow Lab.

A.4 Spectrogram reconstruction for one stimulus where the upper plot is the input spectrogram and the lower plot is the decoded stimulus representation with a GBM when using the neural activity of one electrode. Time and frequency bins equal respectively 1 ms and 1.6 kHz.


A.5 Spectrogram reconstruction for one stimulus where the upper plot is the input spectrogram and the lower plot is the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-4. Time and frequency bins equal respectively 1 ms and 1.6 kHz.

A.6 Spectrogram reconstruction for one stimulus where the upper plot is the input spectrogram and the lower plot is the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-8. Time and frequency bins equal respectively 1 ms and 1.6 kHz.

A.7 Spectrogram reconstruction for one stimulus where the upper plot is the input spectrogram and the lower plot is the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-16. Time and frequency bins equal respectively 1 ms and 1.6 kHz.

A.8 GFB reconstruction for one stimulus: upper part: the input GFB. Lower part: the decoded stimulus representation with a GBM when using the neural activity of one electrode. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.9 GFB reconstruction for one stimulus: upper part: the input GFB. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-4. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.10 GFB reconstruction for one stimulus: upper part: the input GFB. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-8. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.11 GFB reconstruction for one stimulus: upper part: the input GFB. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-16. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.12 Spikegram reconstruction for one stimulus: upper part: the input spikegram. Lower part: the decoded stimulus representation with a GBM when using the neural activity of one electrode. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.13 Spikegram reconstruction for one stimulus: upper part: the input spikegram. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-4. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.14 Spikegram reconstruction for one stimulus: upper part: the input spikegram. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-8. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.

A.15 Spikegram reconstruction for one stimulus: upper part: the input spikegram. Lower part: the decoded stimulus representation with a GBM when using the neural activity of electrodes No. 1-16. Each time bin equals 1 ms and a 15-channel gammatone filter bank is used.


LIST OF TABLES

4.1 The results are given as the mean ± one half of the standard deviation, both estimated on the 20 trials per stimulus.


LIST OF ACRONYMS

Acronym Definition

ABI Auditory brainstem implant

AMI Auditory midbrain implant

CI Cochlear implant

CNC Cochlear nucleus complex

IC Inferior colliculus

ICC Inferior colliculus central nucleus

SOC Superior olivary complex

LL Lateral lemniscus

GFB Gammatone filter bank

GBM Generalized bilinear model

GLM Generalized linear model

GWN Gaussian white noise

STFT Short time Fourier transform

MAP Maximum a posteriori

MGB Medial geniculate body nucleus

ML Maximum likelihood

MSE Mean squared error

STRF Spectro-temporal receptive field

S-T Spectro-temporal

SNR Signal to noise ratio

T-F Time-frequency

PMP Perceptual Matching Pursuit


CHAPTER 1 INTRODUCTION

Hearing is important because it provides information about the environment we live in and thus makes our interaction with that environment easier. Deafness occurs because of the malfunction of different structures in the auditory pathway. For people who cannot hear, implanting a hearing prosthesis can be vital. There are different types of auditory implants, distinguished by the location of the prosthesis: cochlear implants (CI) and implants for the central auditory system, which include auditory brainstem implants (ABI) and auditory midbrain implants (AMI). The AMI is implanted in the inferior colliculus, a structure in the midbrain. Understanding the neural representation of an auditory stimulus, as well as the decoding mechanisms of the mammalian brain, is essential to improve the performance of current auditory implants. One way to improve our knowledge of these aspects is to design and analyze neural decoding algorithms. In neural decoding, the goal is to reconstruct the sound stimulus from the neural signals. Because of the stochastic nature of the neural representations, the Bayesian approach has been widely used for neural decoding [Doya et al., 2011; Pillow et al., 2011]. For example, Ramirez et al. [Ramirez et al., 2011] used a Bayesian approach to reconstruct the spectrogram of zebra finch stimuli using the neural activity recorded from auditory midbrain neurons. In this approach, the instantaneous firing rate of a neuron in the auditory midbrain of the zebra finch is obtained through a generalized linear model (GLM) that considers the neuron's past firings as well as a spectro-temporal representation of the input stimulus.
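As a rough illustration of the GLM idea described above, the sketch below computes an instantaneous firing rate as the exponential of a stimulus-filter term, a spike-history term and a bias. The exponential nonlinearity, the function name `glm_rate`, the argument names and the array layouts are assumptions made for this illustration; they are not taken from this thesis or from [Ramirez et al., 2011].

```python
import numpy as np

def glm_rate(stim, spikes, strf, post_spike, bias):
    """Firing rate per time bin of a toy single-neuron GLM (illustrative only).

    stim:       (F, T) spectro-temporal representation of the stimulus
    spikes:     (T,) spike counts of the neuron in each time bin
    strf:       (F, L) stimulus filter; column 0 weights the current bin
    post_spike: (H,) spike-history filter; entry 0 weights the previous bin
    bias:       scalar baseline log-rate
    """
    n_bins = stim.shape[1]
    n_lags = strf.shape[1]
    n_hist = post_spike.shape[0]
    log_rate = np.full(n_bins, bias, dtype=float)
    for t in range(n_bins):
        # Stimulus drive: sum over frequency and over the last n_lags bins.
        for lag in range(min(n_lags, t + 1)):
            log_rate[t] += strf[:, lag] @ stim[:, t - lag]
        # Spike-history drive: only strictly past bins contribute.
        for lag in range(1, min(n_hist, t) + 1):
            log_rate[t] += post_spike[lag - 1] * spikes[t - lag]
    # Assumed exponential output nonlinearity keeps the rate positive.
    return np.exp(log_rate)
```

With a negative `post_spike` filter, a spike in one bin lowers the rate in the following bins, which is one simple way to mimic refractoriness.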

1.1 Research objective

The inferior colliculus (IC) is an important module of the auditory pathway in decoding, combining and transmitting neural activities. While it is a suitable station for auditory implants [Goldwyn et al., 2012], the coding and decoding of information in the inferior colliculus is not yet clearly understood. One approach to improve our understanding of the encoding process in the inferior colliculus is to perform neural decoding on action potentials recorded from this important region of the auditory system.


In this project, the general objective is to understand the neural representation of an auditory stimulus as well as the decoding mechanisms using neural activity recorded from the mammalian brain. Because of the stochastic nature of the neural representations, a Bayesian approach will be used to perform neural decoding on the spikes recorded from the inferior colliculus of the guinea pig. More specifically, we first want to compare the reconstruction of different spectro-temporal representations of audio stimuli from the recorded neural activities. Second, we want to examine the efficiency of a new encoding model for auditory neurons, called the generalized bilinear model (GBM) [Ahrens et al., 2008], in a neural decoding setting.
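In generic terms, Bayesian decoding by maximum a posteriori (MAP) estimation can be written as follows. The symbols s (stimulus representation), r (recorded spike trains) and ŝ (estimate) are notation chosen for this illustration only; the specific likelihood and prior used in this work are part of the methodology of chapter 3.

```latex
\hat{s}_{\mathrm{MAP}}
  = \arg\max_{s} \; p(s \mid r)
  = \arg\max_{s} \; \bigl[ \log p(r \mid s) + \log p(s) \bigr]
```

The second equality follows from Bayes' rule, since the normalizing term p(r) does not depend on s. The encoding model (GLM or GBM) supplies the likelihood p(r | s), while p(s) is a prior over stimulus representations.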

1.2 Main contributions

In this thesis, we bring two different contributions to the field of neural decoding. First, we compare the reconstruction of three different spectro-temporal representations: the spectrogram, the gammatone filter bank (GFB) [Slaney, 1993] and the spikegram [Pichevar et al., 2011]. We evaluate the difference in the reconstruction accuracy of these representations when using a neural decoding algorithm. We experimentally show that the reconstruction accuracy of the spikegram is better than that of the GFB, and that the reconstruction accuracy of the GFB is higher than that of the spectrogram. This is reasonable because, unlike the spectrogram, the spikegram and the GFB are bio-inspired, auditory-based representations [Najaf-Zadeh et al., 2008; Smith and Lewicki, 2005]. As a second contribution, we study the use of a generalized bilinear model (GBM) [Ahrens et al., 2008] for the decoding of neural activity in the auditory system. We show that the spikegram reconstruction accuracy with a GBM fitting is 3.3 dB higher than the spectrogram reconstruction accuracy with GLM fitting. As a result of this project, a conference paper was presented at the 37th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC2015) [Siahpoush et al., 2015].

The thesis is organized as follows. In chapter 2, the state of the art is presented. It includes a brief review of the auditory system and more specifically of the inferior colliculus, as well as a study on stimulus spectro-temporal reconstruction using the Bayesian approach, a study on different spectro-temporal representations and finally a review of different neural decoding studies. In chapter 3, the methodology of the project is presented, which includes: the methodology for the reconstruction of different spectro-temporal representations using the maximum a posteriori (MAP) estimator, a description of the available data and finally the evaluation measures for the quality of the spectro-temporal reconstruction. In chapter 4, the results are presented, including: the experimental setup, the validation of the neural encoding as well as the comparison of the reconstruction accuracy for both the different spectro-temporal representations and the neural encoding models (GLM and GBM) based on the SNRs. Chapter 5 concludes the thesis.
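Since reconstruction accuracy is compared through SNRs expressed in dB, a minimal sketch of such a measure may help. It assumes the common definition in which the "signal" is the original representation and the "noise" is the reconstruction error; the function name `snr_db` and this exact definition are assumptions for illustration and may differ from the evaluation measure detailed in chapter 3.

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR in dB between an original and a reconstructed representation.

    Assumed definition: 10 * log10(sum(s^2) / sum((s - s_hat)^2)).
    Both inputs must have the same shape (e.g. frequency x time).
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    signal_power = np.sum(original ** 2)
    # Error power; a perfect reconstruction gives zero and an infinite SNR.
    noise_power = np.sum((original - reconstructed) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)
```

Under this definition, a difference of 3.3 dB between two SNRs corresponds to a factor of about 10^0.33 ≈ 2.1 between the two signal-to-error power ratios.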


CHAPTER 2 STATE OF THE ART

In this chapter, the state of the art of the thesis is presented. It is organized as follows. In section 2.1, a brief review of the auditory system is presented, including the peripheral and the central auditory systems as well as a description of the spectro-temporal receptive field for IC neurons. In section 2.2, neural decoding of spikes is explained, including neural spike train acquisition, single-unit and multi-unit recordings as well as the study of the general Bayesian approach, also describing the generalized linear model and the generalized bilinear model. Next, a review of neural encoding studies and the spectro-temporal reconstruction of the stimulus using the Bayesian approach is described. In section 2.3, different spectro-temporal representations are described, including spectrogram, gammatone filter bank and spikegram representations.

2.1 The auditory system

2.1.1 The peripheral auditory system

Different parts of the auditory system contribute to the perception of audio signals, which results in hearing. The sound is captured by the external ear, then the acoustical pressure is converted into spikes in the cochlea. These spikes are transferred via auditory neurons to the brain, where they can be interpreted and analyzed. The ear comprises three structures: the outer, the middle and the inner ear (Fig. 2.1) [Thomas, 2013].

The outer ear is the external part of the ear, which consists of the pinna and the ear canal. The visible part of the ear is called the pinna. One important task of the pinna is to attenuate the sound in a frequency-selective manner, which contributes to the localization of a sound. The sound waves are then received by the auditory canal (see Fig. 2.1). At the end of the ear canal, these sounds stimulate the tympanic membrane (eardrum), which is the first part of the middle ear.

The eardrum is connected to a series of bones called the malleus, incus and stapes. These ossicles are joined to a membrane smaller than the eardrum, which is called the oval window. Therefore, when sound waves hit the eardrum, these ossicles are stimulated, and they convert the low-pressure wave at the eardrum into a higher-pressure wave that is transmitted to the oval window. There are two windows in the cochlea: the first is the oval window and the second is called the round window (Fig. 2.1) [Thomas, 2013].

Figure 2.1 The peripheral auditory system including outer, middle and inner ears. The outer ear comprises the pinna and the auditory canal. The middle ear comprises the tympanic membrane and a series of bones called the malleus, incus and stapes. The inner ear comprises the cochlea, which has three parts: scala tympani, scala media and scala vestibuli [Reinerio, 2013].

The inner ear corresponds to two different sensory organs: the balance organ and the hearing organ. The cochlea is the essential module for hearing in the inner ear. It has a snail shape and comprises three compartments: scala tympani, scala media and scala vestibuli. The basilar membrane is located between the scala tympani and the scala media. The ionic differences between the two fluids in the scala media and the scala tympani are important for the function of the inner ear.


Another important region of the peripheral auditory system is the organ of Corti. As seen in Fig. 2.2, the organ of Corti lies on the surface of the basilar membrane. The spectral content of the input generates stimulation at different locations along the basilar membrane. The high-frequency components of the sound generate vibrations at the base of the cochlea and the low frequencies generate vibrations at the apex. Thus the basilar membrane has a tonotopic organization, which means that a specific place on the basilar membrane responds to a specific frequency content of the input stimulus. There is a logarithmic relationship between the frequency of the sound and the position on the basilar membrane that vibrates.
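The place-frequency relationship mentioned above can be illustrated with Greenwood's empirical function, which is not used in this thesis and is introduced here only as an example. The default constants below are the values commonly quoted for the human cochlea and are an assumption of this sketch (the recordings in this work come from the guinea pig, for which other constants apply); the function name `greenwood_frequency` is likewise illustrative.

```python
import numpy as np

def greenwood_frequency(x, A=165.4, a=2.1, k=0.88):
    """Characteristic frequency (Hz) at relative position x on the basilar membrane.

    x: 0.0 at the apex (low frequencies) to 1.0 at the base (high frequencies).
    A, a, k: constants commonly quoted for the human cochlea (assumed here).
    """
    x = np.asarray(x, dtype=float)
    return A * (10.0 ** (a * x) - k)

# Frequencies at five equally spaced positions from apex to base.
positions = np.linspace(0.0, 1.0, 5)
frequencies = greenwood_frequency(positions)
```

Away from the apex the constant k becomes negligible, so equal steps in position correspond to roughly equal steps in the logarithm of frequency, consistent with the logarithmic relationship described in the text.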

Figure 2.2 The organ of Corti sits on the basilar membrane. It includes the outer and inner hair cells [Wikimedia, 2015].

The organ of Corti comprises many hair cells. Based on their function, there are two categories of hair cells: outer and inner hair cells. The outer hair cells are transducers that amplify the sound vibrations in a frequency-selective manner. They receive neural input from the brain and play a role in the cochlea's mechanical pre-amplifier. Inner hair cells act as transducers which convert the vibration of sound (transmitted by the pressure difference between the fluids inside the scala media and the scala tympani) into electrical activity in nerve fibers, to be transmitted to the brain [Rees and Palmer, 2010]. Two types of cochlear neurons, called type I and type II, lie in the spiral ganglion section of the cochlea. These two types are linked to the hair cells. Most cochlear neurons are of type I and innervate the inner hair cells. Type I neurons are bipolar, myelinated and have a large diameter. In contrast, type II neurons are monopolar, not myelinated, have a small diameter and innervate the outer hair cells [Rees and Palmer, 2010].

2.1.2 The central auditory system

The axons of the auditory nerve terminate in the cochlear nucleus, where the information received from the inner ear is processed. As seen in Fig. 2.3, the superior olivary complex (SOC) is the next relay after the cochlear nucleus. It is a group of brainstem nuclei and influences the functioning of the ascending and descending auditory pathways. The SOC is the first major region in the auditory system where information from the left and right ears converges [Rees and Palmer, 2010]. The information is then transmitted to the lateral lemniscus (LL) (see Fig. 2.3).

The LL is a tract of axons in the brainstem. Its function is to relay sound information from the cochlear nucleus to other parts of the brainstem, and it is also associated with the motor system for fast motor responses based on sound localization. The majority of the auditory fibers that arrive in the LL converge in the central nucleus (ICC) of the inferior colliculus (IC). The information is then transmitted to the medial geniculate body nucleus (MGB).

The MGB is a region of the auditory thalamus and is the thalamic relay between the IC and the auditory cortex. The MGB is the final subcortical station (see Fig. 2.3). The MGB is composed of ventral, dorsal and medial divisions. The ventral division of the MGB receives auditory cues from the central nucleus of the IC. We describe the IC in the next section since the neural activity considered in this study has been recorded from the IC.

2.1.3 The inferior colliculus

The inferior colliculus (IC) is an important auditory nucleus which is located in the midbrain. It integrates essential sound information.

Figure 2.4 The position of the IC in the brain [Commons, 2015]. It is located in the auditory midbrain.


The majority of the afferent fibers of the central auditory system converge on the inferior colliculus. Hence, neural activity can be recorded via electrodes positioned in the IC [Ehret and Romand, 1997]. The IC lies deep inside the auditory midbrain, as shown in Fig. 2.4.

IC structure

The inferior colliculus comprises three areas: a central nucleus (ICC), a dorsal cortex and an external cortex (paracentral nuclei) (Fig.2.5).

Figure 2.5 The IC structure consists of three parts: a central nucleus (ICC), a dorsal cortex and an external cortex (paracentral nuclei) [Heeger, 2013].

The ICC is the center of the IC and is an important station for afferent auditory information in the IC. The dorsal cortex surrounds the ICC and is an important pathway for efferent auditory information in the IC. The neurons of the paracentral nuclei receive projections from the somatosensory nuclei. As the IC receives information about the horizontal direction of a sound source, it is considered the first place in the brain that has enough information to localize a sound source [Winer and Schreiner, 2005].


The inputs and outputs of the IC

The input and output connections of the IC for the afferent auditory pathway are shown in Fig. 2.3. The inputs of the IC mostly come from the brainstem nuclei, while the outputs of the IC project to the medial geniculate body nucleus (MGB) (Fig. 2.6), the SOC and the cochlear nucleus complex (CNC). The ICC projects to the ventral division of the MGB and preserves the tonotopic organization (see Fig. 2.6).

Figure 2.6 The connections between IC and MGB [Heeger, 2013].

IC function

The inferior colliculus is responsible for integrating the sound location information obtained from binaural hearing. It is also responsible for detecting specific amplitude modulation frequencies, which the brain can use for different signal analyses such as pitch detection. In addition to the ascending fibers of the brainstem that converge in the IC, there are multiple descending pathways that can change the properties of the sound coding in the IC. The IC also has a laminar structure in which each lamina processes a specific frequency of the input stimulus. Hence this laminar structure of the IC contributes to pitch detection in the auditory brain. Studies show that the laminae of the IC contain a good representation of spectral and temporal parameters [Palombi and Caspary, 1996].

Spectro-temporal receptive field (STRF) of IC neurons

The firing response of an auditory neuron is modified according to the spectral and temporal characteristics of the input stimulus.

Figure 2.7 A sample spectro-temporal receptive field (STRF) learned for an auditory neuron in the inferior colliculus of the guinea pig using the maximum likelihood fitting approach used in this thesis. Each frequency bin is 1 kHz and each time bin is 1 ms.

Figure 2.8 A sample spectro-temporal receptive field (STRF) learned for an auditory neuron in the inferior colliculus of the guinea pig. Each frequency bin is 1 kHz and each time bin is 1 ms.


This property is described by the receptive field of the neuron. The spectro-temporal receptive field (STRF) of an auditory neuron shows how the firing of the neuron depends on its preferred frequencies of the input stimulus across time. In effect, the STRF is a two-dimensional input filter for the auditory neuron. Sample STRFs for two different neurons in the inferior colliculus are shown in Fig. 2.7 and Fig. 2.8.

2.2 Neural decoding

It is fundamental to understand how the auditory system encodes sound stimuli into spikes (neural encoding) and how the neural activity characterizes the sounds (neural decoding, see Fig. 2.9). Indeed, knowledge about neural encoding and decoding will help researchers to better understand the auditory brain and to design and improve auditory applications such as auditory implants as well as speech recognition, source separation and source localization algorithms.

Figure 2.9 Neural encoding and decoding of spike activity. In neural encoding, the stimulus is available and the neural activity is estimated, while in neural decoding, the neural activity is available and the stimulus is estimated.

2.2.1 Neural spike train acquisition

It is difficult to record neural activity in the central auditory system of humans. Therefore, animals with similar auditory systems are used. In this project, because of the similarity between the auditory systems of guinea pigs and humans, neural recordings are taken from the inferior colliculus of guinea pigs.

Single-unit and multi-unit recording

Single-unit and multi-unit recordings can be used to record neural activity. In single-unit recording, the electrical activity of a single cell is observed and recorded directly. In multi-unit recording, however, the electrode may record the activity generated by several neurons simultaneously [Vertes et al., 2011]. For neural decoding in this research, multi-unit recordings of neural activity from the SONIC LAB of the University of Minnesota have been used. The details are given in Section 3.2 and in [Rode et al., 2013].

Guinea pig as a suitable animal for auditory studies

Animals are used as substitutes for humans in research studies involving neural recordings in the auditory system. The auditory system of the guinea pig is similar to that of humans; however, the hearing range of the guinea pig extends to higher frequencies than that of humans [Popelar and Syka, 1982]. The guinea pig is accordingly a preferred animal model for studying the auditory system. The first successful attempt to regenerate hair cells in the inner ear of a mammal, announced in 2003, was achieved in guinea pigs [Kawamoto et al., 2003]. In another example, sinusoidally amplitude-modulated sounds were presented to anesthetized guinea pigs and the neural spikes were recorded from the ventral cochlear nucleus (VCN). It was shown that the amplitude modulation (AM) detection thresholds for single-unit recordings in the VCN of guinea pigs are similar to those obtained in humans [Sayles et al., 2013].

2.2.2 General Bayesian approach

One common approach for neural decoding is Bayesian estimation [Doya et al., 2011]. In [Ramirez et al., 2011], Bayesian estimation was used to reconstruct the time-frequency representation of the input sound stimulus s(τ, f) from the spike activity n recorded from field L of the forebrain of the zebra finch. The function of field L of the forebrain of the zebra finch is similar to the function of the inferior colliculus in mammals, including the guinea pig. In both, tonotopic representations of spectro-temporal auditory cues have been confirmed [Woolley et al., 2005]. As there is no linear and unique mathematical relation between spike trains and input signals, Bayes' rule has proven to be efficient for deriving decoding algorithms [Doya et al., 2011].

For simplicity, in the rest of the thesis, the concise form s is used to denote the spectro-temporal representation of the input stimulus. The goal of a Bayesian estimator is to estimate a random variable s based on a set of observations n that are statistically related to s. Bayes' rule is:

$$p(s \mid n) = \frac{p(n \mid s)\, p(s)}{p(n)} \tag{2.1}$$

Here, the posterior probability p(s | n) is expressed in terms of the likelihood p(n | s) (the probability of the observed spike trains given the stimulus), the prior probability p(s) and the marginal probability p(n). In other words, we want to estimate the optimal s based on the observations n. There are several ways to infer the variable s, including maximum likelihood (ML) and maximum a posteriori (MAP) estimation. In ML, the likelihood is maximized as follows:

$$s_{ML} = \arg\max_{s} \{ p(n \mid s) \} \quad \forall s \tag{2.2}$$

The ML estimate does not use any prior knowledge about s, whereas in MAP the posterior probability is maximized, which means that the prior probability of s is taken into account. In MAP, the posterior probability is maximized as follows:

$$s_{MAP} = \arg\max_{s} \{ p(s \mid n) \} = \arg\max_{s} \{ p(n \mid s)\, p(s) / p(n) \} \tag{2.3}$$

Since the logarithm is a monotonically increasing function and does not change the location of the extrema of the posterior, we can simplify the calculation by maximizing the logarithm of the posterior instead:

$$s_{MAP} = \arg\max_{s} \{ \log\{ p(s \mid n) \} \} \tag{2.4}$$

$$s_{MAP} = \arg\max_{s} \{ \log\{ p(n \mid s) \} + \log\{ p(s) \} - \log\{ p(n) \} \} \tag{2.5}$$

where s is the hidden variable to be estimated. To infer the variable s given the neural response n, we should find the s that maximizes the posterior. This is done by computing the derivative of the logarithm of the posterior with respect to s. Since the maximization is performed with respect to s, all terms that depend only on n are constants. Therefore, the marginal term p(n) has no effect on the result of the posterior maximization and is neglected. An optimization algorithm such as gradient ascent (equivalently, gradient descent on the negative log-posterior) can then be used to find the optimal latent variable s [Doya et al., 2011; Pillow et al., 2011]. To use a gradient method, we should be sure that log{p(n | s)} is concave. It has been shown that the GLM has a concave log-likelihood function if the GLM nonlinearity is convex and log-concave [Paninski, 2004]. We use the exponential function as the nonlinearity of the GLM; therefore, the log-likelihood is guaranteed to be concave.
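The MAP procedure described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the implementation used in this thesis: the stimulus is a short vector `s`, the hypothetical matrix `K` holds one linear filter per neuron, the nonlinearity is exponential, the spike counts are Poisson, and the prior is an independent Gaussian. The function names, the toy sizes and the backtracking step rule are all choices made for the example.

```python
import numpy as np

def log_posterior(s, n, K, b, dt, mu, sigma2):
    """Log-posterior of s up to additive constants (log n! and log p(n) dropped)."""
    log_rate = b + K @ s                                   # exponential nonlinearity
    log_lik = np.sum(n * (log_rate + np.log(dt)) - np.exp(log_rate) * dt)
    log_prior = -0.5 * np.sum((s - mu) ** 2 / sigma2)
    return log_lik + log_prior

def grad_log_posterior(s, n, K, b, dt, mu, sigma2):
    """Gradient of the log-posterior with respect to s."""
    rate = np.exp(b + K @ s)
    return K.T @ (n - rate * dt) - (s - mu) / sigma2

def map_decode(n, K, b, dt, mu, sigma2, n_iter=1000, step=0.1):
    """Gradient ascent on the concave log-posterior with backtracking steps."""
    s = mu.astype(float).copy()
    for _ in range(n_iter):
        g = grad_log_posterior(s, n, K, b, dt, mu, sigma2)
        current = log_posterior(s, n, K, b, dt, mu, sigma2)
        t = step
        # Halve the step until it yields a sufficient increase (Armijo rule).
        while t > 1e-12 and (
            log_posterior(s + t * g, n, K, b, dt, mu, sigma2)
            < current + 0.25 * t * (g @ g)
        ):
            t *= 0.5
        if t <= 1e-12:          # no further progress possible
            break
        s = s + t * g
    return s

# Toy example: 40 hypothetical neurons observing a 5-dimensional stimulus.
rng = np.random.default_rng(0)
K = 0.3 * rng.standard_normal((40, 5))
b, dt = 3.0, 0.1
mu, sigma2 = np.zeros(5), np.ones(5)
s_true = rng.standard_normal(5)
n = rng.poisson(np.exp(b + K @ s_true) * dt)
s_map = map_decode(n, K, b, dt, mu, sigma2)
```

Because the log-likelihood is concave in s and the Gaussian log-prior is strictly concave, the ascent has a single maximum to converge to, which is why a simple gradient method is adequate in this sketch.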

Bayesian Decoding

The objective of the decoding problem is to estimate the input stimulus from the observed spike train. By using Bayesian estimation, the decoding problem is transformed into an encoding problem, which is easier to solve. Bayesian decoding through MAP estimation is performed by finding the s that maximizes the left-hand side of (2.1). Instead of directly maximizing p(s | n), which is a difficult task, one can evaluate the likelihood p(n | s) and the prior p(s). Finding the likelihood p(n | s) is equivalent to performing neural encoding, that is, predicting the spike trains given the input stimulus. Therefore, the Bayesian decoding problem consists of two tasks: solving the encoding problem and finding the prior.


To estimate the likelihood, it is assumed that the activities of the neurons are conditionally independent given the stimulus, since the cells were recorded one at a time [Ramirez et al., 2011], such that

$$p(n \mid s, \theta) = \prod_{t=1}^{T} \prod_{i=1}^{N} p\big(n_i(t) \mid s, \theta, n_i(1), n_i(2), \ldots, n_i(t-1)\big) \tag{2.6}$$

where $n_i(t)$ is the number of spikes fired by neuron i at time t and N is the total number of neurons. The likelihood for a given neuron i is modeled as a Poisson distribution:

$$p\big(n_i(t) \mid s, \theta, n_i(1), n_i(2), \ldots, n_i(t-1)\big) = \exp\big(-\lambda_i(t)\,dt\big)\, \frac{\big(\lambda_i(t)\,dt\big)^{n_i(t)}}{n_i(t)!} \tag{2.7}$$

where $\lambda_i(t)$ is the instantaneous firing rate of the ith neuron at time t and dt is the time period (time bin) in which the number of spikes of neuron i, $n_i(t)$, is counted. In (2.7), the variable $\lambda_i(t)$ is modeled using a generalized linear model (see subsection 2.2.3).
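As an illustration of (2.6) and (2.7), the log-likelihood of a set of spike counts can be evaluated by summing the Poisson log-probabilities over neurons and time bins. This is a minimal sketch: the function name `poisson_log_likelihood` and the array layout (neurons by time bins) are assumptions made for the example.

```python
import numpy as np
from math import lgamma

def poisson_log_likelihood(counts, rates, dt):
    """Sum over neurons and time bins of log p(n_i(t)) from the Poisson model.

    counts: spike counts n_i(t), shape (N, T)
    rates:  firing rates lambda_i(t) in spikes per second, same shape
    dt:     bin width in seconds
    """
    counts = np.asarray(counts, dtype=float)
    mean = np.asarray(rates, dtype=float) * dt            # expected count per bin
    log_factorial = np.vectorize(lgamma)(counts + 1.0)    # log(n!)
    return float(np.sum(counts * np.log(mean) - mean - log_factorial))

# One neuron, one bin: n = 2 spikes with an expected count of 1.
value = poisson_log_likelihood([[2]], [[10.0]], 0.1)
```

The product in (2.6) becomes a sum in the log domain, which is the form that is actually maximized during fitting and decoding.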

As mentioned before, for neural decoding using the Bayesian approach, we need to maximize the product of the likelihood and the prior. The likelihood is the Poisson distribution given by (2.7). To find the prior, a standard distribution (e.g., a Gaussian) is assumed for each spectro-temporal bin of the spectro-temporal (S-T) representation. The parameters of these distributions (i.e., mean and variance) are then obtained by fitting the distributions to histograms computed from the S-T representations of a dataset of stimuli. The full prior is the joint probability of the priors of all spectro-temporal bins in the S-T representation. Once the prior of the S-T representation of the stimuli has been calculated, MAP estimation can be used to reconstruct an unseen stimulus from its associated neural activity.
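The construction of the prior can be sketched as follows. The example assumes that the Gaussian of each bin is fitted by its sample mean and variance (the maximum likelihood fit), which stands in for the histogram fitting described above, and that the bins are independent. The function names and the `(examples, F, T)` array layout are hypothetical.

```python
import numpy as np

def fit_gaussian_prior(dataset, eps=1e-8):
    """Fit one Gaussian per spectro-temporal bin.

    dataset: array of shape (n_examples, F, T) holding S-T representations.
    Returns the per-bin mean and variance, each of shape (F, T).
    """
    mean = dataset.mean(axis=0)
    var = dataset.var(axis=0) + eps      # eps avoids zero variance in silent bins
    return mean, var

def log_prior(s, mean, var):
    """Joint log-prior of an S-T representation s, assuming independent bins."""
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var)
                        - 0.5 * (s - mean) ** 2 / var))

# Two toy "stimuli", each with F = 1 frequency bin and T = 2 time bins.
dataset = np.array([[[0.0, 2.0]], [[2.0, 6.0]]])
mean, var = fit_gaussian_prior(dataset)
```

Treating the bins as independent makes the joint prior a simple sum of per-bin terms in the log domain; a correlated prior would require a full covariance matrix instead.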


2.2.3 Generalized linear model (GLM)

The generalized linear model (GLM) is one of the most frequently used models for estimating the instantaneous firing rate, and it has been shown to be sufficiently accurate to represent the response of a neuron. The GLM is a cascade of a linear regression model and a link function, which can be nonlinear [Pillow et al., 2011].

The GLM relates the instantaneous spiking rate to the variables that are important for spiking, including the stimulus, the spectro-temporal receptive field of the neuron, the post-spike filter and the neuron's nonlinearity function [Calabrese et al., 2011]. The general equation of a GLM for neural encoding is:

$$\lambda_i(t) = f\Big(b_i + K * s + \sum_{j=1}^{J} h_i(j)\, n_i(t-j)\Big) \tag{2.8}$$

where $K * s = \sum_{f=0}^{M} \sum_{\tau=0}^{N} k_i(f,\tau)\, s(f, t-\tau)$ is the convolution between a spectro-temporal representation of the input stimulus at time t and the spectro-temporal receptive field (STRF) of the neuron. The variable $n_i(t-j)$ represents the spike train history of the neuron. Also, $h_i(j)$ represents the coefficients of the post-spike filter of the neuron. Here, f(.) (the link function of the GLM) is an exponential function and the offset $b_i$ models the baseline firing of the cell.
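Equation (2.8) can be evaluated directly once the STRF, the post-spike filter and the offset are known. The sketch below assumes a spectrogram `S` of shape (F, T), an STRF `K` of shape (F, M) whose column τ weights the stimulus τ bins in the past, and zero stimulus and zero spikes before the start of the recording. The function name `glm_rate` and these conventions are illustrative choices, not the thesis code.

```python
import numpy as np

def glm_rate(S, K, h, b, spikes):
    """Instantaneous firing rate of a GLM with an exponential link function.

    S:      stimulus spectrogram, shape (F, T)
    K:      STRF, shape (F, M) with M <= T
    h:      post-spike filter, length J (h[0] weights the previous bin)
    b:      baseline offset
    spikes: observed spike counts, length T
    """
    F, T = S.shape
    M = K.shape[1]
    drive = np.zeros(T)
    for tau in range(M):
        # Contribution of the stimulus delayed by tau bins.
        drive[tau:] += K[:, tau] @ S[:, :T - tau]
    history = np.zeros(T)
    for j in range(1, len(h) + 1):
        # Contribution of the spikes fired j bins earlier.
        history[j:] += h[j - 1] * spikes[:T - j]
    return np.exp(b + drive + history)

# Toy example: one frequency band, a two-tap STRF and no history effect.
S = np.array([[1.0, 2.0, 3.0]])
K = np.array([[1.0, 1.0]])
rate = glm_rate(S, K, h=np.array([0.0]), b=0.0, spikes=np.zeros(3))
```

In this toy case the filtered stimulus is [1, 3, 5], so the rate is the exponential of those values; a negative entry in `h` would lower the rate after each spike, which is how refractoriness is typically expressed.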

Fig. 2.10 shows an example of a GLM for a single neuron. It comprises four main blocks: (1) a linear block, (2) a nonlinear block, (3) a point process block and (4) a post-spike filter block. The GLM establishes a relation between the firing rate of a neuron, λ(t), the past spiking history, the spontaneous spiking rate and the stimulus spectrogram s with F frequency bins and T time bins (an F by T matrix). The stimulus enters as a spectrogram because most neurons in the auditory system have a spectro-temporal receptive field that responds to specific time and frequency regions of the stimulus.

Figure 2.10 A GLM for a single neuron in the auditory system. It comprises four main blocks, from left to right: (1) a linear block, (2) a nonlinear block, (3) a point process block and (4) a post-spike filter block [Calabrese et al., 2011]. In K*s, the "*" denotes the convolution over time between the STRF and the spectro-temporal representation.

The spectro-temporal receptive field (STRF) K of neuron i, with M time bins (M < T) and F frequency bins, is the linear block of the GLM, which filters the spectrogram s. Thus the output of the STRF block is K*s (the convolution between the STRF and the spectrogram). The next block is a nonlinearity f(.), which models the nonlinearity of the thresholding function of a neuron. The output of this function is the instantaneous spike rate λ(t). Finally, h (a 1 by J vector) is the post-spike filter. Through this filter, the firing of neuron i at time t becomes dependent on its firing at previous times. In the point process block, a Poisson distribution is first defined with its parameter (the mean of the distribution) equal to the firing rate obtained from the nonlinear function. Then, to generate a sample firing history of the neuron, a sample spike train is drawn from the resulting Poisson distribution using the time rescaling method [Brown et al., 2002] (see Appendix A.2). In this research, as the neural activity (and hence the firing history) is available, there is no need to apply the point process block.
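For readers who want to see the point process block in action, the sketch below generates a spike train from a GLM with history feedback. It deliberately swaps in a simpler technique than the one cited above: instead of the time rescaling method, it draws an independent Poisson count in each time bin. The function name `simulate_glm_spikes` and the parameter values are hypothetical.

```python
import numpy as np

def simulate_glm_spikes(drive, h, b, dt, rng):
    """Draw a spike train bin by bin from a GLM with a post-spike filter.

    drive: filtered stimulus (K*s) for each time bin, length T
    h:     post-spike filter, length J (h[0] weights the previous bin)
    b:     baseline offset (log of the baseline rate in spikes per second)
    dt:    bin width in seconds
    rng:   numpy random Generator
    Note: uses bin-wise Poisson sampling, not the time rescaling method.
    """
    T, J = len(drive), len(h)
    spikes = np.zeros(T, dtype=int)
    rate = np.zeros(T)
    for t in range(T):
        # Feedback from the spikes fired in the J previous bins.
        history = sum(h[j - 1] * spikes[t - j]
                      for j in range(1, J + 1) if t - j >= 0)
        rate[t] = np.exp(b + drive[t] + history)
        spikes[t] = rng.poisson(rate[t] * dt)
    return spikes, rate

rng = np.random.default_rng(0)
spikes, rate = simulate_glm_spikes(
    drive=np.zeros(50), h=np.array([-2.0, -1.0]), b=np.log(20.0), dt=0.01, rng=rng
)
```

With a negative post-spike filter, as in this example, the rate can only drop below the 20 spikes per second baseline after a spike, mimicking a refractory period.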

The GLM is advantageous because it has a low computational complexity and its results can be easily interpreted [Calabrese et al., 2011]. It takes the spiking history into account. Furthermore, the model can be used to generate a series of spike trains, which makes it possible to compare the actual and reconstructed spike trains. Similar GLMs have also been used for different sensory systems such as the visual system, where they have been successfully used to accurately predict the spiking responses of single retinal ganglion cells [Pillow et al., 2005].


2.2.4 Generalized bilinear model (GBM)

The GLM presented in subsection 2.2.3 models the instantaneous firing rate of an auditory neuron. The neuron parameters in this GLM are fitted using a dataset of stimuli and spike responses.

Ahrens et al. [Ahrens et al., 2008] proposed a modification of the GLM in which the spectro-temporal representation of the input stimulus is transformed through an additional nonlinear function to provide a better match between the input and the model. This modified GLM was termed the generalized bilinear model (GBM). The GBM is thought to model the nonlinear synaptic or dendritic responses in a neuron. In [Ahrens et al., 2008], the GBM was applied to neurons from the somatosensory cortex of rodents, and it was shown that the fitted parameters were generally more accurate when using a GBM than a GLM. Considering the differences in the nature of the inputs in the auditory and somatosensory systems, the validity of such an approach in the auditory system remains to be confirmed. More importantly, to the knowledge of the author, the use of GBMs has not been evaluated in the context of neural decoding.

Figure 2.11 A GBM for a single neuron. It comprises, from left to right: a spectro-temporal transform of the input signal, an input nonlinearity, a linear STRF, an output nonlinearity, a point process and a post-spike filter block [Ahrens et al., 2008].

As seen in Fig. 2.11, in the GLM there is only an output nonlinearity (an exponential in (2.8)), while in the generalized bilinear model an input nonlinearity g(.) is also applied before the STRF module. In the bilinear model, the estimated firing rate $\lambda_i(t)$ is:


$$\lambda_i(t) = f\Big(b_i + \sum_{f'=0}^{F-1} \sum_{\tau'=0}^{M-1} k_i(f',\tau')\, g\big(s(f', t-\tau')\big) + \sum_{j=1}^{J} h_i(j)\, n_i(t-j)\Big) \tag{2.9}$$

where g(.) is the input nonlinearity, f(.) is the fixed output nonlinearity, and $b_i$ models the baseline firing and is independent of the stimulus. The variable $k_i$ denotes the spectro-temporal receptive field (STRF) of the neuron, with M time bins and F frequency bins (it is a linear mapping from stimulus to response). The variable $h_i(j)$ is the spike-history filter of the neuron, of length J; it captures firing rate saturation, refractory periods and bursting behavior.

Since the input nonlinearity is learned from the dataset of stimuli and neural activity, it is more tractable to express the nonlinear function as a linear expansion on some basis, in the spirit of a Taylor expansion (the form of this nonlinear function is not fixed; examples are plotted in Fig. 2.13). Hence the input nonlinearity g(.) is computed as a linear combination of a fixed set of basis functions $g_k$, k ∈ {1, ..., q}. These basis functions can be chosen based on background knowledge of the physiology and response of the studied brain area [Ahrens et al., 2008]. Each basis function $g_k$ receives a single input. In this work, for simplicity, the basis functions are chosen to be piecewise linear, as in [Ahrens et al., 2008]. They are defined on equally spaced points (nodes) {$m_1, ..., m_q$} that cover the stimulus amplitude range. Fig. 2.12 shows a set of piecewise linear functions for q = 10. The input nonlinearity is expressed in terms of these basis functions as [Vidaurre et al., 2011]:

$$g(x_j) = \sum_{k=1}^{q} \alpha_k\, g_k(x_j) \tag{2.10}$$

in which the $\alpha_k$ are the coefficients of the input nonlinearity, which should be learned from the data, and the $g_k(x_j)$ are:

$$g_k(x_j) = \begin{cases} (x_j - m_{k-1})/(m_k - m_{k-1}) & \text{if } k > 1,\ m_{k-1} \le x_j < m_k \\ (m_{k+1} - x_j)/(m_{k+1} - m_k) & \text{if } k < q,\ m_k \le x_j < m_{k+1} \\ 0 & \text{otherwise} \end{cases} \tag{2.11}$$


It has been shown that a piecewise linear basis can efficiently span the input stimulus space [Vidaurre et al., 2011]. Each piecewise linear basis function is a triangle that peaks at one node and falls to zero at the two neighboring nodes.

Figure 2.12 A group of piecewise linear basis functions $g_k$, k = 1, ..., 10 (horizontal axis: univariate input with nodes $m_1$ to $m_{10}$; vertical axis: function value, from 0 to 1). Each piecewise linear basis function $g_k$ is a triangle function centered at the input value $m_k$ [Vidaurre et al., 2011].
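The basis functions of (2.11) and the expansion (2.10) can be written compactly in code. The sketch below is an illustration with hypothetical names (`tent_basis`, `nodes`, `alpha`) and arbitrary coefficient values; indices are zero-based, so the conditions k > 1 and k < q become `k > 0` and `k < q - 1`.

```python
import numpy as np

def tent_basis(x, nodes):
    """Evaluate the q piecewise linear basis functions of (2.11) at the points x.

    Returns an array G of shape (q,) + x.shape with G[k] = g_k(x).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    nodes = np.asarray(nodes, dtype=float)
    q = len(nodes)
    G = np.zeros((q,) + x.shape)
    for k in range(q):
        if k > 0:                                   # rising edge of the triangle
            rising = (nodes[k - 1] <= x) & (x < nodes[k])
            G[k][rising] = (x[rising] - nodes[k - 1]) / (nodes[k] - nodes[k - 1])
        if k < q - 1:                               # falling edge of the triangle
            falling = (nodes[k] <= x) & (x < nodes[k + 1])
            G[k][falling] = (nodes[k + 1] - x[falling]) / (nodes[k + 1] - nodes[k])
    return G

# q = 10 equally spaced nodes covering a stimulus range of [0, 1].
nodes = np.linspace(0.0, 1.0, 10)
alpha = np.arange(10.0) ** 2          # arbitrary coefficients for illustration
x = np.array([0.0, 0.05, 0.5, 0.95])
g_of_x = alpha @ tent_basis(x, nodes)  # input nonlinearity g(x) from (2.10)
```

Inside the node range the basis functions sum to one, so g(.) is simply the linear interpolation of the coefficients $\alpha_k$ placed at the nodes $m_k$; this is why the learned curve becomes smoother as more nodes are used.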

The input nonlinearity is not fixed and should be learned along with the STRF and the post-spike filter. After learning the coefficients $\alpha_k$ of the linear expansion of the input nonlinearity, the function g(.) is calculated using (2.10). The input nonlinearity models the nonlinear dendritic property of the neuron. Hence the input nonlinearity is specific to each neuron and is independent of the input stimulus. Note that the piecewise linear basis functions $g_k$ are fixed, while the input nonlinearity g(.) is learned for each neuron as the coefficients $\alpha_k$ are learned. Sample input nonlinearities for different neurons are shown in Fig. 2.13.
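Once the coefficients $\alpha_k$ are known, the GBM rate of (2.9) differs from the GLM rate only in that the stimulus passes through g(.) before the STRF. The sketch below assumes that all stimulus values lie inside the node range, where the expansion (2.10) on triangular basis functions coincides with linear interpolation of the $\alpha_k$ at the nodes, so `np.interp` is used as a shortcut. The function name `gbm_rate` and the toy values are hypothetical.

```python
import numpy as np

def gbm_rate(S, K, h, b, spikes, nodes, alpha):
    """Instantaneous firing rate of a GBM with an exponential output nonlinearity.

    S:      stimulus representation, shape (F, T), values within [nodes[0], nodes[-1])
    K:      STRF, shape (F, M); h: post-spike filter, length J; b: offset
    spikes: observed spike counts, length T
    nodes:  node positions m_k; alpha: learned coefficients alpha_k
    """
    # Input nonlinearity g(s): interpolation of alpha at the nodes (see lead-in).
    gS = np.interp(S.ravel(), nodes, alpha).reshape(S.shape)
    F, T = S.shape
    drive = np.zeros(T)
    for tau in range(K.shape[1]):
        drive[tau:] += K[:, tau] @ gS[:, :T - tau]        # STRF applied to g(s)
    history = np.zeros(T)
    for j in range(1, len(h) + 1):
        history[j:] += h[j - 1] * spikes[:T - j]          # post-spike feedback
    return np.exp(b + drive + history)

S = np.array([[1.0, 2.0, 3.0]])
K = np.array([[1.0, 1.0]])
nodes = np.linspace(0.0, 4.0, 5)
# With alpha equal to the node positions, g is the identity and the GBM reduces to a GLM.
rate_identity = gbm_rate(S, K, np.array([0.0]), 0.0, np.zeros(3), nodes, nodes.copy())
```

Setting `alpha` to the node positions is a convenient sanity check: the model must then reproduce the GLM rate exactly.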

The coefficients of the input nonlinearity, the post-spike filter and the linear STRF are estimated using ML optimization. It was shown in [Ahrens et al., 2008] that the fitted parameters of a GBM are more accurate than those of a GLM. As mentioned previously, the input nonlinearity in the GBM helps to model the nonlinear synaptic and dendritic responses in a neuron. Another property that favors the GBM is that it can represent a neuronal response that is invariant to the sign of the input [Ahrens et al., 2008].

Figure 2.13 Examples of input nonlinearities learned for 3 neurons (recording sites) using the maximum likelihood fitting method on the dataset of this project (horizontal axis: stimulus value S; vertical axis: input nonlinearity g(S); one curve per neuron). Increasing the number of piecewise linear basis functions makes these input nonlinearity functions smoother [Ahrens et al., 2008].

2.2.5 A review of different neural decoding studies

In the following, we briefly present various neural decoding studies that use different approaches and were performed on different animal models. As an example of neural decoding using the Bayesian method, Brown et al. [Brown et al., 1998] studied the problem of predicting position from the neural activity of rat hippocampal place cells. These cells are known to encode the position (visual landmarks) of an animal with respect to some reference. Brown et al. proposed an encoding-decoding approach. In the encoding part, each spike train is modeled as an inhomogeneous Poisson process. A Gaussian random walk is used to model the animal's direction. In the decoding stage, a Bayesian approach is used to obtain a nonlinear recursive causal filter algorithm for predicting the position of the animal from the firing patterns of the place cell ensemble [Brown et al., 1998]. The decoding algorithm defines a mapping between the discrete spike trains and the predicted position, in which the animal's location is specified by the information in the neural activity. In other words, the probability of the location of the animal given all firing patterns is maximized [Brown et al., 1998].

In 2005, Pillow et al. [Pillow et al., 2005] showed that the stimulus selectivity, reliability and timing precision of primate retinal ganglion cell responses can be reproduced accurately with a simple model. This model consists of a leaky integrate-and-fire spike generator (a capacitor (the integrating part) combined with a resistor (the leaky part)) driven by a linearly filtered stimulus, a post-spike current and a Gaussian noise current [Pillow et al., 2005]. They also compared their results with the simple linear-nonlinear-Poisson (LNP) model. The LNP model is similar to the GLM except that it has no history filter block. They performed neural decoding using a maximum likelihood estimation approach.

In another work, Pillow et al. [Pillow et al., 2011] developed decoding methods based on GLM point-process neural encoding models. They applied the encoding model to the problem of decoding the stimulus from population spike responses. They proposed an approach for calculating the maximum a posteriori estimate of the stimulus, as well as a method for calculating a Gaussian approximation of the posterior. They also found that the maximum a posteriori decoder is more efficient in neural decoding than the optimal linear estimator (OLE).

By performing neural decoding in different parts of the auditory system, Rieke et al. showed that the information in the amplitude spectrum of a naturalistic stimulus (the bullfrog advertisement call) is encoded much better than that of broadband white noise stimuli [Rieke et al., 1995]. This shows that the animal's auditory system is adapted to natural stimuli and responds better to them than to broadband white noise: spike trains carry 2 to 6 times more information for the natural stimuli than for broadband white noise.

Mesgarani et al. studied the impact of behavioral state on the response characteristics of the ferret's auditory cortex. The goal of the study was to show how complex sounds are encoded in the primary auditory cortex (AI). They applied a linear spectro-temporal model to map neural population responses to the spectrogram of the stimulus [Mesgarani et al., 2009]. As a result, they identified spectro-temporal features that AI neurons can faithfully encode. Another result was that using prior knowledge of the statistics of complex natural sounds (e.g., speech stimuli) improves the reconstruction accuracy. Finally, they modeled the attentional and motivational state of the animal as changes in the STRFs of neurons, which depend on the best frequency (BF) of the neurons and on other baseline spectro-temporal properties. They also showed that these changes may have systematic effects on the reconstructed stimulus.

Pasley et al. [Pasley et al., 2012] proposed to reconstruct speech from the neural activity of the human auditory cortex. They gathered recordings from the nonprimary auditory cortex in the human superior temporal gyrus in order to show which acoustic information in speech sounds can be reconstructed from the population neural activity. They found that, using a linear model based on a spectrogram, slow temporal fluctuations such as the syllable rate were accurately reconstructed. However, for fast fluctuations such as syllable onsets and offsets, a nonlinear sound model based on temporal modulation energy was required. They claim that their method can identify individual words from the brain activity.

In [Ramirez et al., 2011], Ramirez et al. reported experiments on zebra finches. They used Bayes' rule to reconstruct the spectrogram of zebra finch stimuli from the neural activity recorded from neurons located in the animal's auditory midbrain. In their work, the instantaneous firing rate of a neuron was obtained through a generalized linear model (GLM) that considers the neuron's past firing as well as a spectrogram of the input stimulus. To reconstruct the spectrogram of the stimulus, the prior was modeled with a multivariate Gaussian probability density function, and the likelihood was modeled with a Poisson distribution whose mean equals the instantaneous firing rate. They found that spike trains from a population of mesencephalicus lateralis dorsalis neurons located in the auditory midbrain, combined with an uncorrelated Gaussian prior, could estimate the envelope of song spectrograms.

2.3 Different spectro-temporal representations

As mentioned previously, because auditory neurons have spectro-temporal receptive fields, they carry information about different parts of the spectro-temporal representation of auditory stimuli rather than about the time-domain stimuli. Hence, a spectro-temporal representation of the input stimulus is typically used in the GLM and GBM for the auditory system (see equations (2.8) and (2.9)). Since it is the spectro-temporal representation of the stimulus that is reconstructed by neural decoding, it is very important to use a suitable spectro-temporal representation.

There are different types of spectro-temporal representations. Here, we review the relevant ones that are used in the present research work: the spectrogram, the gammatone filter bank and the spikegram.

2.3.1 Spectrogram

To compute the spectrogram, a short-time Fourier transform (STFT) is used. In the STFT, the time axis of the signal is divided into a number of regularly spaced, shifted windows, and the discrete Fourier transform is evaluated for each windowed signal. The STFT has either a good frequency resolution or a good time resolution. Indeed, one of the problems of the STFT is that the width of all windows is fixed: a wide window gives better frequency resolution but poor time resolution, and a narrower window gives good time resolution but poor frequency resolution.

Figure 2.14 A sample spectrogram (lower plot) for a sample sound (upper plot).


The squared magnitude of the sliding short-time Fourier transform (STFT), usually expressed in decibels (dB), is called a spectrogram. A sample spectrogram is shown in Fig. 2.14. The spectrogram has two dimensions, time on the horizontal axis and frequency on the vertical axis, with the amplitude given by either a gray or a color scale. As the neurons of the auditory system are not very sensitive to the phase information of input stimuli, only the amplitude of the spectro-temporal representation is used in the GLM and GBM to model the firing rate of neurons.
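The spectrogram computation described above can be sketched in a few lines. The window type (Hann), the window length, the hop size and the small floor that avoids taking the logarithm of zero are assumptions made for the example and are not the settings used in this thesis.

```python
import numpy as np

def spectrogram_db(x, fs, win_len=256, hop=128, floor=1e-12):
    """Squared magnitude of the STFT in dB.

    Returns (S_db, freqs, times) with S_db of shape (win_len // 2 + 1, n_frames).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Cut the signal into regularly spaced, overlapping, windowed frames.
    frames = np.stack(
        [x[i * hop: i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    spectrum = np.fft.rfft(frames, axis=0)            # DFT of each windowed frame
    power = np.abs(spectrum) ** 2                     # phase is discarded here
    S_db = 10.0 * np.log10(np.maximum(power, floor))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return S_db, freqs, times

# One second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
S_db, freqs, times = spectrogram_db(np.sin(2 * np.pi * 1000.0 * t), fs)
```

Increasing `win_len` narrows the frequency bins (`fs / win_len` Hz each) but lengthens each frame in time, which is the fixed resolution trade-off discussed above.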

2.3.2 Gammatone filter bank

Another time-frequency representation of interest is the gammatone filter bank (GFB). Gammatones are basis functions that mimic the auditory filter bank of the mammalian ear [Loizou, 1998]. A sample gammatone is shown in Fig. 2.15 and a gammatone filter bank is shown in Fig. 2.16.

Figure 2.15 A sample gammatone represented in the time domain. It is the product of a gamma function and a sinusoid [Slaney, 1993]. A set of these gammatones with different center frequencies can be used as the gammatone filter bank for the spikegram representation.

Figure 2.16 A sample gammatone filter bank represented in the frequency domain (horizontal axis: frequency in Hz; vertical axis: filter response in dB). The gammatone center frequencies cover the range between 20 Hz and 20 kHz [Slaney, 1993]. This set of gammatone kernels can be used for the spikegram representation of audio stimuli.

The GFB mimics the filtering performed by the basilar membrane of the human auditory system [Slaney, 1993]. The GFB representation is obtained by filtering the input signal with a bank of gammatone filters [Slaney, 1993]. A gammatone filter bank includes a set of gammatone bases with different center frequencies and different bandwidths. The closed-form equation of a gammatone basis [Slaney, 1993] has a gamma part and a tone part:

$$ g[n] = a\, n^{m-1}\, e^{-2\pi l n} \cos\left[2\pi (f_c / f_s)\, n + \varphi\right] \tag{2.12} $$

in which $n$ is the time index, $a$ is an amplitude factor, $m$ and $l$ are used for tuning the gamma and tone parts, $f_s$ is the sampling frequency, $\varphi$ is the phase and $f_c$ is the center frequency of the gammatone.
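Equation (2.12) can be evaluated directly. The sketch below is illustrative only: the function name `gammatone_kernel`, the default order `m = 4`, the 25 ms duration, the unit-norm scaling and the ERB-based default for `l` (the Glasberg and Moore bandwidth formula divided by `fs`) are all assumptions that are not specified in the text above.

```python
import numpy as np

def gammatone_kernel(fc, fs, m=4, l=None, a=1.0, phi=0.0, duration=0.025):
    """Sample g[n] = a n^(m-1) exp(-2 pi l n) cos(2 pi (fc/fs) n + phi).

    Hypothetical helper. If l is not given, it is set from an assumed
    ERB-based bandwidth, expressed in cycles per sample.
    """
    n = np.arange(int(round(duration * fs)), dtype=float)
    if l is None:
        erb_hz = 24.7 + 0.108 * fc       # assumed bandwidth model (ERB, in Hz)
        l = 1.019 * erb_hz / fs          # normalized decay rate
    gamma_part = a * n ** (m - 1) * np.exp(-2 * np.pi * l * n)
    tone_part = np.cos(2 * np.pi * (fc / fs) * n + phi)
    g = gamma_part * tone_part
    return g / np.linalg.norm(g)         # unit energy (a convenience, assumed)

# A small bank: one kernel per center frequency (values chosen for illustration).
fs = 16000
bank = [gammatone_kernel(fc, fs) for fc in (250, 500, 1000, 2000, 4000)]
g = bank[2]
```

Filtering a signal with each kernel of `bank` (for instance with `np.convolve`) and stacking the outputs would give one row per channel, which is the general idea behind the GFB representation.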

2.3.3 Spikegram

A more recent bio-inspired time-frequency representation is the spikegram. The spikegram is a sparse time-frequency representation of a signal. It uses an optimization method such as matching pursuit over the gammatone dictionary to find a spectro-temporal representation of the signal in which most of the elements are zero (sparse representation) [Smith and Lewicki, 2006].


Spikegrams have recently been proposed to represent audio signals [Pichevar et al., 2011]. It has been shown that the brain has sparse activity due to energy efficiency constraints [Smith and Lewicki, 2005]. A spikegram is a loose spectro-temporal representation of the neural activity at higher auditory centers that takes this sparseness into account and that also considers the masking of sounds occurring in the peripheral auditory system. To achieve such a representation, a signal $x[n]$ is mapped onto gammatone bases [Pichevar et al., 2011], where the coefficients that multiply these bases are sparse (most of them are zero). The 2D dictionary (time-channel plane) $\Phi$ is generated by modulating a basis (here a gammatone [Slaney, 1993]) so that it is located at $N_c$ channels $c_i$ and $T_s$ time shifts $\tau_i$ (the total number of basis functions is $N = N_c T_s$). Hence,

$$ x[n] = \sum_{i=1}^{N} a_i\, g_{c_i}(n - \tau_i) \tag{2.13} $$

where the $a_i$ are the coefficients of the representation, most of which are zero (they are sparse coefficients), and $g_{c_i}(n - \tau_i)$ is one gammatone basis function from the 2D dictionary, located at channel $c_i$ and time sample $\tau_i$ [Pichevar et al., 2011]. The coefficients used in this representation are obtained through the perceptual matching pursuit (PMP) algorithm, which aims at representing the signal with the smallest reconstruction error and the greatest sparsity (the number of zero coefficients divided by the total number of coefficients) [Najaf-Zadeh et al., 2008]. The 2D plot of all the sparse coefficients obtained in the time-channel plane is called a spikegram.
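The decomposition in (2.13) can be sketched with plain matching pursuit over a small gammatone dictionary. This is not the PMP algorithm: no masking threshold is applied, and the kernel parameters (order 4, ERB-based decay, 256-sample length, four channels), the function names and the synthetic test signal are assumptions made only for the example.

```python
import numpy as np

def gammatone(fc, fs, m=4, n_samples=256):
    """Unit-norm gammatone kernel (hypothetical helper, assumed parameters)."""
    n = np.arange(n_samples, dtype=float)
    l = 1.019 * (24.7 + 0.108 * fc) / fs          # assumed ERB-based decay rate
    g = n ** (m - 1) * np.exp(-2 * np.pi * l * n) * np.cos(2 * np.pi * fc / fs * n)
    return g / np.linalg.norm(g)

def matching_pursuit(x, kernels, n_iter=10):
    """Greedy sparse decomposition x[n] ~ sum_i a_i g_{c_i}(n - tau_i).

    Returns the spikes as (channel c_i, time shift tau_i, coefficient a_i)
    and the final residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    spikes = []
    for _ in range(n_iter):
        best = None
        for c, g in enumerate(kernels):
            # Inner product of the residual with kernel c at every time shift.
            corr = np.correlate(residual, g, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (c, tau, float(corr[tau]))
        c, tau, a = best
        # Remove the selected atom's contribution from the residual.
        residual[tau:tau + len(kernels[c])] -= a * kernels[c]
        spikes.append((c, tau, a))
    return spikes, residual

# Synthetic signal made of two non-overlapping atoms (illustrative values).
fs = 16000
kernels = [gammatone(fc, fs) for fc in (500, 1000, 2000, 4000)]
x = np.zeros(1024)
x[100:356] += 2.0 * kernels[1]
x[400:656] += 0.5 * kernels[3]
spikes, residual = matching_pursuit(x, kernels, n_iter=2)
```

Plotting the recovered `(tau, c)` pairs with marker size proportional to `abs(a)` gives a spikegram-like picture. The perceptual variant described in the text additionally discards atoms that fall below a masking threshold.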

We now describe the PMP algorithm in more detail. At each iteration $i$, PMP finds a sparse coefficient $a_i$ for a gammatone basis function at time position $\tau_i$ and channel number $c_i$, in critical band $k$ of the 2D time-channel plane.

Moreover, at each iteration $i$, a masking threshold $M_k(n)$ (in dB) at time $n$ and critical band $k$ is generated and updated (the details of setting the masking threshold are given in [Najaf-Zadeh et al., 2008]). PMP keeps only audible gammatones, that is, those that can be heard by an average listener. To find the audible kernels, a sensation level (in dB) for the gammatone at time $n$ and critical band $k$ is first calculated at each iteration as:


$$ SL(n, k) = 10 \log_{10}\left( \frac{A^2(n)\, G_k^2}{QT_k} \right) \tag{2.14} $$

where $A(n)$ is the magnitude of the $i$th kernel extracted in critical band $k$ at time $n$, $G_k$ is the peak value of the Fourier transform of the normalized kernel in critical band $k$, and $QT_k$ is the elevated threshold in quiet for the same critical band. At each iteration, if the sensation level $SL(n, k)$ of the extracted gammatone kernel is higher than the masking threshold $M_k(n)$, the gammatone is considered audible; otherwise it is masked and removed from the representation (see Fig. 2.17). In this way, only audible gammatones are extracted [Pichevar et al., 2011].
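The audibility test built on (2.14) amounts to one comparison per extracted kernel. In the sketch below, the function names and all numerical values are hypothetical. The threshold in quiet `QT_k` is assumed to be given on a linear power scale (it sits inside the logarithm), while the masking threshold `M_k` is in dB, as stated in the text.

```python
import numpy as np

def sensation_level_db(A, G_k, QT_k):
    """SL = 10 log10(A^2 G_k^2 / QT_k), following equation (2.14)."""
    return 10.0 * np.log10((A ** 2) * (G_k ** 2) / QT_k)

def is_audible(A, G_k, QT_k, M_k):
    """A kernel is kept only if its sensation level exceeds the masking threshold."""
    return sensation_level_db(A, G_k, QT_k) > M_k

# Illustrative values (not taken from the cited works).
A, G_k, QT_k = 0.1, 10.0, 1e-2
sl = sensation_level_db(A, G_k, QT_k)       # 10 log10(100) dB
kept = is_audible(A, G_k, QT_k, M_k=15.0)   # threshold below the sensation level
masked = not is_audible(A, G_k, QT_k, M_k=25.0)
```

Since `A` enters squared, doubling the kernel amplitude raises the sensation level by about 6 dB, which is why kernels with larger amplitude are more likely to pass the test.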

Figure 2.17 The masking effect performed in the perceptual matching pursuit. If the sensation level $SL(n, k)$ of a gammatone kernel with amplitude $A(n)$ in the critical band $k$ is higher than its associated masking threshold $M_k(n)$, then the gammatone is called audible [Pichevar et al., 2011]. Note that, based on (2.14), a higher $A(n)$ gives a higher sensation level $SL(n, k)$.

In addition to applying auditory masking, the PMP has another bio-inspired property, namely sparse representation. The sparse representation models both auditory masking and the lateral inhibition between the neurons in the auditory system [Pichevar et al.,