
The benefit of native language and semantic context while listening to speech in noise and the neural correlates of speech intelligibility

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE

INTERDISCIPLINARY MASTER'S DEGREE IN NEUROSCIENCE

By Maria Pefkou

Thesis supervisor: Dr Alexis Hervais-Adelman

Jury: Pr Ulrich Frauenfelder, Pr Narly Golestani, Pr Christoph Michel, Dr Alexis Hervais-Adelman, Dr Serge Vuillemoz

Geneva, February 2012

UNIVERSITY OF GENEVA

FACULTY OF PSYCHOLOGY AND EDUCATIONAL SCIENCES, PSYCHOLOGY SECTION

ABSTRACT

Native listeners can use semantic context more efficiently than non-native listeners in order to recognize words masked with noise. Functional Magnetic Resonance Imaging (fMRI) data were acquired during a retroactive semantic priming task in order to investigate the neural correlates of speech intelligibility and the interaction between semantic relatedness and native language. Participants were better at recognizing words masked with noise at different signal-to-noise ratios (SNRs) when followed by semantically related compared to unrelated words in their native language. In their non-native language, participants performed better for semantically unrelated word pairs, providing evidence for differences in semantic processing between native and non-native languages. fMRI results revealed that activation in premotor regions negatively correlated with SNR, while activation in regions composing the resting-state network positively correlated with SNR. I conclude that motor regions are recruited during effortful speech comprehension.


Acknowledgments

I would like to thank my supervisor, Alexis Hervais-Adelman, for his teaching and guidance throughout the past one and a half years, for his very helpful comments on previous versions of this thesis and, above all, for his endless enthusiasm and inspiration, without which it would have been impossible to complete this work.

I gratefully thank Christoph Michel, for his useful advice and support, and Narly Golestani, whose previous work has inspired the experiment described in this thesis.

Special thanks to my labmates for offering their help and expertise on several occasions and for patiently listening to my complaints.

I infinitely thank my parents for their support, even though I am incapable of explaining to them what exactly my research consists of.

Special thanks to Michelly, for transmitting her endless passion for research to me, and to Marcel, who unintentionally determined my decision to do this master's.

Finally, my sincere thanks to all my friends, as well as to some complete strangers, who willingly or not listened to me talking about my research.


Table of contents

Introduction
1. Speech comprehension
2. Theories of speech perception
2. A. Abstractionist models
2. B. Episodic models
2. C. Hybrid models
2. D. Motor theory of speech perception
3. The benefit of native language and semantic context while listening to speech in noise and the neural correlates of speech intelligibility
3. A. Neural correlates of effortful speech comprehension
3. B. Semantic priming
3. C. Native language benefit in speech comprehension
4. The present study
Method
Participants
Materials and design
Procedure
Data pre-processing and analysis
Results
Behavioural results
Imaging results
Discussion
1. The effect of semantic context in speech perception
2. Neural correlates of speech intelligibility
3. The default mode network
4. Absence of the effect of language and semantic context in the brain
Conclusions
References
Appendix

Introduction

1. Speech comprehension

The first published attempts to understand how the human brain processes language date back to the 19th century, when neurologists of the period, such as Carl Wernicke and Pierre Paul Broca, noticed that damage to specific brain regions impairs the ability to understand or produce speech (Broca, 1861; Wernicke, 1874). At that time the only way to make inferences about brain function was to observe brain-damaged patients and associate a lesion with one or more symptoms. More recently, research has made remarkable progress and technology has provided the tools to "look into the brain" of healthy individuals. Many facts about how the human brain produces and understands language have therefore been established over the last few years. For example, language appears to be a left-lateralised function for the vast majority of right-handed people as well as for a large proportion of the left-handed population, as shown by Wada tests and functional Magnetic Resonance Imaging (fMRI) studies (Wada & Rasmussen, 1960; Knecht et al., 2000).

Since the very beginning of research on language, speech production and perception have been thought to be well separated in the brain. This distinction was initially based on the finding of Pierre Paul Broca (1861) concerning the existence of a specific brain region whose damage impairs speech production but leaves comprehension intact. Carl Wernicke, a German neurologist, demonstrated that a different brain region may be responsible for speech comprehension, as its damage impairs the latter while leaving speech production intact (Wernicke, 1874). Since then, most researchers have focused on one of the two functions, while acknowledging that they are not completely independent of each other. Although both functions are of major interest, the present study will focus on speech perception and, more specifically, speech comprehension.

Speech comprehension is a very robust process even under adverse listening conditions. Experimentally probing speech comprehension under ideal listening conditions (anechoic rooms, very little or no background noise, unfiltered signal) does not precisely simulate the challenging speech input we receive in everyday life. Speech transmitted over mobile phones or the radio, speech in a noisy background, foreign accents (e.g. Clarke & Garrett, 2004) and different speech rates (e.g. Altmann & Young, 1993) are some examples of challenging speech input, as the information available to the listener is reduced, modified or masked. However, even when the received acoustic signal is degraded or distorted, speech often remains comprehensible, which indicates that our speech processing system can take all these signal variations into account and adapt to them.

An extended brain network is involved in speech perception. Hickok and Poeppel (2007) propose that speech processing systems follow a hierarchical scheme, starting from the auditory cortex, with Heschl's gyrus, and extending to temporal regions bilaterally, such as the medial-posterior superior temporal gyrus (STG) for phonological processing, the posterior medial temporal gyrus (pMTG) and the posterior inferior temporal sulcus (pITS) where access to the lexicon may take place. Other brain regions involved in this network are the anterior middle temporal gyrus (aMTG) and the anterior inferior temporal sulcus (aITS) which are suggested to combine information coming from lexical processing as well as from the articulatory network.

Figure 1: The dual stream model of the functional anatomy of language (taken from Hickok and Poeppel, 2007).

As illustrated in the figure above, the acoustic signal is processed at different levels (i.e. phonological, lexical, semantic and syntactic) by different brain structures which interact with each other. The model takes into account the top-down influences which have a major impact on the way the acoustic input is perceived. For example, speech is perceived as a coherent stream of words although it is composed of periodic sounds, aperiodic noise and silence (Davis & Johnsrude, 2007), and speech perception is highly modulated by non-auditory sensory input, such as visual (e.g. McGurk & MacDonald, 1976) and tactile (Fowler, 1995) cues, as well as by prior lexical knowledge (Ganong, 1986). Such evidence strongly suggests that speech perception is an interactive process, integrating previous knowledge and already established representations of language in the processing of the acoustic speech signal.

2. Theories of speech perception

Different models have been proposed in an attempt to explain speech comprehension. These models can be broadly classified as abstractionist, episodic, hybrid and motor theories.

2. A. Abstractionist models

Abstractionist models of speech perception, such as TRACE (McClelland and Elman, 1986), Merge (Norris, McQueen and Cutler, 2000) and the Distributed Cohort model (Gaskell and Marslen-Wilson, 1997) support the idea that word recognition is independent of voice processing and takes place in a distributed fashion. More specifically, these models suggest that words are stored as abstract phonological representations, while indexical information is discarded. Indexical information is defined as the properties of speech that indicate group membership (e.g. dialect), characterise the individual (e.g. gender, age) and reveal the speaker’s emotional state (Levi & Pisoni, 2007).

Although important differences between them do exist, generally speaking abstractionist models suggest that words are processed at multiple levels (e.g. phonetic and lexical), interacting or even competing with each other, in order to retrieve the representation that best matches the acoustic input. Abstractionist models assume that there is an invariant representation of every word of our lexicon and that its activation, stimulated by an acoustic input, leads to the recognition of this specific word.

Evidence for this point of view is provided by the fact that priming in lexical decision tasks is equal for the same versus a different speaker (Orfanidou, Marslen-Wilson & Davis, 2006). Priming consists of presenting the same object or word on more than one occasion and investigating the effect of the first presentation on the subsequent ones. If repetition priming is speaker-independent, it can be argued that it is the activation of the representation of a word that facilitates its subsequent processing and not the stimulus-specific characteristics. Further, in defense of abstractionist models, neuropsychological evidence suggests that speaker recognition can be impaired without impairing comprehension (Van Lancker, Kreiman, & Cummings, 1989). Thus, surface acoustic properties are considered of minor importance and the essential step in word comprehension is the extraction and identification of speech units from the acoustic signal.

Although abstractionist models take into account the fact that we are in general capable of recognising a word independently of the voice that pronounces it, they nevertheless fail to adequately explain the advantage of using speaker-specific information in order to understand a challenging speech input. For example, according to abstractionist models, understanding speech in noise should only depend on the correct identification of speech units and should not be affected by speaker-specific characteristics. However, previous research has shown that speech in noise is better understood when the speaker's voice is familiar (Nygaard, Sommers & Pisoni, 1994; Nygaard & Pisoni, 1998), suggesting that speaker-specific information stored in memory facilitates comprehension, contrary to the predictions of abstractionist models.

2. B. Episodic models

Episodic models of speech perception (e.g. MINERVA 2, Hintzman, 1986) suggest that every instance of a spoken word leaves a trace in episodic memory, storing lexical but also indexical information. Thus, when one hears a word, all memory traces sharing properties with it are activated and determine whether the stimulus will be classified as an already known word or not.

The MINERVA 2 model accounts very well for accurate memory of voices. For example, speakers can be identified on the basis of idiosyncratic phoneme production even in sine-wave speech, which is a type of degraded speech produced by resynthesizing the formants of a speech signal with sinusoids (Remez, Fellowes & Rubin, 1997). Episodic models also explain long-term repetition priming effects (Tenpenny, 1995). However, episodic models do not account for the distinction between word recognition and indexical information, for which there is neuropsychological evidence. Further, episodic models fail to explain the verbal transformation effect, that is, the switching between percepts when a speech form is cycled in rapid and continuous repetition (Sterzer, Kleinschmidt & Rees, 2009). For example, rapid repetitions of the word "life" result in alternations in perception between the words "life" and "fly". According to episodic models, the very first percept is stored in memory and should reinforce a similar percept for the subsequent repetitions, which is apparently not the case.

Episodic models can also be criticised for requiring huge memory capacities. Although it may be impossible to demonstrate the limits of human memory, processing speech in a purely episodic manner without any input normalisation is neither parsimonious nor particularly efficient.

In terms of effortful speech comprehension, episodic models predict that when one deals with a challenging input, such as speech in noise, the information one extracts activates memory traces with shared properties, which could lead to the identification of the stimulus. Exposure to speech in noise should facilitate future comprehension, as traces related to this experience will be stored in memory and activated when one is exposed to a similar input. Because of the episodic traces proposed by this class of models, contextual information and speaker-specific properties should also have an important impact on effortful comprehension.

2. C. Hybrid models

Hybrid models (Klatt, 1979; Tulving and Schacter, 1990; Kirsner, Dunn and Standen, 1987) combine elements from abstractionist and episodic models in order to better account for all known psycholinguistic effects. These models assert that both lexical prototypes and phonetic variations are stored in memory and used in order to support comprehension. Klatt (1979) suggests that episodic traces of speech are stored alongside abstract prototypes. According to Kirsner, Dunn and Standen (1987), there is a lexicon of abstract and episodic procedural records. These records correspond to the processes required to match the acoustic input with an abstract lexical representation and are shaped by surface features, such as the speaker's voice. If the same or similar surface features emerge from a novel speech input, they may trigger the application of past procedural records stored in memory.

The combined use of abstract representations and indexical information, proposed by hybrid models, provides more elements in order to explain speech comprehension under adverse listening conditions. The facilitatory role of episodic traces is obvious in the use of speaker-specific information, such as voice familiarity, while listening to speech in noise (Nygaard et al., 1994; Nygaard & Pisoni, 1998). The existence of abstract representations or abstract procedural records could explain the generalisation of perceptual learning between different types of degraded speech, for example speech vocoded with different carrier signals (Hervais-Adelman, Davis, Johnsrude, Taylor & Carlyon, 2011).

2. D. Motor theory of speech perception

A controversial theory of speech perception is the motor theory (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; updated by Liberman & Mattingly, 1985; then by Liberman & Whalen, 2000), which claims that for both perception and production the primary representations of speech are the phonetic gestures of the articulatory apparatus. In other words, there is a shared code between speech production and perception.

Recent research in speech perception using modern neuroimaging techniques has provided some evidence for the involvement of motor regions in speech processing, which, I suggest, supports the motor theory of speech perception. Wilson, Saygin, Sereno and Iacoboni (2004) demonstrated that overlapping motor and premotor areas are activated when producing and listening to speech sounds. In this study participants were asked to listen to blocks of the same meaningless monosyllable and, subsequently, to produce this same stimulus in an fMRI experiment. Common activations were observed between conditions in the superior part of the ventral premotor cortex bilaterally, extending towards the primary motor cortex. Although this study did not use words, it provides strong evidence for the important role of the premotor cortex in sublexical processing of speech.

Tremblay and Small (2011) used fMRI in order to investigate whether there are brain regions conjointly active during speech perception and production and whether the activity of these regions is modulated by articulatory complexity. They found that the left ventral premotor cortex was significantly activated in both tasks (i.e. speech production and perception), but activation in this region was modulated by the articulatory complexity of syllables only during speech production.

A series of studies using Transcranial Magnetic Stimulation (TMS) have also shed light on the role of motor cortex in speech perception. Watkins and Paus (2004) demonstrated that motor evoked potentials (MEPs) recorded from the lip muscles in response to TMS pulses over the face area of primary motor cortex were enhanced when participants heard speech sounds or viewed speech-related lip movements, but not when they listened to non-verbal sounds or viewed brow movements. Given that the excitability of the face primary motor cortex changes while attending to speech-related stimuli, it can be argued that this part of the primary motor cortex not only underlies speech production but is also implicated in speech perception.

Another study (Meister, Wilson, Deblieck, Wu and Iacoboni, 2007) used fMRI to localise a part of the premotor cortex equally activated during speech perception and production. After its localisation, the activity of this region was inhibited using repetitive TMS (rTMS) and participants were asked to perform a simple speech perception task. Results revealed that their performance significantly deteriorated compared to the control condition, in which rTMS was not applied.

Further, Möttönen and Watkins (2009) demonstrated that inhibiting articulatory motor cortex (more specifically the part of the primary motor cortex controlling lip movements) impairs categorical perception of phonemes whereas inhibition of hand motor cortex does not.

The studies mentioned above provide strong evidence that motor regions are, indeed, activated while listening to speech sounds and play an active role in understanding speech, at least at a sub-lexical level. The involvement of motor regions could be particularly useful during effortful speech comprehension, for example while listening to speech in noise, in which case the acoustic signal may not be sufficient to support comprehension unless it is coupled with additional information.

The above overview of theories of speech perception described phenomena which suggest that speech comprehension is not a simple bottom-up process. Several factors have an impact on speech comprehension, such as previous exposure to a certain speech stimulus (i.e. repetition priming), familiarity with the speaker's voice and the presence of visual (McGurk & MacDonald, 1976) or tactile (Fowler, 1995) cues. These top-down influences can have a facilitatory impact on speech perception in general, but in particular under challenging listening conditions (e.g. background noise) or when the received acoustic input has been degraded (e.g. sine-wave speech) or distorted (e.g. time-compressed speech), as in these cases the information the receiver is able to extract from the acoustic signal may not be sufficient for comprehension. In this report I will focus on the role of two top-down influences, namely native language and semantic context, on effortful speech comprehension.

3. The benefit of native language and semantic context while listening to speech in noise and the neural correlates of speech intelligibility

The first aim of the present study is to investigate the role of native language and semantic context on comprehension of speech in noise. Additionally, using words mixed with noise in different signal to noise ratios, the present study aims to shed light on the neural correlates of listening to masked speech.

3. A. Neural correlates of effortful speech comprehension

There are different types of challenges in the speech input we receive in our everyday life. For example, the acoustic signal can be temporally compressed, spectral information can be removed (e.g. filtered speech, vocoded speech), or the signal can be masked (e.g. speech in noise). In order to study how the human brain successfully deals with challenging speech input, one can present artificially distorted or degraded speech.

The manipulation used in the present study is masking speech with noise. Speech in noise is created by embedding a speech sound (e.g. a phoneme, word or sentence) in noise. By varying the signal-to-noise ratio (SNR; the ratio of the levels of signal and noise), the intelligibility of the stimulus changes. Various types of noise, such as speech-shaped noise (Golestani, Rosen & Scott, 2009) or multi-speaker babble tracks (Holle, Obleser, Rueschemeyer & Gunter, 2010), have been used in order to simulate the noisy background we have to face in everyday life. Being potentially intelligible but challenging, speech in noise has been used in the literature in order to determine the role of various factors, such as listener's age (Pichora-Fuller, Schneider & Daneman, 1995), native language (Golestani et al., 2009) and hearing impairments (Bronkhorst & Plomp, 1989), in speech comprehension.
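To make the SNR manipulation concrete, the sketch below shows one way a word recording could be mixed with masking noise at a target SNR. It is a minimal illustration, not the stimulus-generation code used in this study; the file names and the RMS-based definition of SNR are assumptions.

```python
import numpy as np
import soundfile as sf  # any WAV reader/writer would do

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(x ** 2))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise RMS ratio equals snr_db,
    then add it to the speech."""
    noise = noise[:len(speech)]                           # trim noise to word length
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    return speech + noise * (target_noise_rms / rms(noise))

# Hypothetical usage: 'parrot.wav' and 'speech_shaped_noise.wav' are placeholder files.
word, fs = sf.read("parrot.wav")
noise, _ = sf.read("speech_shaped_noise.wav")
masked = mix_at_snr(word, noise, snr_db=-7.0)             # one of the SNR levels used below
sf.write("parrot_snr-7dB.wav", masked, fs)
```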

The question of which brain regions are involved in understanding speech in noise was directly addressed by Wong, Uppunda, Parrish and Dhar (2008). In this study, participants were asked to listen to words in a quiet background or embedded in noise at two different signal-to-noise ratios (SNRs), -5 and +20 decibels (dB), and match them with pictures. Results revealed additional activation in the superior temporal gyrus (STG) as well as in parietal and frontal regions for speech embedded in noise compared to a quiet background. The comparison between the two noisy conditions (-5dB versus +20dB) revealed increased activation in posterior STG and left anterior insula. Thus, the intelligibility of the acoustic signal modulates activity in auditory and motor regions as well as in an attentional network, suggesting, according to the authors, that understanding speech in noise requires not only a finer spectro-temporal analysis of the acoustic signal but also higher-level cognitive and attentional processes.

The involvement of motor regions in effortful speech comprehension has also been demonstrated by studies using other types of signal distortion or degradation, such as time-compression and noise-vocoding. Time-compression of speech increases the number of speech units presented over a given period of time, similarly to artificially increasing speech rate. This distortion is based on the pitch-synchronous overlap-add technique (PSOLA), which averages adjacent pitch periods across time for voiced speech segments and chooses an arbitrary time window to average unvoiced segments (Mehler et al., 1993). In a recent fMRI experiment (Adank and Devlin, 2010), individuals were asked to listen carefully to time-compressed sentences and say whether their content was true or false. Increased activation in left premotor cortex and posterior superior temporal sulcus was observed while listening to time-compressed compared to normal speech during the first block of 16 sentences. It was followed by a decrease in activation of these regions, without returning to the same level as in the normal speech condition, probably reflecting adaptation to the distorted input.
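For illustration only, time compression of a recording can be approximated with a phase-vocoder time stretch, as sketched below; this is not the PSOLA procedure used in the cited studies, and the file name is a placeholder.

```python
import librosa
import soundfile as sf

# Load a sentence recording (placeholder file name) and compress it to half its
# duration. librosa's time_stretch uses a phase vocoder rather than PSOLA, so
# this only approximates the manipulation described above, but it likewise
# increases the number of speech units per unit time without shifting pitch.
y, fs = librosa.load("sentence.wav", sr=None)
compressed = librosa.effects.time_stretch(y, rate=2.0)   # rate > 1 shortens the signal
sf.write("sentence_compressed.wav", compressed, fs)
```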

A well-studied type of degraded speech is noise-vocoded speech. Noise-vocoding (Shannon, Zeng, Kamath, Wygonski and Ekelid, 1995) involves decomposing an acoustic signal into a defined number of bandpass-filtered channels, from which the time-varying amplitude envelope is extracted. The extracted amplitude envelopes are smoothed by a lowpass filter and then each one of them is used to modulate a separate wideband noise. Each modulated noise is band-pass filtered into the frequency range of the source channel, and the amplitude-modulated noise bands are then recombined. The number of frequency bands used to process the signal is crucial to the intelligibility of the noise-vocoded output: the greater the number of bands, the more intelligible the output. Noise-vocoding removes much of the spectral information from the signal but preserves the slowly varying temporal cues. Noise-vocoding has been used to simulate sound transduced by cochlear implant processors (Shannon et al., 1995; Loizou, Dorman, & Tu, 1999; Faulkner, Rosen, & Smith, 2000).
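The steps just described map fairly directly onto code. The sketch below is a simplified noise vocoder, assuming logarithmically spaced band edges and a 30 Hz envelope cutoff (Shannon et al. used channel edges based on cochlear spacing), so it should be read as an illustration of the technique rather than a reimplementation of the original stimuli.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(sig, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, sig)

def lowpass(sig, cutoff, fs, order=4):
    sos = butter(order, cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, sig)

def noise_vocode(speech, fs, n_bands=6, f_lo=100.0, f_hi=5000.0, env_cutoff=30.0):
    """Noise-vocode `speech` with `n_bands` channels: band-pass filter the signal,
    extract and smooth the amplitude envelope of each band, use each envelope to
    modulate band-limited noise, and sum the channels."""
    speech = np.asarray(speech, dtype=float)
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)  # assumed spacing
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        envelope = lowpass(np.abs(band), env_cutoff, fs)   # rectify + smooth
        carrier = bandpass(noise, lo, hi, fs)              # band-limited noise
        out += envelope * carrier
    # Roughly match the overall level of the original signal.
    return out * np.sqrt(np.mean(speech ** 2) / np.mean(out ** 2))
```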

In a recent fMRI experiment (Hervais-Adelman, Carlyon, Johnsrude & Davis, submitted) participants were asked to listen to completely incomprehensible and potentially comprehensible noise-vocoded words (vocoded using one and six frequency bands, respectively), as well as clear words. The contrast between degraded but potentially comprehensible versus clear words revealed significant activation in the left premotor cortex as well as in the left anterior insula. These regions may be recruited in order to provide the system with supplementary or alternative representations of speech at the sub-lexical or lexical level which could potentially contribute to the understanding of a highly degraded input. It is remarkable that no difference was observed in these regions when contrasting unintelligible versus clear speech, suggesting that the previously mentioned activations can be attributed to the extra cost of the challenging input and not to the increased spectral information.

Very few studies have used high temporal resolution methods, such as Electroencephalography (EEG) and Magnetoencephalography (MEG), to look at cortical electrical activity during effortful speech comprehension. A recent study by Millman, Woods and Quinlan (2010) used MEG to investigate how noise-vocoded words are represented in terms of power changes and functional asymmetries between the two hemispheres. The results showed localised low-frequency (delta and theta) changes in the left hemisphere and high-frequency changes in the right hemisphere for noise-vocoded compared to clear words, suggesting differential roles for each hemisphere in speech processing. In another study (Obleser & Kotz, 2010) participants were asked to attend to noise-vocoded sentences with either a high (e.g. "she sifts the flour") or low (e.g. "she weighs the flour") cloze probability verb while EEG was recorded. The amplitude of the N400 component of the signal was shown to depend on the intelligibility of the acoustic signal, as it was higher for 4 versus 16 frequency bands. The N400 is a negative component of event-related potentials (ERPs) peaking at around 400 ms post stimulus onset and is thought to reflect cognitive-linguistic integration processes.

To sum up, there is already some evidence from fMRI studies for the involvement of mainly auditory and motor regions in effortful speech comprehension, as well as evidence of amplitude changes and asymmetric processing from EEG and MEG studies. However, more research is necessary in order to better investigate the neural correlates of speech comprehension when the received acoustic input is not clear.

3. B. Semantic priming

Semantic priming was first reported by Meyer and Schvaneveldt (1971) in a lexical decision task, in which individuals were presented with pairs of written words or pseudowords and were asked to indicate whether both stimuli were words or not. The authors found that when the words were semantically related participants responded faster and more accurately compared to unrelated word pairs.

Semantic priming is consistent with Collins and Loftus' (1975) spreading-activation theory of semantic processing. Their theory proposes that recognition of a word not only involves activation of the concept corresponding to this word but also activation of related concepts. Thus, if one of these concepts is subsequently presented, it will be processed faster as it will have already been activated. Semantic priming has been extensively studied with visually presented words and different tasks, such as lexical decision (e.g. Shelton & Martin, 1992) and reading tasks (e.g. Masson, 1995). Non-linguistic tasks, in which participants were for example asked to make pleasantness versus gender judgements on words (e.g. Draine & Greenwald, 1998), have also revealed a semantic priming effect, as these judgements were more accurate when stimuli were preceded by semantically related words.
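As a toy illustration of the spreading-activation idea, the sketch below uses a made-up miniature association network; the words, link structure and spreading parameter are invented for illustration, not taken from any norms.

```python
# Presenting a word fully activates its node and spreads a fraction of that
# activation to associated nodes, so a related target starts from a higher
# baseline of activation when it arrives.
links = {
    "doctor": ["nurse", "hospital"],
    "nurse": ["doctor", "hospital"],
    "hospital": ["doctor", "nurse"],
    "bread": ["butter"],
    "butter": ["bread"],
}

def present(word, activation, spread=0.5):
    """Activate `word` and spread a fraction of its activation to its associates."""
    activation[word] = 1.0
    for neighbour in links.get(word, []):
        activation[neighbour] = activation.get(neighbour, 0.0) + spread
    return activation

activation = present("doctor", {})
# A related target ("nurse") is pre-activated, an unrelated one ("bread") is not,
# mirroring faster and more accurate lexical decisions for related pairs.
print(activation.get("nurse", 0.0), activation.get("bread", 0.0))   # 0.5 0.0
```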

In the auditory domain, semantic priming has recently been investigated using a lexical decision task (Daltrozzo, Signoret, Tillmann & Perrin, 2011), in which individuals heard two semantically related or unrelated words separated by a pause of 50 ms. The first word, which participants were asked to ignore, was the prime, while the second word, on which they had to make a lexical decision, was the target. Results revealed better and faster performance for related compared to unrelated pairs even when the prime was presented at a very low intensity level, while in a post-test participants failed to correctly categorize the prime at this intensity, providing evidence that they were unaware of the word and unable to consciously process it. It can therefore be concluded that preceding semantic information constitutes a strong top-down influence which facilitates subsequent processes in speech perception.

3. C. Native language benefit in speech comprehension

Previous research (Nabelek and Donahue, 1984; Takata and Nabelek, 1990; van Wijngaarden, Steeneken and Houtgast, 2002; Golestani, Rosen and Scott, 2009) has demonstrated that effortful speech comprehension is more difficult for non-native compared to native listeners of a given language. More specifically, bilingual individuals are able to make better use of the linguistic context in their native compared to their non-native language under adverse listening conditions. Note that in this context the term "bilingual" refers to people who started speaking a second language after the age of three.

The use of contextual information in bilinguals has been extensively studied using the Speech Perception in Noise (SPiN) sentences (Kalikow, Stevens & Elliott, 1977). These are affirmative sentences in which the last word can be of either high or low predictability (high or low cloze probability sentences). SPiN sentences are embedded in different levels of noise and people are asked to listen to them and identify the last word. The rationale of this paradigm is that, because people can usually predict the last word in the high cloze probability condition, it is usually identified faster and more accurately compared to low cloze probability sentences, even when the signal-to-noise ratio is low.

Mayo, Florentine and Buus (1997) demonstrated that the age of second language acquisition plays a key role in understanding speech in noise in this language. The authors compared performance of four groups of listeners, monolingual English speakers, bilinguals since infancy, bilinguals who had learnt English before the age of 6 and “late” bilinguals (post-puberty) in identifying the last word of SPiN sentences of either high or low cloze probability. Results revealed a main effect of context as listeners performed better for high compared to low cloze probability sentences but this difference in performance negatively correlated with the age of second language acquisition. Hence, contextual information plays a facilitatory role in understanding speech in noise but only for the native language or, to a lesser extent, for non-native languages acquired very early in life.

Another related study (Bradlow & Alexander, 2007) tested native and non-native English speakers with high and low cloze probability sentences under two different conditions, clear and plain speech. Clear speech was characterized by very clear intonation, as if the speaker were addressing a person with hearing loss, while in the plain speech condition the speaker used a more conversational style. Results revealed that native listeners benefit from acoustic (i.e. clear versus plain speech) and semantic (i.e. high versus low last-word predictability) information, while non-native listeners benefit from semantic information only in the clear speech condition. The authors propose that non-native listeners can also make use of context in order to facilitate speech comprehension, but only under optimal listening conditions.

Although it is clear that listening to speech in one’s native language facilitates the use of contextual information, supporting speech comprehension, the question of which part of context is used remains open. Given that sentences contain semantic, syntactic and prosodic information, it is difficult to investigate the role of each one of these sources separately in a study using sentences.

The role of semantics in language comprehension has been investigated in a semantic priming study by Bernstein, Bissonnette, Vyas and Barclay (1989). Using a visual retroactive semantic priming paradigm, the authors demonstrated that when a word (target), presented for a very limited amount of time, is followed by a visual mask and a semantically related word (prime), it is identified or recognized more accurately than when it is followed only by a mask. Interestingly, when the target is followed by a semantically unrelated prime, its identification deteriorates compared to the condition in which it is only followed by a mask. In other words, priming does not work only forwards, as in the experiments presented above, but also backwards, as a word can have an impact on the perception of a previously presented one.

In order to investigate the native language benefit in speech, Golestani, Rosen and Scott (2009) used a modified, "auditory" version of the backwards semantic priming paradigm. In this version, participants heard a word masked with noise ("target"), followed by a semantically related or unrelated word ("prime") which was always clear (not embedded in noise). Their task was to select the target from two semantically related words, the target itself and a foil, in a two-alternative forced choice (2AFC) task taking place at the end of each trial (see figure 2 below). The experiment included native French speakers, with school knowledge of English, and took place in both English and French (i.e. participants' native and non-native language). The targets were mixed with speech-shaped noise at a range of signal-to-noise ratios (SNRs) of -7dB, -6dB and -5dB; decreasing SNR values were expected to produce stimuli that are harder to understand. An infinite SNR condition, in which targets were not mixed with noise, was also included.

Figure 2: Schematic representation of a trial in Golestani, Rosen and Scott (2009). The target word was embedded in noise, whose intensity varied between mini-blocks of 6 trials. It was followed by a semantically related or unrelated unmasked word (prime). At the end of each trial, a 2AFC screen appeared and participants had to recognise the target previously presented.

Results revealed that participants were more accurate in recognizing the target in their mother tongue than in their non-native language. What is more, recognition was better for high SNR levels in both languages. Critically for the authors' hypothesis, according to which participants would make better use of semantic context in their native language, semantic relatedness played a facilitatory role in the native language condition, with higher performance for targets followed by a semantically related word. The opposite effect was observed in the non-native language trials, in which participants demonstrated better performance for targets followed by semantically unrelated compared to related words (see figure 3 below). This language by semantic-relatedness interaction suggests a benefit of the native language in the use of semantic context so as to understand speech in noise. The converse effect, observed in the English trials, although difficult to interpret, could be related to the organisation of the mental lexicon in bilinguals. For example, one could imagine that all the words composing a person's vocabulary are grouped into clusters according to certain criteria, which could be different in the native and non-native languages.

Figure 3: Performance in native and non-native language as a function of semantic context in a retroactive semantic priming study (taken from Golestani, Rosen and Scott, 2009). Performance was significantly better for semantically related pairs in French (native language) and for semantically unrelated pairs in English (non-native language).

4. The present study

In order to investigate the neural correlates of speech intelligibility and to identify brain structures whose activation is modulated by contextual semantic information, the above paradigm was used in an fMRI experiment. At the behavioural level, I predicted better performance for more intelligible stimuli (i.e. higher signal-to-noise ratio), a main effect of language with better performance in the native compared to the non-native language, as well as an effect of contextual (i.e. semantic) information in the native language. This effect would be reflected by better performance for targets followed by a semantically related prime compared to those followed by an unrelated prime. Regarding the imaging results, in line with previous studies (Narain et al., 2003; Obleser and Kotz, 2010; Obleser, Wise, Dresner, & Scott, 2007; Scott et al., 2000; Scott et al., 2006), I hypothesised differential activation as a function of intelligibility in temporal regions, such as the superior temporal gyrus and sulcus bilaterally, which are traditionally associated with speech perception. Furthermore, I predicted differential activation in the left inferior frontal gyrus as a function of word intelligibility (Davis and Johnsrude, 2003) as well as in motor regions, in line with recent evidence on the role of premotor cortex in speech perception (e.g. Tremblay & Small, 2011). Concerning the effect of semantic relatedness in the native language, I expected activation in a high-level speech processing network, involving areas such as the angular gyrus (Obleser et al., 2007; Obleser and Kotz, 2010), if the responses were driven by the context, for example in semantically related trials. As for the semantically unrelated trials, more activation was expected in regions typically associated with sound processing, such as the auditory cortex and the STS.

Method

Participants

Fifteen right-handed native French speakers (2 male) participated in the study. Participants self-reported having no history of hearing or visual impairment. They had all grown up in European French-speaking countries and had not started learning English until the age of 10. They had followed English classes during their education for 6 years on average (range: 3 to 10 years) and none of them used English in their everyday life. Eleven of the participants had also learnt a third language (German, Spanish or Italian) but were neither fluent in it nor used it in their everyday life. Participants were recruited by means of posters placed in various buildings of the University of Geneva. They all lived in the greater Geneva area or in nearby France, and the vast majority were students. All participants gave written informed consent to participate in the study, which was approved by the NEUCLID ethics committee of the University Hospital of Geneva (protocol number: NAC 11-014).

Materials and design

The words used in this experiment were exactly the same as those used by Golestani et al. (2009). French semantically related and unrelated word pairs were selected from the "French word association norms for 366 names of objects" database (Ferrand and Alario, 1998) and English words were selected from the University of South Florida Free Association Norms (Nelson, McEvoy and Schreiber, 1998).

For each language two lists of 520 words were generated in such a way that in the first list a specific primed word was followed by a semantically related prime while in the second list it was followed by an unrelated one and vice versa (see Appendix for a full list of stimuli).

Stimuli across lists (within language) were matched for spoken wordform frequency and number of syllables, as higher word frequency may result in easier recognition (Bradlow and Pisoni, 1999) and longer words are easier to identify than shorter ones (Golestani et al., 2009). Spoken wordform frequency information was taken from the "Lexique 3" database for French words (New, Brysbaert, Veronis and Pallier, 2007; www.lexique.org), measured as the number of instances per million words in films. For the English words, these properties were taken from the English Lexicon Project (Balota et al., 2009). The number of syllables of the primed words was also matched across languages.
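A simple way to verify this kind of matching is to compare the two lists statistically; the sketch below assumes a hypothetical stimulus table with invented column names and uses Welch's t-tests, which is not necessarily how the matching was checked for this study.

```python
import pandas as pd
from scipy import stats

# Hypothetical stimulus table, one row per primed word; file and column names are invented.
stimuli = pd.read_csv("french_stimuli.csv")   # columns: word, list, freq_per_million, n_syllables

list1 = stimuli[stimuli["list"] == 1]
list2 = stimuli[stimuli["list"] == 2]

# Check that the two lists do not differ reliably in spoken wordform frequency
# or in number of syllables.
for col in ["freq_per_million", "n_syllables"]:
    t, p = stats.ttest_ind(list1[col], list2[col], equal_var=False)
    print(f"{col}: t = {t:.2f}, p = {p:.3f}")
```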

Half of the participants received the first list and the other half the second one, so as to ensure that any observed effects were not due to stimulus-specific properties but to the manipulation of the variables of interest. What is more, for each list two sublists were created, differing in the level of noise in which each primed stimulus was embedded. Again, participants tested with the same list were divided into two groups, tested with separate sublists. The aim of this manipulation was to control for differential phoneme recognition effects, as some phonemes (for example, fricative consonants such as /s/ and /z/) are more easily detected when embedded in noise than others (Miller and Nicely, 1955; Boothroyd, Mulhearn, Gong and Ostroff, 1996).

At the end of each trial, participants performed a two-alternative forced choice task in which they were asked to select the target word from a pair of visually presented words. The foil words were all semantically related to the target words and they were matched with the primes in terms of number of syllables, so as to ensure that participants' responses were not driven by the length of the word.

The English and French words were digitally recorded by a multilingual female speaker in an anechoic chamber using a sampling rate of 44.1 kHz with 16-bit quantization. The microphone was positioned 30 cm from the speaker's mouth, at 15 degrees to the mid-sagittal line. The final set of stimuli was created off-line by editing the words at zero-crossings before and after each word. Recordings were normalized with respect to root-mean-square amplitude and embedded in sound files with a duration of 1.1 seconds (Golestani et al., 2009).

The signal-to-noise ratio (SNR) levels used were ∞dB (i.e. no noise), -5dB, -7dB and -9dB. The reason for this choice comes from piloting data revealing a very high recognition rate, reaching 80% even for the -7dB SNR condition and almost 100% for the unmasked condition. Piloting data were collected from six participants, who were scanned following the procedure described below, with the only difference that the SNR levels used were ∞dB, -5dB, -6dB and -7dB. Thus, we decided to increase the difficulty of the task by increasing the sampling range, as the effect of SNR observed in the piloting phase was mainly driven by the unmasked condition. Auditory stimuli were embedded in speech-shaped noise, which approximated the average speech spectrum of a male voice.
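The thesis does not describe how the speech-shaped noise was constructed beyond its matching the average speech spectrum of a male voice; a common way to build such noise, sketched below under that assumption, is to impose the long-term magnitude spectrum of a speech recording on random-phase noise.

```python
import numpy as np

def speech_shaped_noise(speech, n_samples, seed=0):
    """Generate noise whose long-term spectrum matches that of `speech` by
    combining the average magnitude spectrum of the speech with random phases."""
    spectrum = np.abs(np.fft.rfft(speech, n=n_samples))       # long-term magnitude spectrum
    rng = np.random.default_rng(seed)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape))
    noise = np.fft.irfft(spectrum * phases, n=n_samples)
    # Normalise to unit RMS so the noise can later be scaled to a target SNR.
    return noise / np.sqrt(np.mean(noise ** 2))
```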

The factors included were SNR (4 levels: -9, -7, -5, no noise), semantic relatedness (2 levels: related and unrelated), language (2 levels: French and English) and silent trials, resulting in a total of 17 conditions. There were 30 trials per non-silent condition (16 non-silent conditions), resulting in a total of 240 semantically related and 240 unrelated word pairs (120 per language condition). The SNR level was grouped into mini-blocks of 6 trials and the semantic context was mixed within mini-blocks. Language was grouped into blocks of 120 trials.
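The factorial structure can be made explicit by enumerating the non-silent conditions and grouping trials into SNR-homogeneous mini-blocks, as in the sketch below; the randomisation shown is only a rough stand-in for the constraints actually used.

```python
import itertools
import random

# 2 languages x 4 SNR levels x 2 relatedness levels = 16 non-silent conditions,
# 30 trials each (the 17th, silent condition is added during scanning).
languages = ["French", "English"]
snr_levels = [None, -5, -7, -9]          # None stands for the no-noise condition
relatedness = ["related", "unrelated"]

trials = [
    {"language": lang, "snr": snr, "relatedness": rel}
    for lang, snr, rel in itertools.product(languages, snr_levels, relatedness)
    for _ in range(30)
]
assert len(trials) == 480                 # 16 conditions x 30 trials

def mini_blocks(language_trials, block_size=6, seed=0):
    """Group one language's trials into mini-blocks that share an SNR level,
    with semantic relatedness mixed within each block."""
    rng = random.Random(seed)
    rng.shuffle(language_trials)
    by_snr = {}
    for t in language_trials:
        by_snr.setdefault(t["snr"], []).append(t)
    blocks = [ts[i:i + block_size]
              for ts in by_snr.values()
              for i in range(0, len(ts), block_size)]
    rng.shuffle(blocks)
    return blocks

french_blocks = mini_blocks([t for t in trials if t["language"] == "French"])
print(len(french_blocks))                 # 40 mini-blocks of 6 trials, i.e. two runs of 20
```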

Procedure

Participants first underwent a very brief training session in order to ensure that they had understood the task. The training session consisted of 8 trials, with 2 trials per SNR condition. Thus, all SNR levels were included in training so as to familiarise participants with the difficulty and timing parameters of the experiment. Once the training was completed, participants were installed in the MRI scanner and data acquisition began.

Participants were scanned using a Siemens Trio 3-Tesla MRI scanner with a 12-channel head coil. The auditory stimuli were presented binaurally through MRI-compatible headphones providing an attenuation of the scanner noise of about 30dB. Participants were also provided with a button box containing four buttons and were instructed to use two of them for their responses in the 2AFC task.

At the beginning of the scanning session, a high-resolution T1-weighted anatomical scan (3D MPRAGE, resolution 1 mm x 1 mm x 1 mm voxels) was acquired. Functional images were acquired using a sparse sequence (Hall et al., 1999), in which the acquisition of functional volumes is not continuous but leaves a silent period between scans, long enough for the presentation of the auditory stimuli (see figure 4(b) below). Four series of 160 gradient-echo images of blood-oxygenation-level-dependent (BOLD) signal were acquired (TR = 5.1s, TA = 2.1s, 2.5 x 2.5 x 3.25mm voxels). Auditory words were presented during the 3s silent period between scans. Temporal lobe coverage was ensured for each participant at the beginning of the scan session.
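A quick arithmetic check of the sparse-sampling parameters reported above shows that both words fit comfortably in the silent gap:

```python
# Sparse-sampling arithmetic from the acquisition parameters reported above.
TR = 5.1                  # s, time between volume onsets
TA = 2.1                  # s, acquisition time per volume
silent_gap = TR - TA      # 3.0 s without gradient noise in each repetition

fixation, word = 0.3, 1.1
audio_window = fixation + 2 * word        # fixation cross + target + prime = 2.5 s
print(f"silent gap: {silent_gap:.1f} s, audio needs: {audio_window:.1f} s")
print("words fit in the silent gap:", audio_window <= silent_gap)
```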

The experiment was divided into four blocks, which alternated between French and English. The total duration of each block was 13 minutes and 53 seconds, corresponding to 160 scans. Each block started with three dummy volume acquisitions to limit initial longitudinal magnetisation effects. Each block consisted of 20 mini-blocks, each of which included 6 trials, with the same SNR for the target, and 2 silent scans at the end. The order of mini-blocks was completely randomized between participants.

The presentation software E-Prime (Psychology Software Tools, Inc.) was used for visual and auditory stimulus presentation and for response collection. Each trial began with a light grey "X" on a black background for 300ms. The fixation cross remained on the screen while the two auditory stimuli, the target and prime words, were played. For half of the trials of each block the word pairs were semantically related, while for the other half they were not. The target was presented embedded in noise at one of four SNRs (-9, -7, -5 and no noise) while the prime was always clearly audible. The mean duration of the sound files containing the auditory stimuli was 1100ms. Immediately after the end of the second word, two words were presented on screen, one being the target and the other a semantically related foil. The words were presented in white characters on a black background, equally spaced from the centre of the screen, for a total duration of 1500ms. The position of the target (i.e. left or right with respect to the screen centre) was pseudo-randomised such that there was an equal number of left- and right-positioned targets per condition for every block. Participants were asked to press the button of the response box corresponding to "1", using their right index finger, when the target appeared on the left, and "2", with their right middle finger, when it appeared on the right part of the screen. The time available to make a response was equal to the time the two words remained on the screen (1500ms).
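Putting the timing information above together, a single trial can be laid out as in the sketch below. The onsets follow the durations given in the text (300 ms fixation, two 1.1 s sound files, 1500 ms response screen) and the note in Figure 4 that acquisition starts after the second word; treat the exact values as an illustrative reconstruction rather than the actual E-Prime script.

```python
TR = 5.1                                  # one trial per repetition time (s)
FIXATION, WORD, CHOICE = 0.3, 1.1, 1.5    # durations in seconds

def trial_onsets(trial_index):
    """Onsets (in seconds) of the events of one trial within the run."""
    t0 = trial_index * TR
    target_onset = t0 + FIXATION           # masked target word
    prime_onset = target_onset + WORD      # clear prime word
    choice_onset = prime_onset + WORD      # 2AFC screen, 1.5 s response window
    scan_onset = choice_onset              # volume acquired after the second word
    return target_onset, prime_onset, choice_onset, scan_onset

print(trial_onsets(0))                     # (0.3, 1.4, 2.5, 2.5)
```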

Figure 4: Representation of scanning protocol and single trial. (a) Schematic diagram of scanning protocol. Each mini-block consisted of 6 trials. Within a mini-block all targets were presented at the same SNR level while semantic relatedness varied pseudo-randomly. Two null events took place at the end of each mini-block, illustrated with grey bars. (b) Schematic diagram of a single trial: on the top row, the temporal sequence of stimuli presented to participants; on the bottom, acquisition of a volume as a function of time. Note that acquisitions took place after the end of the second auditorily presented word.

Data pre-processing and analysis

Data were pre-processed and analysed using SPM8 (Wellcome Trust Centre for Neuroimaging, University College London, UK). At the single-subject level, all echo planar imaging (EPI) volumes were realigned to one another in a two-step procedure: first, all images were realigned to the first one and a mean image was calculated; second, all images were realigned to the mean image. The structural image was coregistered to the mean realigned EPI volume, and then normalised to a standard structural template (SPM's grey-matter, white-matter and cerebrospinal fluid templates). The EPI volumes were normalised by applying the parameters estimated in the structural scan normalisation step and spatially smoothed using an isotropic Gaussian kernel of 8 mm full width at half maximum (FWHM).

A first-level statistical analysis was carried out for each participant using a general linear model (GLM) in which each scan was coded for condition. Regressors included the 17 conditions as well as the six movement parameters derived from the realignment step of the pre-processing (x, y and z translations and x, y and z rotations). A high-pass filter (cutoff 128 seconds) and a correction for first-order serial autocorrelation were also applied.
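Because each scan of a sparse design is labelled with a single condition, the design matrix has a simple structure: one indicator column per condition plus the motion regressors and a constant. The sketch below builds such a matrix with NumPy for illustration; the high-pass filter and autocorrelation correction, handled by SPM in the actual analysis, are omitted, and the inputs are randomly generated placeholders.

```python
import numpy as np

def design_matrix(condition_per_scan, n_conditions, motion_params):
    """First-level design matrix for a sparse design in which each scan is
    labelled with one condition (0..n_conditions-1): one indicator column per
    condition, six motion regressors and a constant term."""
    n_scans = len(condition_per_scan)
    X = np.zeros((n_scans, n_conditions))
    X[np.arange(n_scans), condition_per_scan] = 1.0       # condition indicators
    return np.column_stack([X, motion_params, np.ones(n_scans)])

# Placeholder inputs: 160 scans, 17 conditions, fake motion parameters.
rng = np.random.default_rng(1)
labels = rng.integers(0, 17, size=160)
motion = rng.normal(scale=0.1, size=(160, 6))
X = design_matrix(labels, 17, motion)
print(X.shape)    # (160, 24)
```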

Next, an analysis of variance (ANOVA) on the imaging data was performed in order to test for main effects of each of the three factors as well as for any interactions between them. The analysis of group data was achieved by entering the parameter estimates of the 17 conditions into a single random-effects model. The ANOVA on the imaging data was carried out using the contrast vectors described by Henson and Penny (2005), with three factors: language (2 levels), SNR (4 levels) and semantic relatedness (2 levels).
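In the Henson and Penny (2005) approach, factorial contrasts are built from Kronecker products of averaging vectors (rows of ones) and differential vectors. The sketch below does this for the 2 (language) x 4 (SNR) x 2 (relatedness) design, assuming the 16 non-silent condition columns are ordered with relatedness varying fastest, then SNR, then language; that ordering is an assumption about the model specification.

```python
import numpy as np

def avg(k):
    """Averaging vector over a k-level factor (1 x k row of ones)."""
    return np.ones((1, k))

def diff(k):
    """Differential contrasts for a k-level factor ((k - 1) x k matrix)."""
    return np.eye(k)[:-1] - np.eye(k)[1:]

# Assumed column order: language (2) x SNR (4) x relatedness (2), rightmost fastest.
main_language    = np.kron(diff(2), np.kron(avg(4), avg(2)))    # 1 x 16
main_snr         = np.kron(avg(2),  np.kron(diff(4), avg(2)))   # 3 x 16 F-contrast
main_relatedness = np.kron(avg(2),  np.kron(avg(4), diff(2)))   # 1 x 16
lang_by_rel      = np.kron(diff(2), np.kron(avg(4), diff(2)))   # 1 x 16 interaction

print(main_language)   # +1 over the eight columns of one language, -1 over the other
print(main_snr.shape)  # (3, 16)
```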

Results

Behavioural results

A three-way repeated-measures analysis of variance was performed on accuracy scores with language, SNR and semantic context as within-subjects variables. Results revealed a main effect of language (F(1,15) = 9.786, p<0.01), showing that performance was better for French compared to English, and a main effect of SNR (F(3,42) = 97.019, p<0.01), demonstrating that participants were more accurate in recognising targets at higher compared to lower SNR levels. As predicted by the results of the previous behavioural study (Golestani et al., 2009), there was a significant language by semantic relatedness interaction (F(1,14) = 7.24, p=0.018). There was no significant effect of relatedness and no other significant interactions. These results are illustrated in figure 5. Planned contrasts revealed better performance for semantically related versus unrelated trials in French, whereas in English participants performed better for unrelated trials. Thus, the semantic relatedness of the prime and the target plays a facilitatory role in correctly recognising the target in the native language, whereas it was detrimental to recognition in the non-native language.
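For reference, this kind of three-way repeated-measures ANOVA can be reproduced with statsmodels from a long-format table of per-condition accuracies; the file and column names below are hypothetical, not the actual data files of this study.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per participant x language x SNR x
# relatedness cell, with the proportion of correct 2AFC responses in that cell.
acc = pd.read_csv("accuracy_by_condition.csv")
# expected columns: subject, language, snr, relatedness, prop_correct

anova = AnovaRM(
    data=acc,
    depvar="prop_correct",
    subject="subject",
    within=["language", "snr", "relatedness"],
).fit()
print(anova)   # F and p values for the main effects and interactions
```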

Figure 5: (a) Performance (percent correct) in the recognition task as a function of SNR for each condition for all 15 subjects. (b) Language by semantic relatedness interaction. In both graphs error bars indicate the standard error of the mean.

A three-way repeated-measures analysis of variance was also performed on response times for correct trials with language, SNR and semantic context as within-subjects factors. Here also, results revealed a main effect of language (F(1,14) = 30.367, p<0.01), demonstrating that participants were faster for trials taking place in French than in English, and a main effect of SNR (F(3,42) = 73.646, p<0.01), showing that responses were quicker for higher SNR levels. There was also a significant effect of relatedness (F(1,14) = 9.078, p<0.01) and a significant language by relatedness interaction (F(1,14) = 13.558, p<0.01). There were no other significant two- or three-way interactions. Response time results are illustrated in figure 6. Post-hoc Bonferroni-corrected comparisons revealed that participants were significantly faster at recognising targets followed by a semantically related compared to an unrelated prime in French, while there was no significant difference between semantically related and unrelated word pairs in English.

Figure 6: Latencies for correct responses as a function of SNR, language and semantic context for all 15 subjects. Error bars represent the standard error of the mean.


To sum up, behavioural results demonstrated that there is a significant effect of semantic context, which is facilitatory in the native language but detrimental in the non-native language. The next step is to look at the imaging results in order to see how the effects detected at the behavioural level are reflected in the brain.

Imaging results

Unless otherwise stated, all imaging results reported in this section are significant at a whole-brain family-wise error (FWE) corrected level of p<0.01. Coordinates of peak activations are in MNI (Montreal Neurological Institute) space.

The random effects analysis of variance revealed SNR-related changes to activations in a widespread temporal, parietal and frontal network (see Table 1). This network included the angular gyrus, the middle temporal gyrus (MTG), the precuneus, the frontal operculum, the insulae and the supplementary motor area (SMA) bilaterally, as well as activation of the left premotor cortex (see figures 7 and 9).

Table 1: Main effects of SNR, presented at a family-wise error (FWE) corrected level of p<0.01. The grey-shaded rows correspond to brain regions whose activation negatively correlated with SNR, while the non-shaded rows indicate brain regions which showed greater activity for clear than masked words. This table was extracted from SPM and shows peak voxels separated by more than 8 mm.

Structure                               cluster size     x     y     z    Z score
left precuneus                                  1151   -38   -74    36    inf.
left anterior insula                             430   -30    22     0    inf.
right angular gyrus                              966    44   -70    36    inf.
left supplementary motor area                    487     0    20    46    7.406
left precuneus                                  1759    -4   -54    26    7.324
left inferior frontal gyrus                      816   -42    16    22    7.078
left middle frontal gyrus                        202   -26    28    40    6.817
left middle frontal gyrus                        870    -4    58     8    6.699
right anterior insula                            150    34    24    -2    6.477
right middle frontal gyrus                        36    28    32    42    5.558
left middle temporal gyrus                        12   -64   -14   -18    5.422
right middle temporal gyrus                       10    66   -14   -18    5.327
left inferior frontal operculum                    1   -54    12    10    5.172
right calcarine sulcus                             7    22   -98    -2    5.158
right supramarginal gyrus                          3    62   -24    32    5.145
left dorsal anterior cingulate area                7    -8    34    -8    5.140
left superior middle frontal gyrus                 2    -2    56    34    5.028
left superior middle frontal gyrus                 1     0    58    32    5.008

Careful examination of the data allowed us to decompose this network into two distinct ones, according to their activation as a function of whether the target word was masked with noise or not. Deactivation for low SNR values (i.e. masked words) was observed in the angular gyrus, MTG and precuneus bilaterally, and in the left frontal operculum. Thus, these regions were more active when the target was clear and, hence, easier to understand, but they were deactivated when there was noise in the signal. The precuneus, angular gyrus and frontal operculum are known to be part of the default mode network (Raichle et al., 2000), while the MTG is a speech-processing region (e.g. Vouloumanos, Kiehl, Werker & Liddle, 2001; Davis & Johnsrude, 2003). The left angular gyrus is also involved in reading (e.g. Joubert et al., 2004), and it was shown to be active in speech tasks in which linguistic context assists comprehension under adverse listening conditions (e.g. Obleser et al., 2007; Obleser & Kotz, 2010). The activation of this resting-state and speech-related network during the easy conditions suggests that clear speech can be successfully understood without the recruitment of additional brain regions that might be necessary for effortful speech comprehension.

On the other hand, for lower SNR levels (i.e. masked targets) activation was observed in the left premotor cortex, the left posterior inferior frontal gyrus (Broca's area), and the insulae and supplementary motor area (SMA) bilaterally. This motor network is more active for words embedded in noise, suggesting that it is involved in effortful speech comprehension.
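Conceptually, this decomposition amounts to asking, for every voxel, whether its response estimates increase or decrease with SNR. The sketch below illustrates that idea on made-up numbers; the results reported here come from the SPM analysis, not from this code.

# Illustrative sketch: splitting voxels into signal-correlated (activation
# increases with SNR) and noise-correlated (activation decreases with SNR)
# sets by the sign of a least-squares slope. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(4)
snr = np.array([-9.0, -7.0, -5.0, 0.0])                 # SNR levels in dB
betas = rng.normal(size=(snr.size, 1000))               # (levels, voxels)

x = snr - snr.mean()
slopes = x @ (betas - betas.mean(axis=0)) / (x @ x)     # slope per voxel

signal_correlated = slopes > 0   # more active for clearer speech
noise_correlated = slopes < 0    # more active for masked speech
print(signal_correlated.sum(), "voxels increase with SNR,",
      noise_correlated.sum(), "voxels decrease with SNR")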

The activation patterns of some of these regions as a function of SNR are illustrated in figure 8.

Figure 7: Brain regions showing an effect of SNR, displayed on sagittal slices (x = -56 to +56) of the MNI standard brain at a family-wise error (FWE) corrected level of p<0.01. Regions in which activation positively correlated with noise are represented in red; regions that showed a positive correlation with signal are represented in blue.

Figure 8: Mean effect size relative to rest for responses to the different SNR levels (-9, -7, -5 and 0 dB) in (A) the left precuneus, (B) the right angular gyrus, (C) the left anterior insula and (D) the left supplementary motor area. Error bars show the standard error of the mean after between-subjects variance has been removed, suitable for repeated-measures comparisons (Loftus & Masson, 1994).

Figure 9: The noise-correlated (in red) and the signal-correlated (in blue) networks, presented at a family-wise error (FWE) corrected level of p<0.01. "L" and "R" indicate the left and right hemisphere respectively.
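The within-subject error bars of figure 8 are based on the subject-by-condition interaction mean square, as proposed by Loftus & Masson (1994). A compact sketch of that computation, on synthetic numbers standing in for the per-subject effect sizes, is given below.

# Illustrative sketch: within-subject SEM (Loftus & Masson, 1994), computed
# from the subject-by-condition interaction mean square. Synthetic data.
import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_conditions = 15, 4
data = rng.normal(size=(n_subjects, n_conditions))      # subjects x SNR levels

grand_mean = data.mean()
subject_means = data.mean(axis=1, keepdims=True)
condition_means = data.mean(axis=0, keepdims=True)

residuals = data - subject_means - condition_means + grand_mean
ms_error = (residuals ** 2).sum() / ((n_subjects - 1) * (n_conditions - 1))
sem_within = np.sqrt(ms_error / n_subjects)             # one SEM for all conditions
print(f"within-subject SEM = {sem_within:.3f}")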

Despite the significant effect of language and the significant language by relatedness interaction in the behavioural data, there was no significant effect of these factors on brain activity.

Discussion

In this study, an auditory retroactive semantic priming paradigm was used in order to investigate the neural correlates of intelligibility and contextual information in speech processing. A previous study (Golestani et al., 2009) using this paradigm revealed that individuals can efficiently use contextual semantic information in their native language, whereas in their non-native language the presence of semantic associations has a negative impact on recognition of words masked with noise. We replicated these results, demonstrating that there is, indeed, a significant benefit of native language in the use of contextual information while listening to speech in noise. More specifically, native French speakers more quickly and more accurately recognized masked targets in French when followed by semantically related primes compared to when the primes were not semantically related. Interestingly, a deleterious effect of semantic relatedness on accuracy was observed in English, as recognition scores were lower for semantically related pairs. However, there was no difference in terms of response latencies in the non-native language.

1. The effect of semantic context in speech perception

The native language benefit with regard to the use of contextual information is well documented in the bilingualism literature. Previous studies that used sentences (e.g. Mayo et al., 1997) have also provided evidence for more efficient use of contextual information in the native language. The crucial point of the present study is that it demonstrates, in line with a previous behavioural experiment (Golestani et al., 2009), that it is specifically the semantic context that facilitates recognition of speech in noise. This finding should not be interpreted as meaning that semantic context is the only source of contextual information that native listeners can use more efficiently. On the contrary, it could also be the case that native listeners make better use of other contextual information, such as syntactic cues and prosody, in their native language.


The deleterious impact of semantic relatedness in the non-native language (English) is an intriguing finding. As suggested in the introduction (page 16), the disadvantage of semantic context in the non-native language may be due to the different organization of the mental lexicon in each language in bilingual individuals. It could be the case that semantically related words are grouped into clusters in the native but less so in the non-native language. However, this interpretation does not account for the negative impact of semantic relatedness in the non-native language: if semantic information is not used for grouping words in the non-native language, one would expect no effect of semantic relatedness at all.

Alternatively, semantic relatedness could be a source of interference in the non-native language, as individuals may not be able to retrieve the exact meaning of words. Thus, semantically related word pairs may present an additional challenge in the non-native language, as listeners have to invest more cognitive resources in trying to disentangle their meaning.

Interestingly, a previous study (Moreira & Hamilton, 2006) demonstrated that semantic cues are sometimes misused in the non-native language. In this study, the authors tested school-aged native English and Spanish speakers with a rhyming task. Subjects were presented with four cards, all images of objects, among which one was the target (e.g. "coat"). The other images illustrated a rhyming object (e.g. "goat"), a semantically related object (e.g. "shirt") and an object which was neither phonologically nor semantically related to the target (e.g. "guitar"). Participants had to indicate which item rhymed with the target (e.g. "coat - goat"). Although performance for native English speakers was perfect, native Spanish speakers performed rather poorly. Critically, though, Spanish speakers tended to select the semantically related item much more often than the non-related item, suggesting that their mistakes were not random but reflected higher-order cognitive processing in a relatively simple phonological task. In their interpretation of similar results, Golestani and colleagues (2009) suggest that semantic processing is more prominent in the non-native language and requires more attentional resources, which, as a consequence, cannot be used for processing the acoustic stimuli. Although our results cannot settle this issue, they provide strong evidence for differences in the semantic processing of words between the native and the non-native language.

The effect of relatedness and its differential role in the native and non-native language, even though observed only at the behavioural level, is a very interesting finding in the general framework of speech perception. It demonstrates that speech comprehension is not a purely bottom-up process, entirely dependent on the acoustic signal, but is highly interactive.
