
In order to investigate the neural correlates of speech intelligibility and to identify brain structures whose activation is modulated by contextual semantic information, the above paradigm was used in an fMRI experiment. At the behavioural level, I predicted better performance for more intelligible stimuli (i.e. higher signal-to-noise ratio), a main effect of language with better performance in the native compared to the non-native language, as well as an effect of contextual (i.e. semantic) information in the native language. This effect would be reflected by better performance for targets followed by a semantically related prime compared to those followed by an unrelated prime.

Regarding the imaging results, in line with previous studies (Narain et al., 2003; Obleser and Kotz, 2010; Obleser, Wise, Dresner, & Scott, 2007; Scott et al., 2000; Scott et al., 2006), I hypothesised differential activation as a function of intelligibility in temporal regions, such as the superior temporal gyrus and sulcus bilaterally, which are traditionally associated with speech perception. Furthermore, I predicted differential activation in the left inferior frontal gyrus as a function of word intelligibility (Davis and Johnsrude, 2003), as well as in motor regions, in line with recent evidence on the role of premotor cortex in speech perception (e.g. Tremblay & Small, 2011). Concerning the effect of semantic relatedness in the native language, I expected activation in a high-level speech processing network, involving areas such as the angular gyrus (Obleser et al., 2007; Obleser and Kotz, 2010), if the responses were driven by the context, for example in semantically related trials. As for the semantically unrelated trials, more activation was expected in regions typically associated with sound processing, such as the auditory cortex and the STS.

Method

Participants

Fifteen right-handed native French speakers (2 male) participated in the study. Participants self-reported no history of hearing or visual impairment. They had all grown up in European French-speaking countries and had not started learning English until the age of 10. They had followed English classes during their education for 6 years on average (range: 3 to 10 years) and none of them used English in their everyday life. Eleven of the participants had also learnt a third language (German, Spanish or Italian) but were not fluent in it, nor did they use it in their everyday life. Participants were recruited by means of posters placed in various buildings of the University of Geneva. They all lived in the greater Geneva area or in nearby France, and the vast majority were students. All participants gave written informed consent to participate in the study, which was approved by the NEUCLID ethics committee of the University Hospital of Geneva (protocol number: NAC 11-014).

Materials and design

The words used in this experiment were exactly the same as those used by Golestani et al. (2009). French semantically related and unrelated word pairs were selected from the “French word association norms for 366 names of objects” database (Ferrand and Alario, 1998), and English words were selected from the University of South Florida Free Association Norms (Nelson, McEvoy and Schreiber, 1998).

For each language, two lists of 520 words were generated such that a given primed word was followed by a semantically related prime in the first list and by an unrelated prime in the second list, and vice versa (see Appendix for a full list of stimuli).

Stimuli across lists (within language) were matched for spoken wordform frequency and number of syllables, as higher word frequency may result in easier recognition (Bradlow and Pisoni, 1999) and longer words are easier to identify than shorter ones (Golestani et al., 2009). Spoken wordform frequency information for French words was taken from the “Lexique 3” database (New, Brysbaert, Veronis and Pallier, 2007; www.lexique.org), measured as the number of instances per million words in films. For the English words, these properties were taken from the English Lexicon Project (Balota et al., 2009). The number of syllables of the primed words was also matched across languages.
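A matching check of this kind is easy to script. The sketch below is illustrative only, not the procedure actually used: it assumes a hypothetical CSV of stimulus properties with columns named list, freq_film and n_syllables.

```python
# Sketch: checking that primed words are matched across the two lists on spoken
# wordform frequency and number of syllables. File and column names are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats

stim = pd.read_csv("stimuli_french.csv")   # one row per primed word
list1 = stim[stim["list"] == 1]
list2 = stim[stim["list"] == 2]

# Word frequencies (occurrences per million in films) are strongly skewed,
# so compare log-transformed values across lists.
t_freq, p_freq = stats.ttest_ind(np.log10(list1["freq_film"] + 1),
                                 np.log10(list2["freq_film"] + 1))
t_syll, p_syll = stats.ttest_ind(list1["n_syllables"], list2["n_syllables"])

print(f"frequency: t = {t_freq:.2f}, p = {p_freq:.3f}")
print(f"syllables: t = {t_syll:.2f}, p = {p_syll:.3f}")
```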

Half of the participants received the first list and the other half the second one, so as to ensure that any observed effects were not due to stimulus-specific properties but to the manipulation of the variables of interest. Moreover, for each list two sublists were created, differing in the level of noise in which each primed stimulus was embedded. Again, participants tested with the same list were divided into two groups, tested with separate sublists. The aim of this manipulation was to control for differential phoneme recognition effects, as some phonemes (for example, fricative consonants such as /s/ and /z/) are more easily detected in noise than others (Miller and Nicely, 1955; Boothroyd, Mulhearn, Gong and Ostroff, 1996).

At the end of each trial, participants performed a two-alternative forced choice task in which they were asked to select the target word from a pair of visually presented words. The foil words were all semantically related to the target words and were matched with the primes in terms of number of syllables, so as to ensure that participants’ responses were not driven by word length.

The English and French words were digitally recorded by a multilingual female speaker in an anechoic chamber, using a sampling rate of 44.1 kHz with 16-bit quantization. The microphone was positioned 30 cm from the speaker’s mouth, at 15 degrees to the mid-sagittal line. The final set of stimuli was created off-line by editing the words at zero-crossings before and after each word. Recordings were normalized with respect to root-mean-squared amplitude and embedded in sound files with a duration of 1.1 seconds (Golestani et al., 2009).

The signal-to-noise ratio (SNR) levels used were ∞dB (i.e. no noise), -5dB, -7dB and -9dB. The reason for this choice comes from piloting data revealing a very high recognition rate, reaching 80% even for the -7dB SNR condition and almost 100% for the unmasked condition. Piloting data were collected from six participants, who were scanned following the procedure described below, with the only difference that the SNR levels used were ∞dB, -5dB, -6dB and -7dB. We therefore decided to increase the difficulty of the task by increasing the sampling range, as the effect of SNR observed in the piloting phase was mainly driven by the unmasked condition. Auditory stimuli were embedded in speech-shaped noise, which approximated the average speech spectrum of a male voice.
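The general procedure for embedding a word in speech-shaped noise at a given SNR can be sketched as follows. This illustrates the principle rather than the scripts actually used to prepare the stimuli; the file names and the use of the soundfile package are assumptions.

```python
# Sketch: mixing an RMS-normalised word with speech-shaped noise at a target SNR.
import numpy as np
import soundfile as sf

def rms(x):
    return np.sqrt(np.mean(x ** 2))

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that 20*log10(rms(signal)/rms(scaled noise)) equals snr_db."""
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return signal + gain * noise

word, fs = sf.read("word.wav")            # 44.1 kHz, mono, RMS-normalised target word
noise, _ = sf.read("speech_shaped.wav")   # speech-shaped noise recording
noise = noise[: len(word)]

for snr in (-5, -7, -9):
    mixed = mix_at_snr(word, noise, snr)
    mixed /= np.max(np.abs(mixed)) + 1e-9  # avoid clipping when writing to disk
    sf.write(f"word_snr{snr}.wav", mixed, fs)
```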

The factors included were SNR (4 levels: -9, -7, -5, no-noise), semantic relatedness (2 levels: related and non-related) and language (2 levels: French and English), plus silent trials, resulting in a total of 17 conditions. There were 30 trials for each of the 16 non-silent conditions, resulting in a total of 240 semantically related and 240 unrelated word pairs (120 per language condition). SNR level was held constant within mini-blocks of 6 trials, while semantic context was mixed within mini-blocks. Language was grouped into blocks of 120 trials.
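The condition and trial counts can be verified by enumerating the factorial design. This is a sketch for illustration only; the labels are placeholders.

```python
# Sketch: enumerating the 2 (language) x 4 (SNR) x 2 (relatedness) design plus silent trials.
from itertools import product

languages = ["French", "English"]
snr_levels = ["-9 dB", "-7 dB", "-5 dB", "no-noise"]
relatedness = ["related", "unrelated"]

conditions = list(product(languages, snr_levels, relatedness))  # 16 non-silent conditions
n_conditions = len(conditions) + 1                              # + silent trials = 17
trials_per_condition = 30
total_pairs = len(conditions) * trials_per_condition            # 480 word pairs in total

print(n_conditions)   # 17
print(total_pairs)    # 480 = 240 related + 240 unrelated (120 per language condition)
```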

Procedure

Participants first underwent a brief training session to ensure that they had understood the task. The training session consisted of 8 trials, with 2 trials per SNR condition. Thus, all SNR levels were included in training so as to familiarise participants with the difficulty and timing parameters of the experiment. Once training was completed, participants were positioned in the MRI scanner and data acquisition began.

Participants were scanned using a Siemens Trio 3-Tesla MRI scanner with a 12-channel head coil. The auditory stimuli were presented binaurally through MRI-compatible headphones providing attenuation of the scanner noise of about 30dB. Participants were also provided with a button box containing four buttons and were instructed to use two of them for their responses in the 2AFC task.

At the beginning of the scanning session, a high-resolution T1-weighted anatomical scan (3D MPRAGE, 1mm x 1mm x 1mm voxels) was acquired. Functional images were acquired using a sparse sequence (Hall et al., 1999), in which the acquisition of functional volumes is not continuous; instead, there is a silent period between scans long enough for the presentation of the auditory stimuli (see figure 4(b) below). Four series of 160 gradient-echo images of blood-oxygenation-level-dependent (BOLD) signal were acquired (TR = 5.1s, TA = 2.1s, 2.5 x 2.5 x 3.25mm voxels). Auditory words were presented during the 3s silent period between scans. For each participant, temporal lobe coverage was ensured at the beginning of the scan session.
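As a quick check of the sparse-sampling timing, the silent gap available in each acquisition cycle is TR minus TA; the placement of the two sound files inside that gap is an assumption based on figure 4(b).

```python
# Sketch: timing of one sparse-sampling acquisition cycle (TR = 5.1 s, TA = 2.1 s).
TR = 5.1                 # repetition time (s)
TA = 2.1                 # acquisition time (s)
silent_gap = TR - TA     # 3.0 s of scanner silence per cycle

word_duration = 1.1                      # each sound file lasts 1.1 s
total_stimulation = 2 * word_duration    # target + prime, assumed back-to-back
assert total_stimulation < silent_gap    # both words fit inside the silent period
print(f"silent gap: {silent_gap:.1f} s, auditory stimulation: {total_stimulation:.1f} s")
```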

The experiment was divided into four blocks, which alternated between French and English.

The total duration of each block was 13 minutes and 53 seconds, corresponding to 160 scans.

Each block started with three dummy volume acquisitions to limit initial longitudinal magnetisation effects. Each block consisted of 20 mini-blocks, each of which included 6 trials, with the same SNR for the target, and 2 silent scans at the end. The order of mini-blocks was completely randomized between participants.
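A block's mini-block structure could be generated along the following lines. This is an illustrative sketch only: the equal distribution of SNR levels across the 20 mini-blocks and the 3/3 split of related and unrelated pairs within each mini-block are assumptions, and the assignment of specific word pairs is omitted.

```python
# Sketch: building the pseudo-random order of mini-blocks for one scanning block.
import random

snr_levels = [-9, -7, -5, "no-noise"]
mini_blocks = [snr for snr in snr_levels for _ in range(5)]   # 20 mini-blocks (assumed 5 per SNR)
random.shuffle(mini_blocks)                                   # randomised per participant

block = []
for snr in mini_blocks:
    relatedness = ["related"] * 3 + ["unrelated"] * 3         # assumed 3/3 split
    random.shuffle(relatedness)                               # semantic context mixed within mini-block
    block += [("trial", snr, rel) for rel in relatedness]
    block += [("silent", None, None)] * 2                     # two null events end each mini-block

print(len(block))   # 160 events = 20 * (6 trials + 2 silent scans)
```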

The presentation software E-Prime (Psychology Software Tools, Inc.) was used for visual and auditory stimulus presentation and for response collection. Each trial began with a light grey “X” on a black background for 300ms. The fixation cross remained on the screen while the two auditory stimuli, the target and prime words, were played. For half of the trials of each block the word pairs were semantically related, while for the other half they were not.

The target was presented embedded in noise at one of four SNRs (-9, -7, -5 and no-noise) while the prime was always clearly audible. The mean duration of the sound files containing the auditory stimuli was 1100ms. Immediately after the end of the second word, two words were presented on screen, one being the target and the other a semantically related foil. The words were presented in white characters on a black background, equally spaced from the centre of the screen, for a total duration of 1500ms. The position of the target (i.e. left or right with respect to the screen centre) was pseudo-randomised such that there was an equal number of left- and right-positioned targets per condition for every block. Participants were asked to press the button of the response box corresponding to “1”, using their right index finger, when the target appeared on the left and “2”, with their right middle finger, when it appeared on the right part of the screen. The time available to make a response was equal to the time the two words remained on the screen (1500ms).
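Putting the reported durations together, a single trial unfolds roughly as in the following sketch; this is a plain timetable rather than actual E-Prime code, and the back-to-back presentation of the two sound files with no inter-event gaps is an assumption based on figure 4(b).

```python
# Sketch: approximate event schedule of a single trial (durations taken from the text).
fixation_ms = 300       # fixation cross
word_ms = 1100          # duration of each sound file
response_ms = 1500      # time the two written words stay on screen

events = [
    ("fixation cross", 0, fixation_ms),
    ("target word (in noise)", fixation_ms, fixation_ms + word_ms),
    ("prime word (clear)", fixation_ms + word_ms, fixation_ms + 2 * word_ms),
    ("2AFC response screen", fixation_ms + 2 * word_ms,
     fixation_ms + 2 * word_ms + response_ms),
]
for name, onset, offset in events:
    print(f"{name:25s} {onset:5d}-{offset:5d} ms")
```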

Figure 4: Representation of the scanning protocol and of a single trial. (a) Schematic diagram of the scanning protocol. Each mini-block consisted of 6 trials. Within each mini-block, all targets were presented at the same SNR level while semantic relatedness varied pseudo-randomly. Two null events took place at the end of each mini-block, illustrated with grey bars. (b) Schematic diagram of a single trial. The top row shows the temporal sequence of stimuli presented to participants; the bottom row shows the acquisition of a volume as a function of time. Note that acquisitions took place after the end of the second auditorily presented word.

Data pre-processing and analysis

Data were pre-processed and analysed using SPM8 (Wellcome Trust Centre for Neuroimaging, University College London, UK). At the single-subject level, all echo planar imaging (EPI) volumes were realigned to one another in a two-step procedure: first, all images were realigned to the first one and a mean image was calculated; second, all images were realigned to the mean image. The structural image was coregistered to the mean realigned EPI volume, and then normalised to a standard structural template (SPM’s grey-matter, white-matter and cerebrospinal fluid templates). The EPI volumes were normalised by applying the parameters estimated in the structural scan normalisation step and spatially smoothed using an isotropic Gaussian kernel of 8mm full width at half maximum (FWHM).
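A pipeline of this kind could be expressed, for instance, with Nipype's SPM interfaces, as in the sketch below. This is a schematic reconstruction rather than the batch scripts actually used: file names are placeholders, the normalisation step is assumed to rely on SPM8's unified segmentation, and running it requires a working MATLAB/SPM installation.

```python
# Sketch: SPM8-style preprocessing expressed with Nipype's SPM interfaces.
from nipype.interfaces import spm

# 1. Two-pass realignment: to the first image, then to the mean image.
realign = spm.Realign(in_files="epi_4d.nii", register_to_mean=True)
realign_res = realign.run()

# 2. Coregister the structural scan to the mean realigned EPI.
coreg = spm.Coregister(target=realign_res.outputs.mean_image,
                       source="structural.nii")
coreg_res = coreg.run()

# 3. Estimate normalisation of the structural image to SPM's tissue templates
#    (unified segmentation), then apply the parameters to the EPI volumes.
seg = spm.Segment(data=coreg_res.outputs.coregistered_source)
seg_res = seg.run()
write = spm.Normalize(jobtype="write",
                      parameter_file=seg_res.outputs.transformation_mat,
                      apply_to_files=realign_res.outputs.realigned_files)
write_res = write.run()

# 4. Spatial smoothing with an 8 mm FWHM isotropic Gaussian kernel.
smooth = spm.Smooth(in_files=write_res.outputs.normalized_files, fwhm=[8, 8, 8])
smooth.run()
```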

A first level statistical analysis was carried out for each participant using a general linear model (GLM) in which each scan was coded for condition. Regressors included the 17 conditions as well as the six movement parameters derived from the realignment step of the pre-processing (x, y and z translations and x, y and z rotations). A high-pass filter (cutoff 128 seconds) and correction for first-order serial autocorrelation were also applied.
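Schematically, the single-subject design matrix therefore combines one indicator column per condition with the six realignment parameters. The numpy sketch below illustrates its structure only; the condition labels and motion values are placeholders, not actual data.

```python
# Sketch: structure of the first-level design matrix (one scan per row).
import numpy as np

n_scans = 160                     # scans per block; the full model concatenates 4 blocks
n_conditions = 17                 # 16 speech conditions + silent trials
rng = np.random.default_rng(0)
condition_of_scan = rng.integers(0, n_conditions, size=n_scans)   # placeholder labels

X_cond = np.zeros((n_scans, n_conditions))
X_cond[np.arange(n_scans), condition_of_scan] = 1    # one indicator column per condition

motion = np.zeros((n_scans, 6))   # in practice: 3 translations and 3 rotations from realignment
X = np.hstack([X_cond, motion])   # design matrix, before the 128 s high-pass filter and AR(1) whitening
print(X.shape)                    # (160, 23)
```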

Next, an analysis of variance (ANOVA) on the imaging data was performed in order to test for main effects of each of the three factors as well as for any interactions between them.

The analysis of group data was achieved by entering the parameter estimates of the 17 conditions into a single random-effects model. The ANOVA on the imaging data was carried out using the contrast vectors described by Henson and Penny (2005), with three factors: language (2 levels), SNR (4 levels) and semantic relatedness (2 levels).
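Contrast vectors for such a factorial model can be built from Kronecker products, following the general scheme of Henson and Penny (2005). The sketch below assumes that the 16 speech conditions are ordered language × SNR × relatedness, which is an assumption about the column ordering of the group model.

```python
# Sketch: F-contrast vectors for a 2 (language) x 4 (SNR) x 2 (relatedness) ANOVA.
import numpy as np

def main(n):          # "difference" component: contrasts between adjacent levels
    return np.eye(n - 1, n) - np.eye(n - 1, n, k=1)

def common(n):        # "average" component: collapses over the levels of a factor
    return np.ones((1, n))

# Main effect of SNR: differences over SNR levels, averaged over the other factors.
c_snr = np.kron(common(2), np.kron(main(4), common(2)))        # 3 x 16

# Main effect of language, and the language-by-relatedness interaction.
c_lang = np.kron(main(2), np.kron(common(4), common(2)))       # 1 x 16
c_lang_x_rel = np.kron(main(2), np.kron(common(4), main(2)))   # 1 x 16

print(c_snr.shape, c_lang.shape, c_lang_x_rel.shape)
```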

Results

Behavioural results

A 3-way repeated-measures analysis of variance was performed on accuracy scores with language, SNR and semantic context as within-subjects variables. Results revealed a main effect of language (F(1,15) = 9.786, p<0.01), showing that performance was better for French compared to English, and a main effect of SNR (F(3,42) = 97.019, p<0.01), demonstrating that participants were more accurate in recognising targets at higher compared to lower SNR levels. As predicted from the results of the previous behavioural study (Golestani et al., 2009), there was a significant language by semantic relatedness interaction (F(1,14) = 7.24, p=0.018). There was no significant effect of relatedness and no other significant interactions. These results are illustrated in figure 5. Planned contrasts revealed better performance for semantically related versus non-related trials in French, whereas in English participants performed better for non-related trials. Thus, the semantic relatedness of the prime and the target played a facilitatory role in correctly recognising the target in the native language, whereas it hindered recognition in the non-native language.
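For reference, a 3-way repeated-measures ANOVA of this kind could be run as follows. This is a sketch rather than the original analysis script; the long-format data file and its column names are hypothetical.

```python
# Sketch: 3-way repeated-measures ANOVA on accuracy with statsmodels.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per subject x language x SNR x relatedness cell (mean accuracy).
data = pd.read_csv("accuracy_long.csv")   # columns: subject, language, snr, relatedness, accuracy

aov = AnovaRM(data, depvar="accuracy", subject="subject",
              within=["language", "snr", "relatedness"],
              aggregate_func="mean").fit()
print(aov)
```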

Figure 5: (a) Performance (percent correct) in the recognition task for each language and relatedness condition as a function of SNR, for all 15 subjects. (b) Language by semantic relatedness interaction (percent correct per language and relatedness condition). In both graphs, error bars indicate the standard error of the mean.

A 3-way repeated-measures analysis of variance was also performed on response times for correct trials, with language, SNR and semantic context as within-subjects factors. Here too, results revealed a main effect of language (F(1,14) = 30.367, p<0.01), demonstrating that participants were faster for trials in French than in English, and a main effect of SNR (F(3,42) = 73.646, p<0.01), showing that responses were quicker for higher SNR levels. There was also a significant effect of relatedness (F(1,14) = 9.078, p<0.01) and a significant language by relatedness interaction (F(1,14) = 13.558, p<0.01). There were no other significant two- or three-way interactions. Response time results are illustrated in figure 6. Post-hoc Bonferroni-corrected comparisons revealed that participants were significantly faster at recognising targets followed by a semantically related compared to an unrelated prime in French, while there was no significant difference between semantically related and unrelated word pairs in English.
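Post-hoc comparisons of this kind can be computed as Bonferroni-corrected paired t-tests on the per-subject cell means. The sketch below is illustrative; the data layout and column names are hypothetical.

```python
# Sketch: Bonferroni-corrected related vs. unrelated comparisons within each language,
# on correct-trial response times.
import pandas as pd
from scipy import stats

rt = pd.read_csv("rt_long.csv")   # columns: subject, language, relatedness, rt (ms)
cell = (rt.groupby(["subject", "language", "relatedness"])["rt"]
          .mean()
          .unstack("relatedness"))   # rows: (subject, language); columns: related, unrelated

n_comparisons = 2   # one related-vs-unrelated comparison per language
for lang in ("French", "English"):
    rel = cell.xs(lang, level="language")["related"]
    unrel = cell.xs(lang, level="language")["unrelated"]
    t, p = stats.ttest_rel(rel, unrel)
    print(f"{lang}: t = {t:.2f}, p (Bonferroni) = {min(p * n_comparisons, 1.0):.3f}")
```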

Figure 6: Latencies for correct responses as a function of SNR, language and semantic context for all 15 subjects. Error bars represent the standard error of the mean.


To sum up, the behavioural results demonstrated a significant effect of semantic context, which was facilitatory in the native language but detrimental in the non-native language. The next step is to look at the imaging results in order to see how the effects detected at the behavioural level are reflected in the brain.

Imaging results

Unless otherwise stated, all imaging results reported in this section are significant at a whole-brain family-wise error (FWE) corrected level of p<0.01. Coordinates of peak activations are given in MNI (Montreal Neurological Institute) space.

The random effects analysis of variance revealed SNR-related changes to activations in a widespread temporal, parietal and frontal network (see Table 1). This network included the angular gyrus, the middle temporal gyrus (MTG), the precuneus, the frontal operculum, the insulae and the supplementary motor area (SMA) bilaterally, as well as activation of the left premotor cortex (see figures 7 and 9).

Table 1: Main effects of SNR, presented at a family-wise error (FWE) corrected level of p<0.01. The grey-shaded rows correspond to brain regions whose activation negatively correlated with SNR, while the non-shaded rows indicate brain regions which showed greater activity for clear than masked words. This table was extracted from SPM and shows peak voxels separated by more than 8mm.

Structure                               Cluster size      x     y     z    Z score
left precuneus                          1151            -38   -74    36    inf.
left anterior insula                    430             -30    22     0    inf.
right angular gyrus                     966              44   -70    36    inf.
left supplementary motor area           487               0    20    46    7.406
left precuneus                          1759             -4   -54    26    7.324
left inferior frontal gyrus             816             -42    16    22    7.078
left middle frontal gyrus               202             -26    28    40    6.817
left middle frontal gyrus               870              -4    58     8    6.699
right anterior insula                   150              34    24    -2    6.477
right middle frontal gyrus              36               28    32    42    5.558
left middle temporal gyrus              12              -64   -14   -18    5.422
right middle temporal gyrus             10               66   -14   -18    5.327
left inferior frontal operculum         1               -54    12    10    5.172
right calcarine sulcus                  7                22   -98    -2    5.158
right supramarginal gyrus               3                62   -24    32    5.145
left dorsal anterior cingulate area     7                -8    34    -8    5.14
left superior middle frontal gyrus      2                -2    56    34    5.028
left superior middle frontal gyrus      1                 0    58    32    5.008

Careful examination of the data allowed us to decompose this network into two distinct ones, according to their activation as a function of whether the target word was masked with noise or not. Deactivation for low SNR values (i.e. masked words) was observed bilaterally in the angular gyrus, MTG and precuneus, as well as in the left frontal operculum. Thus, these regions were more active when the target was clear and, hence, easier to understand, but they were deactivated when there was noise in the signal. The precuneus, angular gyrus and frontal operculum are known to be part of a default mode network (Raichle et al., 2000), while the MTG is a speech-processing region (e.g. Vouloumanos, Kiehl, Werker & Liddle, 2001; Davis & Johnsrude, 2003). The left angular gyrus is also involved in reading (e.g. Joubert et al., 2004) and has been shown to be active in speech tasks in which linguistic context assists comprehension under adverse listening conditions (e.g. Obleser et al., 2007; Obleser & Kotz, 2010). The activation of this resting-state and speech-related network during the easy conditions suggests that clear speech can be successfully understood without the recruitment of additional brain regions that might be necessary for effortful speech comprehension.

On the other hand, for lower SNR levels (i.e. masked targets), activation was observed in the left premotor cortex, the left posterior inferior frontal gyrus (Broca’s area), and the insulae and supplementary motor area (SMA) bilaterally. This motor network was more active for words embedded in noise, suggesting that it is involved in effortful speech comprehension. The activation patterns of some of these regions as a function of SNR are illustrated in figure 8.



Figure 7: Brain regions showing an effect of SNR, displayed on the MNI standard brain at a family-wise error (FWE) corrected level of p<0.01. Regions in which activation positively correlated with noise are represented in red; regions that showed a positive correlation with signal are represented in blue.

Figure 8: Mean effect size relative to rest for responses to different SNR levels, shown for several regions including (C) the left anterior insula and (D) the left supplementary motor area. Error bars show the standard error of the mean after between-subjects variance has been removed, suitable for repeated-measures comparisons (Loftus & Masson, 1994).

Figure 9: The noise-correlated (in red) and the signal-correlated (in blue) networks; L and R indicate the left and right hemisphere, respectively.

Despite the significant effect of language and the significant language by relatedness interaction in the behavioural data, there was no significant effect of these factors on brain activity.

Discussion

In this study, an auditory retroactive semantic priming paradigm was used in order to investigate the neural correlates of intelligibility and contextual information in speech processing. A previous study (Golestani et al., 2009) using this paradigm revealed that individuals can efficiently use contextual semantic information in their native language, whereas in their non-native language the presence of semantic associations has a negative effect on word recognition.

