
5 Study 3. Prediction during consecutive interpreting in noise

5.1.4 Listening to speech in noise in a non-native language

The potential interaction between degradation during signal transmission (noisy speech) and receiver limitations (listening to speech in the L2) is relevant when considering how consecutive interpreters comprehend speech in noise. Consecutive interpreters, too, often work from their L2 into their L1. There is evidence to suggest that noisy speech affects L2 comprehension more than, and differently from, L1 comprehension (for a review, see Lecumberri, Cooke, & Cutler, 2010).

Mayo, Florentine and Buus (1997) administered a speech perception in noise (SPIN) test, in which participants listen to sentences and write down their final word, to participants listening in either their native or their non-native language (with the latter group divided into early and late bilinguals). Half of the sentences were constraining for the final word, so that listeners could use top-down processing to help them determine it (e.g., "The boat sailed across the bay."), and half were non-constraining (e.g., "John was thinking about the bay."). In the first part of the experiment, sentences were presented at 70 dB without noise, and only listeners who correctly identified the final word 96% of the time were included in the second and third parts. In the second part, sentences were presented at 70 dB in babble noise. The level of the babble was initially set to 55 dB, increased by 5 dB after every correct response, and reduced by 2 dB after every incorrect response. In the third part, listeners were tested at five to seven different signal-to-noise ratios (SNRs). These latter two parts were designed to establish the noise level at which participants could still repeat the target words with 50% accuracy. Monolinguals and early bilinguals maintained 50% accuracy at a significantly lower SNR than late bilinguals.

Furthermore, for late bilinguals there was no difference between listening to constraining and non-constraining sentences in noise. Thus, in a noisy setting, late bilinguals appeared less able to rely on contextual information than monolinguals and early bilinguals.
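The adaptive rule in the second part of Mayo et al.'s procedure is easy to state procedurally. The sketch below is ours, not their code: it simply applies the step sizes reported above (noise up 5 dB after a correct response, down 2 dB after an error) to a running noise level, with a hypothetical respond() function standing in for the listener.

```python
def adapt_noise_level(noise_db: float, correct: bool) -> float:
    """One step of the adaptive rule described in Mayo et al. (1997):
    a correct response makes the task harder (+5 dB of babble),
    an incorrect response makes it easier (-2 dB)."""
    return noise_db + 5.0 if correct else noise_db - 2.0

def run_block(respond, n_trials: int = 20, start_db: float = 55.0) -> list[float]:
    """Track the babble level over a block; respond(noise_db) is a
    stand-in for the listener reporting the sentence-final word."""
    levels = [start_db]
    for _ in range(n_trials):
        levels.append(adapt_noise_level(levels[-1], respond(levels[-1])))
    return levels
```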

There is some evidence to suggest that the non-native disadvantage in noise is due to less effective use of phonotactic and contextual knowledge (Lecumberri et al., 2010).

Krizman, Bradlow, Lam and Kraus (2017) tested the perception of sentences, words and tones in noise in monolingual and early, high-proficiency bilingual adolescents. They found that monolinguals were significantly better than bilinguals at sentence perception in noise, whereas there was no significant group difference for word perception in noise, and the bilingual adolescents were better at tone perception in noise. This suggests that the monolingual perception advantage is due to top-down, language-dependent processes, and that it persists even in comparison with early, high-proficiency bilinguals (see also Florentine, 1985).

Meanwhile, Rammell, Cheng, Pisoni and Newman (2019) examined the neural correlates of listening to speech in noise in English-Spanish late bilinguals, who listened in either their L1 or their L2. They found that when participants listened in their L2, regions associated with auditory language processing were activated, suggesting that L2 listeners focused on bottom-up processing of the speech signal at the perceptual level. When the same participants listened to speech in noise in their L1, by contrast, the authors found activation in regions linked to executive functioning, which is suggestive of top-down language processing. These different activation patterns suggest that noisy speech is processed differently depending on whether it is presented in the L1 or the L2: listening to noisy speech in the L1 appears to rely more on top-down (and potentially predictive) processing, whereas listening to noisy speech in the L2 appears to rely more on bottom-up processing.

The general pattern is therefore that noisy speech has a more detrimental effect on non-native than on native language perception. This may be because bilinguals are less able to use higher-level linguistic cues to generate predictions about upcoming words and utterances (Golestani, Rosen, & Scott, 2009). Alternatively, it may be because L2 listeners have more difficulty than L1 listeners at each stage of language comprehension, so that noise has a cumulative negative effect on speech perception in L2 listeners.

In sum, it is unclear whether L2 speakers listening to noisy speech will generate predictions about upcoming words. On the one hand, noise may increase reliance on top-down processing. On the other, L2 speakers may not be able to make effective use of contextual information in order to generate predictions.

5.2 The current study

We investigated whether prediction in the L2 took place against a noisy background, and whether a consecutive interpreting task influenced predictive processing during comprehension. We tested high-proficiency bilinguals untrained in interpreting, in order to study the general processes of comprehension and production that underlie consecutive interpreting, rather than trained strategies. Specifically, we asked whether a consecutive interpreting task, as compared to a listening task, increased predictive processing in noise in high-proficiency bilinguals.

Our design was similar to that of Study 1. Participants listened to highly constraining sentences while viewing a visual array containing three distractors and an image of a critical word, which was either the predictable word (e.g., mouth), a word phonologically related to the English form of the predictable word (e.g., mouse), or an unrelated word (e.g., bone). In this study, participants either listened to the sentences, subsequently provided their interpretation into Dutch, and answered a comprehension question, or else only listened to the sentences and answered a comprehension question. Sentences were presented against a noisy background.

We expected that participants would fixate more on the image representing the predictable word than on the image representing the unrelated word, as we expected high-proficiency L2 speakers to form semantic predictions even in noise. However, we considered that a consecutive interpreting task might affect the timing and magnitude of the difference in fixation proportions between the two conditions. In addition, we considered that high-proficiency L2 speakers might predict word form. In our first study (see Chapter 3), we found no evidence for word-form prediction when participants were engaged in a concurrent production task (simultaneous interpreting). However, word-form prediction might take place in high-proficiency L2 speakers in a situation in which prediction is advantageous for language processing efficiency, but in which there is no concurrent production to impede prediction; in other words, in a consecutive interpreting task.

5.3 Methods

5.3.1 Participants

Twenty-six Masters students at the Faculty of Translation, Interpreting and Communication at Ghent University participated in the experiment. One participant was excluded because they fixated the visual stimuli less than 3% of the time in the consecutive task. One further participant was tested but excluded at random, because the experiment was fully counterbalanced for 24 participants.

All participants completed a language background questionnaire based on the LEAP-Q (Marian et al., 2007). Unlike in the original questionnaire, participants were asked to provide language background information only for Dutch and English (rather than for all of their languages). Proficiency self-ratings were made on a 10-point scale. Table 1 summarises the information provided by participants in the LEAP-Q questionnaire.

Participants also provided information about their university course. Thirteen students were from the Interpreting Department and had received between 1 and 2 months of training in consecutive interpreting at the time of testing. Two further students from the Interpreting Department had received 10 months of training in consecutive interpreting. The other nine students came from the Translation Department or the Multilingual Communication Department and had received no training in consecutive interpreting. All students were native speakers of Dutch and highly proficient users of English. All but two participants had received training in translation during their undergraduate degree.

Background
  Age (yrs)                                      22.21 ± 1.18
  Languages spoken                                4.50 ± 0.83

Dutch language
  Age of acquisition (yrs)                        1.29 ± 1.08
  Age at which became fluent (yrs)                4.41 ± 2.50
  Time living in Dutch-speaking area (yrs)       22.05 ± 1.28
  Current exposure (%)                           60.83 ± 11.95

English language
  Age of acquisition (yrs)                       10.30 ± 3.30
  Age at which became fluent (yrs)               16.13 ± 2.40
  Time living in English-speaking area (mths)     0.72 ± 1.56*
  Current exposure (%)                           18.00 ± 6.33

Self-rated English proficiency (out of 10)
  Speaking                                        7.96 ± 0.95
  Reading                                         8.33 ± 0.92
  Listening                                       8.42 ± 0.88

Table 1. Language background and self-rating data from the LEAP-Q questionnaire (means ± SD; n = 24).

* One participant reported living in an English-speaking environment for 23 years. As this was an outlier, this participant was excluded from this calculation. The participant was still included in the experiment, as this was clearly a reporting error: the participant did not state having lived in an English-speaking country for any period of 6 months or longer.

In addition, participants completed the LexTALE test at www.lextale.com (Lemhöfer & Broersma, 2012). The average score was 87.34% (SD: 9.08%, range: 68.75% – 98.75%). This test is designed for medium to highly proficient speakers of English as a second language, and its results have been shown to correlate well with measures of English vocabulary knowledge and general English proficiency. Advanced users of English would be expected to score between 80% and 100%.
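For reference, LexTALE scores like those reported above are the "averaged % correct" across the test's 40 real words and 20 nonwords (Lemhöfer & Broersma, 2012). A minimal sketch of that scoring, assuming a simple tally of correct responses:

```python
def lextale_score(correct_words: int, correct_nonwords: int) -> float:
    """Averaged %-correct over the 40 words and 20 nonwords of LexTALE
    (Lemhöfer & Broersma, 2012)."""
    return (correct_words / 40 + correct_nonwords / 20) / 2 * 100

# e.g., 36/40 words and 15/20 nonwords -> (90% + 75%) / 2 = 82.5%
print(lextale_score(36, 15))
```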

5.3.2 Stimuli

Stimuli were adapted from Study 1. Experimental stimuli consisted of 30 English sentences, each paired with one of three visual arrays (see Appendix 4). Each experimental sentence contained a highly predictable word (e.g., mouth in "The dentist asked the man to open his mouth a little wider.") at varied positions in the sentence (range: 5th – 16th word), but never sentence-finally. The experimental sentences consisted of a mean of 15.6 words (range: 10 – 21, SD: 2.79). In addition, there were 30 filler sentences.

The sentences were recorded by a male native speaker of Southern British English and sampled at 48 kHz. The speaker read the experimental sentences at a rate of approximately 2.05 syllables per second (SD: 0.24). The average sentence duration was 9.90 seconds (SD: 1.68), and the onset of the critical word was, on average, at 6.26 seconds (SD: 2.00).

The intensity of all sentences was set to 70 dB using Praat, and speech-shaped sound was added to the sentences at an SNR of 0 dB. This level was chosen because, at an SNR of 0 dB, sentences can easily be understood, although isolated words may be less intelligible (Bronkhorst, 2015). At lower SNRs (below 0 dB), only some parts of the speech signal (e.g., harmonics, formants and strong frication) are likely to be heard over the noise (Lecumberri et al., 2010), and listeners may start to rely on acoustic-phonetic cues for word segmentation because other cues are not available (Mattys et al., 2009). We did not want to impede top-down processes by using an excessively noisy speech signal.
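The level-setting and mixing were done in Praat; purely as a language-neutral illustration of the same operation, the sketch below (ours, using numpy) scales a noise signal so that the speech-to-noise RMS ratio matches a target SNR and then adds it to the speech. The function and array names are our assumptions.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float = 0.0) -> np.ndarray:
    """Add noise to speech at a target SNR; at 0 dB SNR, speech and
    noise have equal RMS level, as in the experiment described above."""
    def rms(x):
        x = np.asarray(x, dtype=float)
        return np.sqrt(np.mean(x * x))
    noise = noise[: len(speech)]                       # trim noise to sentence length
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return speech + gain * noise
```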

We used speech-shaped sound rather than the (multi-speaker) babble that might be more typically associated with a consecutive interpreting setting. Maskers such as multi-speaker babble allow listeners to catch glimpses of unobstructed speech during dips in the masker, whereas speech-shaped sound provides a constant, stationary masker (Lecumberri et al., 2010) while covering the same frequency spectrum as multi-speaker babble. This was important for creating the same conditions across the different trials in our experiment (although there were still modulations, because the target signal varied and was therefore sometimes more intelligible in the speech-shaped sound).

The speech-shaped sound was created using multi-speaker babble from the ICRA database at http://www.icra-audiology.org (Dreschler, Verschuure, Ludvigsen, & Westermann, 2001). ICRA 7 from this database was demodulated using a MATLAB script, resulting in a constant noise with the same frequency spectrum as six-speaker babble. This speech-shaped sound was then added to all sentences, with sentences and noise cut at the nearest zero crossing using Praat. The volume at which stimuli were presented was held constant across participants and trials.
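The exact MATLAB demodulation script is not reproduced here. One common way to obtain a stationary noise with the long-term spectrum of a babble recording is to keep the babble's FFT magnitudes and randomise the phases, which flattens the temporal envelope; the numpy sketch below illustrates that approach, as an assumption rather than the method actually used.

```python
import numpy as np

def speech_shaped_noise(babble: np.ndarray) -> np.ndarray:
    """Stationary noise with (approximately) the long-term spectrum of
    `babble`: keep the FFT magnitudes, randomise the phases, invert."""
    spectrum = np.fft.rfft(babble)
    phases = np.exp(1j * np.random.uniform(0.0, 2.0 * np.pi, spectrum.shape))
    shaped = np.fft.irfft(np.abs(spectrum) * phases, n=len(babble))
    return shaped / np.max(np.abs(shaped))  # normalise to avoid clipping
```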

The predictability of the target words was normed using an online cloze probability test. Twenty first-year undergraduate native speakers of Dutch who spoke English as a second language read the sentences truncated before the predictable word and provided the continuation in English. Cloze probabilities were 84.3% on average (SD: 17.21%, range: 45 – 100%) (see Table 2).
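Cloze probability here is simply the proportion of norming participants who completed the truncated sentence with the target word. A minimal sketch of that tally (the function and variable names are ours):

```python
def cloze_probability(responses: list[str], target: str) -> float:
    """Proportion of completions matching the target word,
    ignoring case and surrounding whitespace."""
    hits = sum(r.strip().lower() == target.lower() for r in responses)
    return hits / len(responses)

# e.g., 17 of 20 participants completed "...open his ___" with "mouth" -> 0.85
```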

Each of the visual arrays contained four objects: a critical object and three distractor objects. In the target condition, the critical object corresponded to the predictable word (e.g., mouth). In the English competitor condition, the English name of the critical object overlapped phonologically at onset with the predictable word (e.g., mouse). The mean number of phonemes shared between the predictable words and the English competitor words was 2.2 (SD: 0.55) out of a total of 3.6 (61%). In the unrelated condition, the English name of the critical object did not overlap at onset with the predictable word (e.g., bone).
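The onset-overlap measure reported above can be made concrete as a count of shared initial phonemes between two transcriptions; the helper below is a hypothetical sketch of ours, operating on phoneme lists:

```python
def shared_onset_phonemes(a: list[str], b: list[str]) -> int:
    """Number of phonemes two words share from their onsets, e.g.
    mouth /m aʊ θ/ vs mouse /m aʊ s/ -> 2."""
    count = 0
    for pa, pb in zip(a, b):
        if pa != pb:
            break
        count += 1
    return count

print(shared_onset_phonemes(["m", "aʊ", "θ"], ["m", "aʊ", "s"]))  # 2
```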

We conducted a picture naming test to assess naming agreement for the depicted objects among Dutch speakers of English. This test was carried out online by Dutch students who spoke English and who had not participated in the cloze test. Each object was rated by at least 12 individuals. Responses with the same meaning and the same phonological onset were counted as the same word (e.g., life jacket/life vest). Two items were not included in the norming study due to an error (fish/finger). Naming agreement was 90.8% for objects in the target condition (SD: 16.7%, range: 16.7% – 100%), 81.6% for objects in the English competitor condition (SD: 23.5%, range: 8.3% – 100%), 84.2% for objects in the unrelated condition (SD: 21.7%, range: 16.7% – 100%) and 84.3% for the distractor objects (SD: 21.4%, range: 8.3% – 100%).

Sentence cloze     Naming agreement   Naming agreement      Naming agreement
rating (%)         (Target) in %      (Competitor) in %     (Unrelated) in %
84.3               90.8               81.6                  84.2

Table 2. Mean results of the norming studies. Please see Appendix 4 for a full list of stimuli with all ratings.

The study used 30 experimental and 30 filler sentences, as well as 30 matched versions of the visual stimuli, shown once with an experimental sentence and once with a filler sentence. The matched stimuli showed the same four images, but the quadrants in which the images appeared were varied. Three sets of experimental items were compiled, in which experimental conditions were counterbalanced. These three sets of items were then each divided into two half lists. The matched visual stimuli were assigned to different half lists, meaning that in one half list the visual stimuli appeared with an experimental sentence, and in the other half list they appeared with a filler sentence. The half lists were then recombined in two different orders to create six full lists. In some instances, a word that named a phonological competitor or target object also appeared in another sentence. In these instances, the image of the critical object was always shown before the sentence mentioning its name in another context. The half lists were then pseudo-randomized.

Critical objects appeared in each condition in each of the four quadrants equally frequently over each of the three sets of experimental items. Filler sentences mentioned distractor objects 66.7% of the time, so that, together with the experimental sentences, which mentioned a depicted object 33.3% of the time, the sentences mentioned one of the objects in the visual array 50% of the time. This was the case in each half list.
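The list construction above follows a standard Latin-square logic: each item appears in exactly one condition per set, and conditions rotate across the three sets. A schematic sketch (ours, with hypothetical names) of how such sets and the six final lists could be derived:

```python
CONDITIONS = ["target", "competitor", "unrelated"]

def build_sets(n_items: int = 30) -> list[list[tuple[int, str]]]:
    """Three counterbalanced sets: item i rotates through the three
    conditions across sets, so each set has 10 items per condition."""
    return [
        [(item, CONDITIONS[(item + shift) % 3]) for item in range(n_items)]
        for shift in range(3)
    ]

def build_lists(item_set: list[tuple[int, str]]) -> list[list[tuple[int, str]]]:
    """Split one set into two half lists and recombine them in both
    orders, as described above (3 sets x 2 orders = 6 full lists)."""
    first, second = item_set[:15], item_set[15:]
    return [first + second, second + first]
```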

5.3.3 Procedure

The experiment started with a picture familiarization task in which participants viewed all objects appearing in the experiment in a randomized order. The English names of the objects were presented both visually and auditorily, and were recorded by the same speaker in the same way as the experimental sentences. Participants then saw the objects a second time and repeated their names in English.

Participants were instructed to look at and listen to the names given to the objects, so that they could later name the objects using the same words. They were then asked to name each object using the words that had been provided. Mean picture naming accuracy was 97.6% (SD: 5.8%). Incorrectly named objects were repeated, and the experimenter prompted participants who did not provide the correct name for an object on second viewing.

In the eye-tracking experiment, participants were seated in front of a computer screen in the EyeLink+ laboratory at Ghent University. The participant's dominant eye was tracked with an SR Research EyeLink 1000 Plus desktop-mounted eye-tracker used in remote mode. Auditory stimuli were presented at a constant volume through a pair of Sennheiser HD 215 over-ear headphones.

The experiment was divided into two halves. In one half, participants listened to the sentences and then answered the question "Did the sentence mention any of the pictures?". In the other half, participants listened to the sentences, consecutively interpreted them into Dutch, and then answered the same question. Task order was counterbalanced across participants. After participants had viewed the instructions, the eye-tracker was calibrated using the nine-point calibration grid. Each half of the experiment started with two practice trials, after which participants were given a chance to ask questions and the eye-tracker was re-calibrated before they began. Pictures were presented on a viewing monitor at a resolution of 1024 x 768 pixels. Each trial started with a drift correction, followed by a 500 ms blank screen. On experimental trials, the visual array was presented 1000 ms before the onset of the predictable word. On filler trials, the array was presented 1000 ms before the onset of a word that referred to a distractor, or else at a random mid-sentence point if the sentence did not mention anything in the array. The pictures stayed on the screen until the offset of the spoken sentence. In the consecutive task, a blank screen was then shown for 12,000 ms in order to give participants time to interpret the sentence; in the listening-only task, the blank screen was shown for 1000 ms. After this, participants judged whether one of the objects shown on the screen had been mentioned in the sentence, answering yes or no using keys 1 or 2, respectively, on their keyboard.
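For concreteness, the trial timeline above reduces to a handful of constants plus one derived quantity (the array onset relative to sentence onset). The names below are ours, not the experiment software's:

```python
# Trial timeline of the eye-tracking experiment (values from the text above)
BLANK_BEFORE_TRIAL_MS = 500          # blank screen after drift correction
ARRAY_LEAD_MS = 1000                 # array appears 1000 ms before the predictable word
POST_SENTENCE_BLANK_MS = {
    "consecutive": 12_000,           # time to interpret the sentence into Dutch
    "listening": 1_000,
}

def array_onset_ms(predictable_word_onset_ms: float) -> float:
    """Visual array onset relative to sentence onset on experimental trials."""
    return predictable_word_onset_ms - ARRAY_LEAD_MS

# e.g., critical word at 6260 ms (the mean reported above) -> array at 5260 ms
print(array_onset_ms(6260))
```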

Figure 1. Example visual stimuli for the Target, English competitor and Unrelated conditions. The critical image appears in the top left-hand corner (mouth, mouse, bone). The position of the critical image was rotated on different items throughout the experiment, so that the critical image appeared equally frequently in each quadrant.

5.4 Results

5.4.1 Comprehension question accuracy

Mean accuracy on the comprehension questions across both tasks was 98.6% (SD: 2.3%), and it was virtually identical in the listening and consecutive tasks, at 98.6% (SD: 3.2%) and 98.6% (SD: 3.8%) respectively.
