

1.1 Speech Perception

1.1.1 General Speech Perception

Generally speaking, understanding spoken language may be described as a process of extracting acoustic cues from the speech signal and mapping them onto stored linguistic representations. Difficulties in doing so may stem from two main sources: the language itself or the speaker. The language itself may pose problems because an inexperienced listener may not know how to interpret language-specific cues, or because a listener’s long-term phonological representations in the L2 differ from those of a native speaker. Difficulties may be incurred by the speaker if he/she has produced an acoustic signal so degraded that it is unintelligible to the listener. Impediments to perception related to language- or variety-specific properties are clearly more of an obstacle to non-natives, while speaker-related variations may be detrimental to any listener, depending on how extreme they are.

Language as a system exploits the use of contrasting sounds strung together to form words, which, grouped in particular orders, communicate a larger message. As the contrasting sounds utilized vary from language to language, it is logical that a listener, be it a baby learning its mother tongue or a student learning a second language, must learn which contrasts are important in the language (i.e., the phoneme inventory along with possible allophonic realizations) and how they are indicated in that language. It is generally held that the sounds of language are replete with multiple, redundant phonetic cues (duration, spectral features, periodicity, noise, intensity, etc.), which contribute to making verbal communication a maximally efficient task. If some cues to contrast are lost in degraded speaking conditions or reduced speech, or are somehow not picked up by the listener, others serve as back-ups to secure transmission of the utterance (Pisoni & Lively, 1995).

Cues and their hierarchy of relevance to segments’ identities differ cross-linguistically; therefore, which cues listeners attend to, or prioritize (often called cue-weighting), is a central issue in speech perception and language acquisition.

For example, Iverson et al. (2003) examined acoustic cue use in English /ra/ and /la/ categorization by Japanese, German and American adults and found that Germans were highly sensitive to the same cues that Americans were, namely differences in F3, while Japanese listeners showed greatest sensitivity to F2, a cue natives do not use to discriminate this contrast. Other research involving vowel discrimination has shown that in differentiating between American English /i/ (tense) and /ɪ/ (lax), Mandarin Chinese learners rely on temporal (durational) differences rather than spectral information (Bohn, 1995; Flege, Bohn, & Jang, 1997), while natives were shown to rely on both (Bohn & Flege, 1990). Related research on Catalan perception of a General American English /i/ (tense) - /ɪ/ (lax) - /e/ (lax) continuum (Cebrian, 2006) showed that Catalan speakers also routinely depended on temporal cues, in contrast to natives1. Thus, differences in phoneme perception between L1 and L2 listeners may be explained by L2 listeners not prioritizing cues to contrast in the same way as L1 listeners, or L2 listeners may be processing L2 speech through a “filter,” most likely applying an L1-appropriate hierarchy of cues when possible. Hence, perceptual attunement to the L2 may be said to require language-specific prioritization of acoustic cues.
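The logic of cue-weighting described above can be pictured as a weighted combination of acoustic cues feeding a category decision. The following is only an illustrative sketch: the weights, formant midpoints, and token values are hypothetical, chosen to mirror the qualitative pattern reported by Iverson et al. (2003), not fitted to any real data.

```python
# Toy sketch of cue-weighting in /r/-/l/ categorization.
# All weights and formant values below are hypothetical illustrations.

def categorize(token, weights):
    """Return '/r/' or '/l/' from a weighted sum of normalized cues.

    Each cue is scaled so that negative values point toward /r/
    (e.g., a low F3) and positive values point toward /l/.
    """
    # Hypothetical normalization midpoints and ranges (Hz).
    f2_score = (token["F2"] - 1400) / 400.0
    f3_score = (token["F3"] - 2300) / 700.0
    score = weights["F2"] * f2_score + weights["F3"] * f3_score
    return "/r/" if score < 0 else "/l/"

# Listener profiles: relative attention given to each cue.
english_listener = {"F2": 0.1, "F3": 0.9}   # F3-dominant weighting
japanese_listener = {"F2": 0.9, "F3": 0.1}  # F2-dominant weighting

# An /r/-like token (low F3) whose F2 happens to lean toward /l/.
token = {"F2": 1600, "F3": 1800}

print(categorize(token, english_listener))   # F3 dominates -> '/r/'
print(categorize(token, japanese_listener))  # F2 dominates -> '/l/'
```

The same acoustic token is categorized differently by the two listener profiles, which is the sense in which an L1-appropriate hierarchy of cues can act as a perceptual “filter” on L2 speech.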

It may be expected that learning to perceive contrasts in a second language will take time; when learning a language, however, not all feature contrasts are necessarily unfamiliar. Some features may be familiar but not occur in the same segmental context as in the L1, thus requiring active (re)assessment of the implications of the contrast, which also takes time. For example, English-speaking students required five years of Hindi instruction to learn a new L2 contrast, the unfamiliar dental-retroflex place contrast, and up to a full year to correctly perceive certain L2 voicing contrasts (breathy vs. voiceless aspirated stops) (Tees & Werker, 1984). Though voicing contrasts exist in English obstruents, students still needed time to correctly perceive breathy and voiceless aspirated stops.

1 In this study, however, in slight contrast to the early work by Bohn & Flege on American English speakers, Canadian English speakers were seen to rely most on vowel quality (spectral) differences.

One way to improve perception is by providing variability in the training of unfamiliar contrasts. Listeners appear to encode very detailed talker-specific information in long-term memory, which later helps them recognize novel words spoken by familiar voices and which may be used to generalize to unfamiliar voices. Findings in this area (Pisoni & Lively, 1995; Bradlow, Akahane-Yamada, Pisoni, & Tohkura, 1999) demonstrate learners’ long-term retention of L2 phonetic categories and support the idea that the more varied the perceptual input during training, the more successful learners may be.

But sometimes mere exposure is not enough. Experience, that is, the combination of exposure and active use, makes the difference. Indeed, a number of studies have shown that greater experience yields better discrimination of L2 sound contrasts. A study testing Chinese ESL learners’ identification of English /b, d, ɡ, p, t, k/ in final position, a context in which obstruents do not usually occur in Chinese and thus a new L2 contrast, found that participants with a longer length of residence (LOR) in an English-speaking environment achieved higher, though not nativelike, scores than those with a shorter length of residence. More importantly, participants with non-student worker status achieved higher scores than those with student status (Flege & Liu, 2001). Non-student workers presumably were required to use their English actively, likely in a variety of contexts, while students may have had more passive language contact: attending class lectures likely requires more listening than speaking. The authors suggest that performance in this study was linked to how substantial the learners’ input was over time. Thus, though length of stay, or exposure, in an L2-speaking environment can improve perception, it is active language use that is critically important. The importance of language use is further supported by a study of the unfamiliar English [r]-[l] contrast with Japanese ESL learners, which found that students with greater experience, gained through intensive conversational-type instruction, were better at perceptually distinguishing the contrast than students with less experience (MacKain, Best & Strange, 1981).

Understanding a language’s basic sound contrasts, and the variation in the realization of phonemes that may occur due to context or other factors, is fundamental for non-native listeners. There are, however, many other features that differ from language to language of which learners must be aware. Phonotactics, the set of permissible strings of sounds in a language, plays an important role because it limits which phonemes may appear in the company of others. Awareness of transitional frequency, the probability of sounds following one another, and of prosodic factors, among others, is also significant.

Both phonotactics and transitional frequency will be considered further in relation to prelexical accounts of segmentation in section 1.3.2.1.

Speakers exert great control over the way they talk, including the amount of effort expended on producing clear speech, which depends on the listener and the context. As a favor to the listener, speakers may manipulate their speech with regard to: structural constraints, for example by using predictable over marked word order; prosody, for example by highlighting new or difficult information with emphatic stress; speech rate and volume; and vocabulary, by using more rather than less frequent words. In addition to these choices, speakers may also manipulate phonetic reduction. Across speech rates and contexts, it has been found that speakers show less reduction for words that are less predictable from the sentential context than for the same words in a more predictable context (Lieberman, 1963).

Frequency may also affect the amount of reduction speakers apply to particular lemmas, that is, words that share the same dictionary form but have different meanings or grammatical functions. For example, Jurafsky, Bell & Girand (2002), using a corpus-based methodology with a focus on lexical frequency effects, found that the phonologically realized forms of the different lemmas of TO (as an infinitive marker vs. a preposition/particle), THAT (as a pronoun, complement, relative pronoun, determiner, or part of an idiom vs. an intensifier), and OF (as the complement of a verb/preposition vs. a partitive) varied as a function of frequency. Along the same lines, Johnson (2007), using the VIC Corpus (Variation in Conversation) (Pitt et al., 1995), reports differences in the pronunciation of homophones (e.g., RIGHT vs. WRITE) also related to frequency.

Therefore, there exist a number of features of speech, language-, speaker-, and context-dependent, that listeners, both native and non-native, must deal with in order to analyze and understand the acoustic signal.
