
CHAPTER I. AN INTRODUCTION TO PHONOLOGICAL VARIATION, FRENCH SCHWA,

I.2 Phonological variation

As mentioned above, words in connected speech exhibit considerable variation. Different types of variation have been described. A first distinction is usually drawn between predictable and unpredictable variation. Predictable or systematic variation is variation that follows contextual rules (e.g., Spinelli & Ferrand, 2005). This type of variation (referred to as “phonological variation” and found for instance in assimilation, French liaison, or French schwa alternation) is traditionally assumed to arise in the phonological component of the production process. In contrast, unpredictable variation is traditionally attributed to phonetic processes. It arises as a consequence of differences between speakers (due for instance to differences in regional or socio-economic background, in anatomy or in speaking habits), but it may also occur within the speech of a single speaker. Indeed, a given speaker rarely produces the same word twice in exactly the same way, even within the same stretch of discourse. Speech style, emotional state, speech rate, and the discourse importance or predictability of the word are some of the potential sources of intra-speaker variability.

In this work, we will be mostly interested in phonological variation. Phonological variation results in several pronunciation variants for a given word. These variants differ either in terms of the number of segments (through segment deletion or insertion) or in terms of the nature of a given segment (e.g., place of articulation, voicing, etc.). Many processes of phonological variation have been described in the literature, for several different languages.

Assimilation is an example of such a process. In assimilation, a given segment takes on some of the properties of an adjacent segment in certain circumstances. For instance, in place assimilation in English, a coronal consonant may take the place of articulation of the following consonant when this consonant is a labial or velar consonant (e.g., wicked prank realized as [wɪkɪb pɹæŋk]). Similarly, in French, a process of voice assimilation may occur between two adjacent consonants of different voicing. For instance, a word-final /b/ may become voiceless when followed by a voiceless consonant, as in une robe sale ‘a dirty dress’, realized as [ynʁɔpsal].
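The contextual nature of English place assimilation can be illustrated with a minimal sketch. It is a toy under stated assumptions: the phone inventory, the rule table `ASSIMILATED` and the function names are hypothetical and introduced only for illustration, and the rule is applied categorically although the process described above is optional.

```python
# Toy illustration only: the phone sets and the rule table are simplifying
# assumptions, not a full description of English place assimilation.
LABIALS = {"p", "b", "m"}
VELARS = {"k", "g", "ŋ"}

# Each coronal maps to its counterpart at another place of articulation,
# keeping voicing and nasality unchanged.
ASSIMILATED = {
    "t": {"labial": "p", "velar": "k"},
    "d": {"labial": "b", "velar": "g"},
    "n": {"labial": "m", "velar": "ŋ"},
}


def place_of(phone):
    """Return 'labial', 'velar', or None for any other phone."""
    if phone in LABIALS:
        return "labial"
    if phone in VELARS:
        return "velar"
    return None


def assimilate(first_word, second_word):
    """Return the assimilated variant of first_word before second_word.

    Both arguments are lists of phone symbols. The word is returned
    unchanged when the context does not license assimilation.
    """
    if not first_word or not second_word:
        return list(first_word)
    final = first_word[-1]
    place = place_of(second_word[0])
    if final in ASSIMILATED and place is not None:
        return list(first_word[:-1]) + [ASSIMILATED[final][place]]
    return list(first_word)


wicked = ["w", "ɪ", "k", "ɪ", "d"]
prank = ["p", "ɹ", "æ", "ŋ", "k"]
print(assimilate(wicked, prank))
```

In this sketch the final /d/ of wicked surfaces as [b] before the labial /p/ of prank, whereas the same word is left unchanged before a coronal or a vowel.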


Another example of a phonological variation process is flapping in American English. Speakers often “flap” /t/ between two vowels, as in, for example, pretty, which gives us [prɪɾi] rather than [prɪti] (see Pitt, Dilley & Tat, submitted). Another common flapping phenomenon is the nasal flap in American English. The nasal flap ([ɾ̃]) occurs optionally word-internally when a /nt/ cluster is found between a stressed and an unstressed vowel (Ranbom & Connine, 2007). For instance, the word gentle can be realized with a nasal flap and then sounds like [dʒɛɾ̃l].

Assimilation and flapping both result in the modification of some features of a given segment. Other processes of phonological variation lead to pronunciation variants differing in terms of the presence/absence of a segment. French schwa alternation is an example of such a process. Schwa alternation is a well-known process not only in French, but also in English, where an unstressed vowel can be optionally deleted in two different environments.

A schwa can be deleted in the initial syllable of polysyllabic words when preceded by a consonant and followed by a stressed syllable (pre-stress schwa deletion, e.g., potato, realized as [pteɪtəʊ]). A schwa may also be deleted in the second syllable of trisyllabic words, when the syllable containing the schwa is preceded by a stressed syllable (post-stress schwa deletion, e.g., envelope, realized as [envləʊp]; see for instance Patterson, LoCasto & Connine, 2003).

Similarly, in Dutch, two processes involving schwa have been described (e.g., Booij, 1995). In schwa epenthesis, a schwa may be inserted in non-homorganic consonant clusters (i.e., clusters whose consonants have different places of articulation) in coda position (e.g., elf ‘eleven’, realized as [ɛləf]). On the other hand, schwa deletion occurs when a Dutch word contains two consecutive syllables with a schwa. In such cases, the schwa in the first syllable can be deleted provided that the resulting consonant cluster is of the type “obstruent + liquid” (e.g., soepele ‘smooth’ realized as [suplə]).

Another example of a phonological process leading to two variants characterized by the presence/absence of a segment is the liaison process in French. Liaison in French refers to the appearance of a consonant between two words when the second word starts with a vowel (Nguyen, Wauquier-Gravelines, Lancia & Tuller, 2007). For instance, the sequence grand ours ‘big bear’ is usually realized as [ɡʁɑ̃tuʁs], i.e., a [t] is inserted at the junction of the two words.


As a consequence of these phonological variation processes, the production of words in connected speech is highly variable. This variability raises crucial questions for several research areas. These questions are discussed below.

I.2.1 Phonological variation and automatic speech processing

Pronunciation variation is one of the stumbling blocks of automatic speech processing (ASP), be it for automatic speech recognition or text-to-speech systems. Strik (1999) underlines that modeling pronunciation variation, whatever its type, has been shown to reduce word error rates in automatic speech recognition, sometimes by as much as 20%. However, he also stresses that “the right solutions have not been found yet (...) more fundamental research is needed to unravel what kind of pronunciation variation is present in spontaneous speech, both in qualitative and quantitative terms”.

Phonological variation is also likely to interfere with the performance of text-to-speech systems. Most such systems produce only a single phonetic realization for a given word or sequence (Boula de Mareüil, 2007). In some cases, very simple linguistic rules are applied to decide which of two variants of a given word is produced, which means that some words are always realized in their variant form and others in their canonical form. This leads to potentially artificial outputs (Boula de Mareüil, 2007). Clearly, as in automatic speech recognition, the integration of more linguistic knowledge is necessary to produce natural realizations (Lacheret-Dujour, 1991).

I.2.2 Phonological variation and speech recognition / production processes

Speech recognition involves the mapping of acoustic-phonetic features extracted from the incoming signal onto a lexical representation. The question researchers have been trying to answer is how this mapping can take place despite the radical changes caused by phonological variation in the incoming signal. Several proposals have been made. Within the framework of generative phonology, in which it is assumed that each word is represented by one single abstract representation, some authors have proposed to account for the recognition of phonological variants through the existence of phonological inference processes (or compensation). The recognition system takes account of the phonological context in order to recover the underlying representation of the intended word that is behind the realized variant.

For instance, Gaskell and Marslen-Wilson (1996) provide evidence that listeners infer the underlying representation (i.e., the non-assimilated form) when presented with completely assimilated forms in appropriate contexts (i.e., contexts which license the assimilation; see also Gaskell & Snoeren, 2008). A model of this process is proposed by Gaskell and colleagues (Gaskell, Hare & Marslen-Wilson, 1995; see also Gaskell, 2003). Their connectionist model learns to compensate for assimilation by being exposed to the conditions in which assimilation occurs. Recent research on voice assimilation in French also shows that listeners recover the words behind assimilated forms (Snoeren et al., 2008).
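The logic of context-dependent inference can be sketched as follows. This is a deliberately simple rule-based toy, not the connectionist model of Gaskell and colleagues: the tables `UNDO` and `PLACE`, the one-word `LEXICON` and the function `recover` are all hypothetical names assumed for illustration.

```python
# Toy illustration only: a rule-based stand-in for phonological inference.
# All tables below are simplifying assumptions.

# Surface labial/velar consonant -> coronal it may derive from.
UNDO = {"p": "t", "b": "d", "m": "n", "k": "t", "g": "d", "ŋ": "n"}
PLACE = {"p": "labial", "b": "labial", "m": "labial",
         "k": "velar", "g": "velar", "ŋ": "velar"}

# Hypothetical one-word lexicon keyed by underlying phone sequence.
LEXICON = {("w", "ɪ", "k", "ɪ", "d"): "wicked"}


def recover(surface, next_phone):
    """Return the lexical entries compatible with a surface form.

    The underlying coronal is only considered when the following phone
    shares the place of the surface consonant, i.e. when the context
    licenses the assimilation.
    """
    if not surface:
        return []
    candidates = [tuple(surface)]
    final = surface[-1]
    if final in UNDO and PLACE.get(next_phone) == PLACE[final]:
        candidates.append(tuple(surface[:-1]) + (UNDO[final],))
    return [LEXICON[c] for c in candidates if c in LEXICON]


surface = ["w", "ɪ", "k", "ɪ", "b"]
print(recover(surface, "p"))  # licensing context: following labial
print(recover(surface, "g"))  # non-licensing context: following velar
```

With this toy lexicon, the assimilated form is mapped back onto wicked only before a labial; before a velar the final [b] cannot be attributed to assimilation and no entry is retrieved.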

In the same vein, Spinelli and Gros-Balthazard’s (2007) results on French schwa suggest that the recognition of non-schwa variants involves the restoration of the underlying form (i.e., the schwa variant).

An alternative view within the generative framework is taken by Lahiri and colleagues (Lahiri & Marslen-Wilson, 1991; Lahiri & Reetz, 2002). While lexical representations are still assumed to be abstract and unique, they may also be underspecified for some features. For instance, in order to account for assimilation of coronal consonants, it is assumed that coronal segments are lexically underspecified for place of articulation.

A radically different approach is adopted by exemplar-based models of speech perception (e.g., Goldinger, 1998; Johnson, 1997). These models assume that the lexicon contains a detailed representation of each encountered token (an exemplar). Contrary to abstract models, no normalization of the speech signal need occur. The incoming signal is simply mapped onto the corresponding representation. Phonological variation is dealt with like any other form of variation, with different exemplars for different variants.

More recently, several authors have argued that hybrid models, i.e., models involving the storage of abstract information and phonetic details in the lexicon, could better account for the recognition of phonological variants. For instance, Connine and Pinnow (2006) suggest a model that allows for several abstract phonological representations to be stored, one for each phonological variant (see also Ranbom & Connine, 2007). For a more extensive discussion on hybrid models of speech comprehension and evidence of the presence of both abstract information and phonetic details in the recognition lexicon, see Ernestus (submitted) or Nguyen, Wauquier-Gravelines and Tuller (in press).

Contrary to the recognition of phonological variants, which has received a lot of attention in psycholinguistic research during the last 15 years, the processes underlying the production of these variants have not yet been investigated with on-line psycholinguistic methods. Our current knowledge of how words are represented in the lexicon and encoded during production comes from experiments using canonical word forms only. Existing psycholinguistic models of speech production that were conceived to account for these data therefore cannot account for more natural forms of speech. What we know about phonological variants comes essentially from corpus analyses and from acoustic and articulatory studies. While this latter research is crucial for describing the output of the production process, on-line experiments are also needed to inform us about the nature of the lexical representations of words with several variants, and about the mechanisms (including their time course) underlying the production of such variants. The aim of this work is precisely to provide such data, through the investigation of the phonological variants of French schwa words. While our observations will be restricted to this particular process, they will contribute to building a more general understanding of phonological variation processes in production. The following sub-section provides a short review of the linguistic literature related to French schwa.