
The complex articulation of English

2.8 Acoustic properties

Despite the diversity of possible tongue shapes observed for post-alveolar /r/, the acoustic profiles of these different tongue configurations are remarkably similar, at least with regard to the first three formants (Espy-Wilson et al., 2000). It is generally agreed that the most salient acoustic feature of /r/ is its low third formant (F3) value, usually below 2 000 Hz (Boyce & Espy-Wilson, 1997; Delattre & Freeman, 1968; Proctor et al., 2019), and some researchers have remarked on the close proximity of F3 to F2 (Dalston, 1975; Guenther et al., 1999; Lisker, 1957; O’Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Stevens, 1998). An alternative account suggests that the percept of /r/ is defined not by F3, but by a single dominant peak in the F2 frequency region (Heselwood & Plug, 2011). Formant values for American English /r/ reported in the literature across tongue shapes, phonetic contexts and sexes range from 300-500 Hz for F1, 900-1 300 Hz for F2, and 1 300-2 000 Hz for F3 (Delattre & Freeman, 1968; Espy-Wilson, 1992; Espy-Wilson & Boyce, 1999; Uldall, 1958; Westbury et al., 1998; Zhou et al., 2008). In rhotic Englishes, prevocalic /r/ presents lower formant values than postvocalic /r/, which is generally assumed to result from the presence of lip rounding in prevocalic /r/ (Delattre & Freeman, 1968; Lehiste, 1962; Zawadzki & Kuehn, 1980). As far as we are aware, no study has observed systematic differences between retroflex and bunched /r/ up to the third formant. However, beyond F3, Espy-Wilson and Boyce (1994) found that F3 and F4 are further apart for retroflex than for bunched /r/. More recently, consistent acoustic differences have been found in the higher formants in American English. Notably, the difference between F4 and F5 has been found to be larger in retroflex than in bunched /r/. Zhou et al. (2008) found that retroflex /r/ in American English males showed a difference between F4 and F5 of over 1 400 Hz, compared with 700 Hz for bunched /r/. This result has since been replicated in studies on postvocalic /r/ in Scottish English (Lawson, Stuart-Smith, & Scobbie, 2018; Lennon, Smith, & Stuart-Smith, 2015).

A variety of attempts have been made to account for the acoustics of English /r/, particularly with regard to the maintenance of the low F3 values observed across a multitude of articulatory configurations. Accounts of the source of the low F3 associated with /r/ have been proposed using both Perturbation Theory (e.g., Johnson, 2012; Ohala, 1985) and multi-tube models (e.g., Alwan et al., 1997; Espy-Wilson et al., 2000; Stevens, 1998), with varying degrees of success.

Perturbation Theory relates vocal tract constrictions to formant frequencies by treating them as perturbations of a uniform, unconstricted tube that is closed at one end and open at the other (i.e., a quarter-wavelength resonator). The theory states that constricting the tube at a point of maximum velocity (or zero pressure), i.e., at the location of an antinode, lowers the frequency of the corresponding resonance. Conversely, constricting the tube at a point of maximum pressure (or zero velocity), i.e., at the location of a node, raises the frequency of the corresponding resonance (Chiba & Kajiyama, 1941). Perturbation Theory predicts that the points of maximum velocity for F3 occur in the pharyngeal, palatal and labial regions, which, according to Johnson (2012), ‘nicely illustrates’ the utility of Perturbation Theory, in that a combination of all three constrictions is used for English /r/. Perturbation Theory would thus predict that the source of the low F3 typical of /r/ is a combination of all three constrictions, as indicated by the distribution of antinodes for F3 in Figure 2.4. However, Espy-Wilson et al. (2000) used area functions from MRI data to show that Perturbation Theory cannot adequately account for the actual constriction locations speakers use. For example, they found that the palatal constriction is actually located at a point of maximum pressure (i.e., at a node) and not of maximum velocity (i.e., at an antinode), which, according to Perturbation Theory, would be more likely to raise F3 than to lower it.
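The predictions of Perturbation Theory for an idealised tube can be made concrete with a short calculation. The Python sketch below computes the resonances of a uniform tube closed at the glottis and open at the lips, together with the points of maximum velocity (antinodes) and maximum pressure (nodes) for a given resonance. The tract length of 17.5 cm, the speed of sound of 35 000 cm/s and all function names are illustrative assumptions and are not taken from the studies cited here; the result is only a rough approximation of a real vocal tract.

```python
# Sketch of a uniform quarter-wavelength resonator (closed at the glottis,
# open at the lips). ASSUMPTIONS, not values from the cited studies:
# tract length 17.5 cm, speed of sound 35 000 cm/s.

SPEED_OF_SOUND_CM_S = 35000.0
TRACT_LENGTH_CM = 17.5


def resonance_hz(n, length_cm=TRACT_LENGTH_CM, c=SPEED_OF_SOUND_CM_S):
    """Frequency of the n-th resonance: F_n = (2n - 1) * c / (4 * L)."""
    return (2 * n - 1) * c / (4.0 * length_cm)


def velocity_maxima_cm(n, length_cm=TRACT_LENGTH_CM):
    """Distances from the glottis at which volume velocity is maximal
    for resonance n (the 'antinodes' in the terminology used above)."""
    return [length_cm * (2 * m - 1) / (2 * n - 1) for m in range(1, n + 1)]


def pressure_maxima_cm(n, length_cm=TRACT_LENGTH_CM):
    """Distances from the glottis at which pressure is maximal
    for resonance n (the 'nodes' in the terminology used above)."""
    return [length_cm * (2 * m) / (2 * n - 1) for m in range(n)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"F{n} = {resonance_hz(n):.0f} Hz")
    print("F3 velocity maxima (cm from glottis):", velocity_maxima_cm(3))
    print("F3 pressure maxima (cm from glottis):", pressure_maxima_cm(3))
```

Under these assumptions, the three velocity maxima for F3 fall at one fifth, three fifths and the full length of the tube measured from the glottis, which corresponds roughly to the pharyngeal, palatal and labial regions mentioned above. A constriction that instead falls near two fifths or four fifths of the tube length would sit at a pressure maximum, where the theory predicts a rise in F3.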

Figure 2.4: Locations of nodes and antinodes in a tube open at one end in the unconstricted vocal tract. Perturbation Theory predicts that a constriction at the location of an antinode (labelled A) in the vocal tract would lower the frequency of the corresponding resonances. Nodes are indicated by the intersections of the sine waves (adapted from Johnson, 2012, Figure 6.7).


In contrast to Perturbation Theory, multi-tube models treat the vocal tract as a series of tubes of different areas and lengths, and attribute the different formants to the resonant frequencies of the different tubes (Espy-Wilson et al., 2000). Multi-tube model accounts have affiliated the low F3 typical of /r/ with the front cavity, i.e., the cavity between the palatal constriction and the lips. Stevens (1998) found that F3 results from a large front cavity volume for /r/, although he suggested that the various tongue configurations used for /r/ do not lower F3 per se, but introduce an extra resonance, FR, in the frequency range normally occupied by F2, with a drop in amplitude of F3 proper. Based on speakers’ actual vocal tract dimensions derived from MRI data, Espy-Wilson et al. (2000) developed a multi-tube model to account for cavity affiliations for /r/. With regard to F3, their model confirmed that F3 is indeed a front cavity resonance, where the front cavity includes a lip constriction formed by the tapering gradient of the teeth and lips (with or without rounding) and a large volume cavity behind it that includes a sublingual space. They found that this sublingual space increases the volume of the cavity and lowers F3 by approximately 200 Hz. Interestingly, while Perturbation Theory would predict that a constriction in the pharyngeal region would lower F3, the model of Espy-Wilson et al. (2000) indicates that eliminating the pharyngeal constriction has minimal effect on F3.

Physical models of the vocal tract have also indicated that the size of the front cavity has an influence on F3. Lindblom, Sundberg, Branderud, Djamshidpey, and Granqvist (2010) noted that, despite the advances in articulatory-acoustic relations, particularly as a result of work by Gunnar Fant, our understanding of vocal tract acoustics remains incomplete with respect to the treatment of lip spreading and of the sublingual space. They therefore built a physical twin-tube model in order to model the acoustics. Their results corroborate multi-tube models of /r/ in that they too associate the front cavity with F3. When the volume of the front cavity is manipulated while the lip opening area is held constant (1 cm²), the lowest F3 values are observed with the largest possible front cavity volumes. In essence, their physical model of the vocal tract shows that the sublingual space contributes to the overall area of the front cavity and that when the volume of the front cavity increases, F3 decreases. Interestingly, they observed an interaction between the size of the sublingual cavity and the degree of lip spreading. The lowest possible F3 values occur with the lowest degree of spreading. However, the main acoustic correlate of spreading, according to their physical model, is F2: F2 increases as the lips become more spread.
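The inverse relation between front cavity volume and F3 described above can be illustrated with a textbook Helmholtz resonator approximation, in which a cavity of volume V opens through a short neck of area A and length l. This is a swapped-in simplification for illustration only: it is not the physical model of Lindblom et al. (2010) nor the multi-tube model of Espy-Wilson et al. (2000). The neck length of 1 cm, the speed of sound, the example volumes and the function name are assumptions; only the 1 cm² opening area echoes the text. The absolute frequencies it yields should not be read as predicted F3 values.

```python
import math

# ASSUMPTION: speed of sound in warm, moist air, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def helmholtz_hz(volume_cm3, neck_area_cm2=1.0, neck_length_cm=1.0,
                 c=SPEED_OF_SOUND_CM_S):
    """Helmholtz resonance: f = (c / (2 * pi)) * sqrt(A / (V * l)).

    neck_area_cm2 stands in for the lip opening area (1 cm^2, as in the text);
    neck_length_cm = 1.0 is a purely illustrative assumption.
    """
    return (c / (2.0 * math.pi)) * math.sqrt(
        neck_area_cm2 / (volume_cm3 * neck_length_cm)
    )


if __name__ == "__main__":
    # Hypothetical front cavity volumes in cm^3, smallest to largest.
    for volume in (3.0, 4.5, 6.0):
        print(f"V = {volume:.1f} cm^3 -> {helmholtz_hz(volume):.0f} Hz")
```

With the opening area held constant, the resonance falls in proportion to the inverse square root of the cavity volume, so any extra volume contributed by a sublingual space lowers the resonance. This is the qualitative pattern reported for the physical model.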

The consistency in formant values observed for /r/ has given rise to the suggestion that trading relations may exist between the different articulatory manoeuvres which jointly contribute to the lowering of F3. Greater reliance on one of these articulatory manoeuvres would be accompanied by less of another, and vice versa (Tiede et al., 2010). In an acoustic and articulatory study of the production of /r/ in seven American English speakers, Guenther et al. (1999) observed systematic trade-offs between the length of the front cavity and the length and size of the constriction, which allowed speakers to maintain stable F3 values across different contexts of /r/. Articulatory variability thus coexists with acoustic stability. Speakers modify the length of the front cavity and the length of the constriction in order to achieve the total cavity volume necessary to produce the low F3 typical of /r/ (Matthies et al., 2008). The results from Guenther et al. (1999) therefore suggest that the target of speech production is acoustic in nature, as opposed to the traditional view, which would consider each phoneme to have a canonical vocal tract shape target, as Guenther et al. (1999) discussed.

Tongue shapes with a raised tongue tip create a cavity underneath the tongue blade, the sublingual space. Since the reported tongue shapes for /r/ vary with respect to the elevation of the tongue tip, from tip down bunched to curled up retroflex, it is likely that the size of the sublingual space varies across tongue shapes. Extreme retroflex shapes with sublaminal articulations would presumably have a larger sublingual space than apical ones, as briefly discussed in Espy-Wilson et al. (2000). By contrast, unlike in tip up /r/, the tongue tip is down in bunched /r/, which therefore has negligible sublingual space (Zhang, Boyce, Espy-Wilson, & Tiede, 2003). Indeed, Alwan et al. (1997) used MRI- and Electropalatography (EPG)-derived vocal tract dimensions and found that, in one American English speaker, the front cavity volume was larger for retroflex than for bunched /r/ (6.1 cm³ and 4.5 cm³, respectively). This difference may be due to the smaller sublingual space in bunched /r/, although Alwan et al. (1997) did not explicitly make this suggestion. Trading relations involving the sublingual space may therefore