Analysis and Synthesis of the American English Lateral Consonant

by

Adrienne Prahler

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 1998

© Adrienne Prahler, MCMXCVIII. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part, and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, May 22, 1998

Certified by: Kenneth Stevens, Clarence LeBel Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Analysis and Synthesis of the American English Lateral Consonant

by

Adrienne Prahler

Submitted to the Department of Electrical Engineering and Computer Science on May 22, 1998, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

The lateral consonant in English is generally produced with a backed tongue body, a midline closure of the tongue blade at the alveolar ridge, and a path around one or both of the lateral edges of the tongue blade. In pre-vocalic lateral consonants, the release of the closure causes a discontinuity in the spectral characteristics of the sound. Past attempts to synthesize syllable-initial lateral consonants using formant changes alone have not been entirely satisfactory. Data from prior research have shown rapid changes at the consonant release not only in the formant frequencies but also in the glottal source amplitude and spectrum and in the amplitudes of the formant peaks. Further measurements have been made on additional utterances, guided by models of lateral production. Synthesized lateral-vowel syllables that include additional changes in bandwidths, pole-zero pairs, spectral tilt, and the amplitude of voicing are judged to be more natural than lateral-vowel syllables with only formant transitions.

Thesis Supervisor: Kenneth Stevens
Title: Clarence LeBel Professor

Acknowledgments

I would like to thank Ken Stevens and all of the Speech Group for everything. Ken has been an inspiration to me both professionally and personally, thank you. Thank you to my parents and brother for all their love and support over the years - I would definitely not be here without all of you. Thank you to Peter for all the laughs and nights out - you kept me sane. Basak, what can I say except we are DONE!

Contents

1 Introduction
  1.1 Motivation
  1.2 Properties
  1.3 Background
    1.3.1 Poles and Zeros of Vocal Tract Transfer Function
    1.3.2 Glottal Source Reduction
    1.3.3 Acoustic Losses
  1.4 Purpose of Research

2 Modeling
  2.1 Motivation
    2.1.1 Simple Lossless Tube
    2.1.2 Expected Numbers of Poles and Zeros
  2.2 Evaluation of Lateral Side Channels
    2.2.1 Equations
    2.2.2 Boundary Conditions
    2.2.3 Solve for Specific Cases of Lengths and Areas
  2.3 Evaluation of the Entire Model
    2.3.1 Equations
    2.3.2 Boundary Conditions
    2.3.3 Solution for Specific Cases of Lengths and Areas
    2.4.1 Calculation of the Number of Poles and Zeros and Approximate Locations
  2.5 Limitations of Model
    2.5.1 Assumptions
  2.6 Conclusions

3 Measurements
  3.1 Purpose of Measurements - What Makes a /l/?
  3.2 Method
  3.3 Measurements
    3.3.1 Singleton
    3.3.2 Cluster
  3.4 Singleton Analysis
  3.5 Analysis of Utterances with Consonant Clusters
  3.6 Comparisons of Environments
  3.7 Conclusions
    3.7.1 Abruptness

4 Synthesis
  4.1 Abruptness/Consonantal Quality
    4.1.1 Changes in Frequency
    4.1.2 Changes in Amplitude
  4.2 Method

5 Results of Perceptual Testing
  5.1 First Perceptual Experiments
  5.2 Final Testing

6 Conclusions
  6.1 Further Research

  A.1 Solve Lateral Channel Equations
  A.2 Solve Equations for Lateral Branches when One Tube Disappears
  A.3 Solve for Entire Model

List of Figures

2-1 Model of side channels formed during lateral consonant production
2-2 Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.1$ cm² and $l_{21} = l_{22} = 8$ cm
2-3 Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.2$ cm² and $l_{21} = 8.5$ cm, $l_{22} = 7.5$ cm
2-4 Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.3$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm
2-5 Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.1$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm
2-6 Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21}/A_{22} = 0.1$ cm² to 0.5 cm² and $l_{21} + l_{22} = 16$ cm
2-7 Model of the vocal tract during lateral consonant production
2-8 Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.2$ cm² and $l_{21} = l_{22} = 8$ cm
2-9 Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.2$ cm², $l_{21} = 10$ cm, and $l_{22} = 8$ cm
2-10 Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = 0.2$ cm², $A_{22} = 0.5$ cm² and $l_{21} = 11.5$ cm and $l_{22} = 4.5$ cm
2-11 Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for various values of $A_{21}$ and $A_{22}$ and $l_{21} + l_{22} = 16$ cm
3-1 The effect of the pre-emphasis on the spectra
3-2 Spectrogram of luck for male speaker
3-3 Spectrogram of voiced cluster utterance, bleed, for male speaker
3-4 Spectrogram of voiceless cluster utterance, plead, for male speaker
3-5 Changes in amplitudes of formants from lateral to vowel for singleton utterances: error bars represent standard deviation of data
3-6 Changes in formant frequencies from lateral to vowel for singleton utterances: error bars represent standard deviation of data
3-7 Changes in amplitudes of formants from liquid to vowel for voiced and some voiceless stop cluster utterances: error bars represent standard deviation of data
3-8 Changes in formant frequencies from liquid to vowel for voiced and some voiceless stop cluster utterances: error bars represent standard deviation of data
3-9 Comparison of Δ's of amplitudes of formants from lateral to vowel for singleton and cluster utterances: error bars represent standard deviation of data
3-10 Comparison of Δ's of formant frequencies from lateral to vowel for singleton and cluster utterances: error bars represent standard deviation of data
3-11 Measured values of formant frequencies during lateral consonant: error bars represent standard deviation of data
4-1 Spectrogram of natural utterance loot for male speaker
4-2 Spectrogram of first synthesized utterance of loot for male speaker
4-3 Spectrogram of second synthesized utterance of loot for male speaker
4-4 Spectra of natural utterance during /l/ using a 6.4 ms Hamming window
4-5 Spectra of first synthesized utterance during /l/ using a 6.4 ms Hamming window
4-6 Spectra of second synthesized utterance during /l/ using a 6.4 ms Hamming window
4-8 Time varying voicing changes in synthesized utterance
4-9 Time varying additional pole and zero in synthesized utterance
4-10 Time varying bandwidth changes in synthesized utterance
4-11 Spectra of natural utterance during /l/ using 25.6 ms Hamming window
4-12 Spectra of first synthesized utterance during /l/ using 25.6 ms Hamming window
4-13 Spectra of second synthesized utterance during /l/ using 25.6 ms Hamming window
5-1 Results of perceptual experiments: % first utterance rated more natural by listeners
5-2 Total change in F1 and F2 between lateral and vowel vs. % times second synthesized utterances rated more natural than first synthesized utterance

List of Tables

3.1 Singleton utterances
3.2 Cluster utterances
3.3 Δ's of formant frequencies and amplitudes for singleton utterances; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the right half of the table
3.4 Δ's of formant frequencies and amplitudes for voiced and some voiceless stop cluster utterances; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the right half of the table
3.5 Comparison of singleton and cluster utterances for same speakers; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the lower half of the table
3.6 Measured formant frequency values in Hz of the lateral for singleton and cluster utterances
4.1 Constant synthesis parameters of utterance incorporating formant frequency transitions
4.2 Constant synthesis parameters of utterance incorporating voicing changes
5.1 Results of perceptual experiments: % first utterance rated more
5.2 Total change in frequency between lateral and vowel for synthesized utterances

Chapter 1

Introduction

1.1 Motivation

Consonants in most languages are different from vowels in many ways including abruptness at the consonant release and a reduction in overall power during the sound[9]. The American English lateral consonant is very different from other sounds in that a similar vocal tract configuration produces an extreme variety of frequency spectra. The lateral consonant is sonorant and not continuant, so it is similar to nasal consonants, but differs from glides. The lateral must be examined to develop a better understanding of speech sounds in language.

The lateral is a liquid consonant which is more elusive than other consonants. Most sounds in languages have distinctive contrasts and minimal pairs, but the lateral lacks a clear contrasting sound, allowing more variations in the production of the sound. The lateral, however, can also be a syllable nucleus which lacks abruptness, and the perceptual cues must then be contained in the spectrum only. Understanding the perceptual cues within the frequency spectra and in the transitions to vowels is a practical motivation for studying the lateral. This information can be used for a variety of purposes including synthesis, speech recognition, and teaching second languages (to speakers of languages without a comparable lateral sound, e.g., Japanese).

1.2 Properties

The American English lateral /l/ is one of the semivowel consonants; it is produced with a complete closure on the midline of the vocal tract like many other consonants, and the narrowing of the vocal tract differentiates it from vowels. The lateral is produced with a backed and lowered tongue body and occlusion at the alveolar ridge. The tongue does not close off the vocal tract completely, and airflow continues around the tongue. The first formant (F1) is low, although higher than that typically found for a high vowel, and the second formant (F2) is barely separated from the first formant. The third formant (F3) generally has a relatively strong amplitude and is higher in frequency than the third formant frequency for most vowels [16, 4, 3]. Espy-Wilson reports the average formant frequencies of prevocalic /l/: F1 averaged across all speakers is 399 Hz, F2 is 1074 Hz, and F3 is 2553 Hz [6, 5]. The lateral is prone to considerable variation depending on the individual and the phonetic context, and this variability makes it more difficult to characterize than other consonants [5, 15]. The acoustic analysis of /l/ is not complete, and more work is needed to determine how to characterize the lateral acoustically, both for speech synthesis directly and for other applications including speech recognition and speech pathology. This thesis will develop a theoretical model of the lateral, and will examine the validity of the model by acoustically analyzing the prevocalic lateral for various speakers and contexts, synthesizing the laterals using the theoretical model, and conducting perceptual experiments to determine the primary acoustic cues for a lateral.

1.3 Background

1.3.1 Poles and Zeros of Vocal Tract Transfer Function

To better understand the acoustic characteristics of the American English laterals, some work has focused on modeling the vocal tract for this class of sounds. The acoustic theory of the laterals was first described by Fant [7]. Assuming that the constriction sizes in the vocal tract are large enough (0.17 cm² or greater [15]) to avoid the production of turbulence noise, the acoustic theory is simple for the laterals. The vocal tract is modeled as a tube with constrictions and side branches; the first formant is approximately a Helmholtz resonance, with the acoustic mass due to the lateral constriction. The low second formant resonance is because of the pharyngeal constriction caused by the tongue body backing in the cavity behind the constriction. The third formant is roughly a resonance of the mouth cavity anterior to the constriction, and the fourth formant is determined from the length of the entire cavity system. The production of the lateral /l/ with an alveolar point of articulation creates an interior cavity formed by the tongue blade. An additional cavity is created under the tongue which couples with the back cavity. The result is that there are zeros as well as poles in the transfer function for a lateral configuration. The lowest zero is possibly caused by the cavity formed by the tongue blade while an additional pole is due to the entire system. Fant suggests that the effect of the pole-zero pair is simply that the fourth formant takes on the role of the third and so on, and a lateral can be synthesized without the additional zero if the formants are simply shifted appropriately [7].

A modification of Fant's theory is proposed by Stevens. The side branches around the tongue will affect the high frequency behavior of the lateral. The transfer function of the vocal tract will have additional zeros when the branches in the airway are asymmetrical. The exact locations of the zeros will vary since the lengths of the side branches vary among speakers. Stevens suggests great variability in the spectrum of the lateral in the frequency range of 2500-4000 Hz depending on the surrounding environment and the individual speaker. The rate of release of the lateral is slower than that of other alveolar consonants, and the possible interaction of the additional poles and zeros with the surrounding speech segments is unknown [15]. Unlike Fant, Stevens suggests that the effect of the poles and zeros, regardless of their actual location, is not simply a shifting of the formants up in frequency, but instead an overall modification of the spectrum in the high frequency range.

Previously, a limiting factor in creating an accurate model of the lateral was the lack of data on the length of the lateral channels. Recent work by Narayanan et al. [12, 13] at UCLA using Magnetic Resonance Imaging (MRI) and electropalatographic (EPG) techniques has provided valuable data that can lead to a more accurate understanding of the geometry, acoustics, and aerodynamics of the lateral. Using the MRI and EPG studies, detailed 3-D data of the vocal tract during the production of the laterals were obtained, and the appearance of side channels during the lateral consonant was confirmed. However, MRI and EPG data show that the left and right channels are not equal in either area or length, and there is great variation across subjects and phonetic contexts. All sounds in the study, however, are produced with a lingual occlusion approximately 1-1.5 cm away from the lip opening and a closure length of 0.6-1.5 cm. The cross-sectional area of the channels formed varies from 0.1-0.5 cm², and the relative areas of the right and left channels show variation even for the same speaker.

These data finally establish the physical dimensions and configuration of the vocal tract and oral cavity during the production of the lateral, and this knowledge permits more accurate modeling of the acoustics. The lateral channel areas and low flow rates suggest no significant pressure drop in the supraglottal constriction region and negligible chances for frication. These observations support the assumption of Stevens [15] that turbulence noise can be ignored in modeling the vocal tract for the lateral, at least when the glottis is in the normal configuration for phonation. This finding also suggests sustained, uniform flow throughout the duration of the sound. The first formant frequency can be attributed to the Helmholtz resonance between the back cavity volume and oral constriction, with the low frequency behavior being dominated by the back cavity configuration. The change of F1 at the release is expected to be abrupt due to the abrupt changes in area functions arising from the anterior tongue blade movement. The second formant is associated with the back cavity resonance and can be greatly affected by retracting or raising the posterior tongue body. Three-dimensional modeling from the collected data suggests that the tongue blade has a tendency towards an inward lateral compression which creates the lateral channels observed [12], supporting the hypothesis of Sproat and Fujimura [14] that the tongue blade narrowing is a feature of laterals.

1.3.2 Glottal Source Reduction

In addition to the extra poles and zeros in the frequency spectrum during laterals, a study by Bickley and Stevens [1] found a change in the glottal waveform when a constriction is formed in the vocal tract during the semivowels. The glottal source is affected by the narrowing of the vocal tract during the production of the lateral /l/, which creates an increased acoustic mass and changes the volume velocity waveform at the glottis. As the acoustic impedance increases due to the decreasing constriction size, the pressure drop across the constriction increases, causing a decrease in the pressure across the glottis. The pressure drop across the constriction causes an increase in the intraoral pressure during the open phase of the glottal waveform. The decreased transglottal pressure affects the forces on the vocal folds and leads to a decreased amplitude of the volume velocity waveform. Although there are individual differences across speakers, the study suggested that there is always a decrease in intensity during the lateral relative to the adjacent vowel. There is also a less abrupt termination for the glottal pulse, leading to a greater spectrum tilt.

1.3.3 Acoustic Losses

Research shows that in addition to the presence of poles and zeros and glottal source reduction during the lateral, acoustic losses in the vocal tract increase. As the acoustic resistance at the constriction increases, there are significant effects on the bandwidth of the first formant. The bandwidth of the second formant also increases significantly during the lateral [16]. The study by Bickley and Stevens [1] found a change in amplitude of the first harmonic to be on average 2.9 dB for /l/, while the average change in amplitude of the first formant was 7.5 dB. This greater change in the first formant amplitude relative to the amplitude of the fundamental suggests an increase in the bandwidth of the first formant. The large bandwidth of F1 can be attributed to the acoustic losses in the oral constriction causing the overall reduction of the amplitude of the spectrum during the /l/ [2].
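As a rough worked example of this inference, the sketch below converts the two reported level changes into an implied bandwidth ratio. It rests on assumptions that are not stated in the study: that the first-harmonic change tracks the source level, that the remaining change in F1 amplitude is due entirely to bandwidth, and that the peak level of a resonance varies inversely with its bandwidth. The function name is hypothetical.

```python
def implied_bandwidth_ratio(formant_change_db, harmonic_change_db):
    """Bandwidth ratio implied by the excess drop in formant amplitude.

    Assumption (not from the source): the peak amplitude of a resonance is
    inversely proportional to its bandwidth, so an excess drop of X dB
    corresponds to a bandwidth increase by a factor of 10 ** (X / 20).
    """
    excess_db = formant_change_db - harmonic_change_db
    return 10.0 ** (excess_db / 20.0)


if __name__ == "__main__":
    # Average changes reported for the lateral: 7.5 dB (F1 amplitude)
    # and 2.9 dB (first harmonic).
    ratio = implied_bandwidth_ratio(7.5, 2.9)
    print(f"implied F1 bandwidth ratio (lateral / vowel): {ratio:.2f}")
```

Under these assumptions the F1 bandwidth during the lateral would be roughly 1.7 times that of the adjacent vowel; a simultaneous shift in the F1 frequency would change this estimate.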

1.4 Purpose of Research

Previous research still leaves many unanswered questions concerning the acoustic properties of the American English lateral /l/. Research has suggested several models of the vocal tract for the production of prevocalic /l/. However, the variability among individuals and phonetic contexts makes it difficult to determine the main features of an accurate model without further research. Additional poles and zeros exist in the frequency spectrum during the production of the /l/, but their exact placement and perceptual importance are not known. Glottal source reduction and acoustic losses also occur, but their amount and the best method to simulate them in synthesis are not known.

This research acoustically analyzes the lateral to determine the essential components and perceptual cues for the /l/ for theoretical modeling and synthesis. A database of utterances containing prevocalic /l/ is created for a variety of male and female speakers. These laterals are acoustically analyzed, looking specifically for pole-zero pairs caused by the unequal lateral channels which have been confirmed by the MRI and EPG studies [13], back reactions on the glottal source during the lateral due to the constriction of the vocal tract [2], and increasing bandwidths caused by suggested acoustic losses [15]. Using the previous research and the gathered acoustic evidence, a theoretical model of the vocal tract for the laterals is developed. A selected set of database words is synthesized using the Klatt synthesizer, and perceptual experiments in which the listeners rate the naturalness of the various synthesized sounds are used to determine the validity of the theoretical model and pinpoint the acoustic cues for the prevocalic /l/.

One of the primary acoustic features examined is the additional poles and zeros in the spectrum during the lateral. It is hypothesized that the observed poles and zeros found around 1.5-3 kHz are due to some combination of the lateral channels and back cavity, but the great variability of the right and left channel area functions for the same speaker and across speakers makes it difficult to determine the exact effects and their perceptual importance [12]. This research attempts to determine whether the exact placement of the pole and zero is important perceptually or if the overall high frequency effects of a fast drop off of the spectral amplitude above the frequency of the second formant, combined with the higher third formant frequency, are the primary acoustic cues used by listeners for the prevocalic lateral.

Another acoustic feature to be examined in the database of sounds is the glottal source during the lateral. Previous research suggests that with the narrowing of the vocal tract during the lateral, there are some back effects that alter the actual glottal source. Preliminary research also shows some sort of glottal back effect for word-initial prevocalic laterals, but the reduction is not as apparent for prevocalic laterals found in consonant clusters. The variability of the glottal source reduction between speakers and phonetic contexts makes it difficult to determine what is perceptually important. For accurate, natural sounding synthesized laterals and a valid theoretical model of the lateral, the manifestation of glottal source reduction must be determined through acoustic analysis of various speakers and contexts.

Acoustic losses are also examined in prevocalic laterals. Preliminary research demonstrates increased acoustic losses during the lateral, but the exact magnitude is still not determined. The most reliable measure of acoustic loss is the increased bandwidth of the first formant. Acoustic analysis of the database is used to estimate the amount of acoustic loss for modeling and synthesis.

The acoustic analysis of the prevocalic laterals is used to determine the validity of a theoretical model of the vocal tract. This model is then used for synthesis purposes. Previous synthesis work has contrasted the two liquid consonants, /l/ and /r/, and used the differences between the two sounds to define the lateral for synthesis. The work incorporated a prominent F3 and a decreasing F3 value at the release of the lateral to discriminate the sound from /r/ [11]. A change in the voicing is created by lowering the amplitude of voicing during the liquid, but the naturalness of these synthesized laterals is not very good. Synthesis performed in the present study attempts to determine the parameters inherently important for the natural synthesis of the lateral. The Klatt synthesizer parameters that are specifically examined include TL (tilt), BW (bandwidths of formants), the possibility of poles and zeros, and formant changes [8]. Initial synthesis of sample data suggests that the TL parameter increases, the bandwidth of the first formant increases, a pole and zero are present near the third formant, and abrupt transitions of the formants occur at the release of the lateral.

The importance of the additional details in the model of the prevocalic lateral is determined through perceptual testing. A set of words in the database is synthesized for a speaker based on the theoretical model, and actual utterances are also used to determine the synthesis parameters necessary for a good match of the synthesized lateral to the natural lateral. Two versions of each word are synthesized. One version time varies only the formant transitions, while the other time varies bandwidths, spectral tilt, and a pole-zero pair in addition to the formant frequencies. Listeners rate the naturalness of the various sounds, spoken and synthesized, in perceptual experiments. These data are then analyzed to determine what parameters are primary and whether a more complicated model of the lateral is necessary for good synthesis.

Chapter 2

Modeling

2.1 Motivation

In this chapter, a model of the acoustic behavior of the vocal tract for lateral consonants is developed. The aim is to interpret the changes in the frequency spectra that occur during the production of the lateral consonant. We are interested in the natural resonance frequencies of the vocal tract configuration during the lateral production, and also the frequencies of possible zeros in the transfer function. Acoustic losses, including bandwidth changes and glottal source changes, are not considered in the model. The proposed model offers some explanation of the acoustic attributes of the lateral. It also attempts to explain how the same sound can be produced with such large variability in spectral peaks and valleys. This variability suggests that the acoustic cues are not limited to the locations of the resonances, but there may be more complicated perceptual effects.

2.1.1 Simple Lossless Tube

To understand and model the vocal tract during lateral consonant production, a simple lossless tube is first examined. A combination of such tubes is then used as the basis for the model of the lateral consonant. The two variables of interest in the tube are the sound pressure and the volume velocity. The sound pressure $p(x, t)$ and volume velocity $U(x, t)$ for one-dimensional propagation are related by

$$\frac{\partial p}{\partial x} = -\frac{\rho}{A}\frac{\partial U}{\partial t} \qquad (2.1)$$

$$\frac{\partial U}{\partial x} = -\frac{A}{\gamma P_0}\frac{\partial p}{\partial t} \qquad (2.2)$$

where $A$ is the cross-sectional area, $P_0$ is the ambient air pressure, $\rho$ is the ambient density of the air (0.00114 gm/cm³), and $\gamma$ is the ratio of the specific heat at constant pressure to that at constant volume ($\gamma = 1.4$ for air). Assuming an exponential time dependence and a constant area function, these equations reduce to

$$\frac{d^2 p}{dx^2} + k^2 p = 0 \qquad (2.3)$$

and

$$U = -\frac{A}{j 2\pi f \rho}\frac{dp}{dx} \qquad (2.4)$$

where $k = 2\pi f/c$ and $c = \sqrt{\gamma P_0/\rho}$ (the velocity of sound, 35,400 cm/sec in the body) [15].
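The constants above can be checked against one another with a minimal sketch, assuming the usual adiabatic relation $c = \sqrt{\gamma P_0/\rho}$ and the wavenumber $k = 2\pi f/c$. The function names are illustrative only; the numerical constants are the ones quoted in the text.

```python
import math

GAMMA = 1.4        # ratio of specific heats for air (from the text)
RHO = 0.00114      # ambient density of air in g/cm^3 (from the text)
C_SOUND = 35400.0  # velocity of sound in cm/s (from the text)


def ambient_pressure(c=C_SOUND, rho=RHO, gamma=GAMMA):
    """Ambient pressure P0 (dyn/cm^2) implied by c = sqrt(gamma * P0 / rho)."""
    return rho * c ** 2 / gamma


def wavenumber(f_hz, c=C_SOUND):
    """Wavenumber k = 2 * pi * f / c in rad/cm."""
    return 2.0 * math.pi * f_hz / c


if __name__ == "__main__":
    print(f"implied P0: {ambient_pressure():.3e} dyn/cm^2")
    print(f"k at 1 kHz: {wavenumber(1000.0):.4f} rad/cm")
```

The implied pressure comes out close to one atmosphere (about 1.01e6 dyn/cm²), which suggests that the quoted density and sound velocity are mutually consistent.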

2.1.2 Expected Numbers of Poles and Zeros

Previous research verifies that adding channels to a model increases the number of poles and in some cases causes zeros to appear in the transfer function. In the simple case of a single uniform tube, the expected number of poles and zeros in the transfer function is determined by the length of the tube and the position and type of the source. The average spacing of poles for a uniform tube of length $l$ is $c/2l$. By adding side branches of total length $l_s$, the average spacing of poles for the system decreases to $c/2(l + l_s)$. The number of poles up to a certain frequency $f$ is $n_p = 2(l + l_s)f/c$. Zeros also appear in the transfer function, and the number up to frequency $f$ is $n_z = 2 l_s f/c$. The first minimum (zero) in the transfer function occurs at $f_z = c/4l_s$ for a uniform side branch that is closed at the end [15, 7].
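These counting rules are easy to evaluate numerically. The sketch below assumes the standard uniform-tube results: average pole spacing $c/2(l + l_s)$, zero spacing $c/2l_s$ for a closed side branch, and first zero at $c/4l_s$, with $l$ the main tube length and $l_s$ the total side-branch length. The function names and the 2 cm side branch in the usage lines are hypothetical.

```python
C_SOUND = 35400.0  # velocity of sound in cm/s, value quoted in the text


def pole_count(f_max_hz, l_main_cm, l_side_cm=0.0, c=C_SOUND):
    """Approximate number of poles below f_max: 2 * (l + l_s) * f / c."""
    return 2.0 * (l_main_cm + l_side_cm) * f_max_hz / c


def zero_count(f_max_hz, l_side_cm, c=C_SOUND):
    """Approximate number of zeros below f_max for a closed side branch."""
    return 2.0 * l_side_cm * f_max_hz / c


def first_zero(l_side_cm, c=C_SOUND):
    """Quarter-wavelength zero of a closed uniform side branch, c / (4 * l_s)."""
    return c / (4.0 * l_side_cm)


if __name__ == "__main__":
    # Single 8 cm tube up to 6 kHz, the case discussed later in the chapter.
    print(f"poles below 6 kHz, 8 cm tube: {pole_count(6000.0, 8.0):.2f}")
    # Hypothetical 2 cm closed side branch, for illustration only.
    print(f"first zero of a 2 cm branch: {first_zero(2.0):.0f} Hz")
```

The estimates are averages, so a fractional count is expected and should be rounded when compared with a plotted transfer function.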

The lateral, however, consists of two parallel paths, not just an additional side branch, and this affects the expected approximate number of poles and zeros. Stevens hypothesizes that poles of the transfer function are replaced by pole-zero-pole clusters and that the first appearance of such a cluster will occur at the half wavelength of the lateral channels, $f = c/(2 l_c)$, where $l_c$ is the length of the lateral channels [15]. A difficulty with two parallel paths is determining how they will interact; one path could be considered as the main channel, while the other acts as a side branch.

2.2 Evaluation of Lateral Side Channels

Fant suggests the possibility of the formation of side channels during lateral consonant production and this is confirmed by the data gathered at UCLA[7, 13]. The addition of side channels during the lateral consonant changes the transfer function of the vocal tract configuration. Previous research shows that speakers produce the lateral consonant with a variety of vocal tract configurations depending on the individual and the phonetic context[12]. Speakers can have two side channels of different lengths and different areas or only one channel. The side channels formed during the lateral consonant can be modeled approximately as two uniform tubes, as shown in Figure 2-1. We consider the acoustic behavior of such a configuration.

2.2.1 Equations

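Solving Equations 2.3 and 2.4 for the input and output volume velocities and pressures, the equations reduce to

$$\begin{pmatrix} U_{in21} \\ p_{in21} \end{pmatrix} = \begin{pmatrix} \cos(-kl_{21}) & -j\frac{A_{21}}{\rho c}\sin(-kl_{21}) \\ -j\frac{\rho c}{A_{21}}\sin(-kl_{21}) & \cos(-kl_{21}) \end{pmatrix} \begin{pmatrix} U_{out21} \\ p_{out21} \end{pmatrix} \qquad (2.5)$$

and

$$\begin{pmatrix} U_{in22} \\ p_{in22} \end{pmatrix} = \begin{pmatrix} \cos(-kl_{22}) & -j\frac{A_{22}}{\rho c}\sin(-kl_{22}) \\ -j\frac{\rho c}{A_{22}}\sin(-kl_{22}) & \cos(-kl_{22}) \end{pmatrix} \begin{pmatrix} U_{out22} \\ p_{out22} \end{pmatrix} \qquad (2.6)$$

where $l_{21}$ and $l_{22}$ are the lengths of each of the tubes and $A_{21}$ and $A_{22}$ are the constant cross-sectional areas.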

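Figure 2-1: Model of side channels formed during lateral consonant production (two parallel uniform tubes with areas $A_{21}$, $A_{22}$ and lengths $l_{21}$, $l_{22}$ between $U_{in}$ and $U_{out}$)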

2.2.2 Boundary Conditions

The boundary conditions of the system are restrictions on the volume velocities and sound pressures of the two tubes at the input and output. The input volume velocity, $U_{in}$, is the sum of $U_{in21}$ and $U_{in22}$, and the output volume velocity, $U_{out}$, is the sum of the two output volume velocities. The volume velocities are assumed to sum without any interference at the two locations. The input pressures of the two tubes are assumed to be equal, as are the output pressures. The transfer function, $U_{out}/U_{in}$, of the two side channels is obtained by solving Equations 2.5 and 2.6 for the boundary conditions of the system, assuming zero pressure at the output of the system, $p_{out} = 0$.

$$\frac{U_{out}}{U_{in}} = \frac{A_{21}\sin(kl_{22}) + A_{22}\sin(kl_{21})}{A_{21}\cos(kl_{21})\sin(kl_{22}) + A_{22}\sin(kl_{21})\cos(kl_{22})} \qquad (2.7)$$

2.2.3 Solve for Specific Cases of Lengths and Areas

The transfer function of the system depends on the lengths and areas of the two channels. The model is a lossless model and is useful for locating the resonance peaks and valleys of the system, but not the amplitudes. In this simple system, a zero will occur when the numerator of the transfer function is zero and a pole will occur when the denominator is zero. A zero occurs at the frequencies for which

$$
\sin(kl_{22}) = -\frac{A_{22}}{A_{21}}\sin(kl_{21})
\tag{2.8}
$$

and a pole occurs when

$$
\cos(kl_{21})\sin(kl_{22}) = -\frac{A_{22}}{A_{21}}\sin(kl_{21})\cos(kl_{22})
\tag{2.9}
$$

The transfer function has been calculated for several different cases of lengths and area configurations, reflecting the possible vocal tract configurations used by individuals.
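The calculation described above can be sketched numerically. The following Python fragment is only an illustration, not the procedure used for the figures: it evaluates the numerator and denominator of Equation 2.7 and locates their sign changes, which correspond to the zeros of Equation 2.8 and the poles of Equation 2.9. The wavenumber is taken as $k = 2\pi f/c$, and the speed of sound `C_SOUND = 35400` cm/s, the 1 Hz search grid, and all function names are assumptions introduced here.

```python
import math

C_SOUND = 35400.0  # cm/s; assumed value, not given in this section


def numerator(f, a21, l21, a22, l22, c=C_SOUND):
    """Numerator of Equation 2.7; its roots are the zeros (Equation 2.8)."""
    k = 2.0 * math.pi * f / c  # wavenumber in rad/cm
    return a21 * math.sin(k * l22) + a22 * math.sin(k * l21)


def denominator(f, a21, l21, a22, l22, c=C_SOUND):
    """Denominator of Equation 2.7; its roots are the poles (Equation 2.9)."""
    k = 2.0 * math.pi * f / c
    return (a21 * math.cos(k * l21) * math.sin(k * l22)
            + a22 * math.sin(k * l21) * math.cos(k * l22))


def level_db(f, a21, l21, a22, l22):
    """20*log10|U_out/U_in| in dB (undefined exactly at a pole or a zero)."""
    ratio = numerator(f, a21, l21, a22, l22) / denominator(f, a21, l21, a22, l22)
    return 20.0 * math.log10(abs(ratio))


def roots(func, f_lo, f_hi, step=1.0):
    """Bracket sign changes of func on a uniform grid and refine by bisection."""
    found = []
    n_steps = int((f_hi - f_lo) / step)
    for i in range(n_steps):
        lo, hi = f_lo + i * step, f_lo + (i + 1) * step
        y_lo, y_hi = func(lo), func(hi)
        if y_lo == 0.0:
            found.append(lo)
        elif y_lo * y_hi < 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if func(lo) * func(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
    return found


def zeros_and_poles(a21, l21, a22, l22, f_lo=1.0, f_hi=6000.0):
    """Zeros and poles (Hz) of the side-channel model between f_lo and f_hi."""
    args = (a21, l21, a22, l22)
    zeros = roots(lambda f: numerator(f, *args), f_lo, f_hi)
    poles = roots(lambda f: denominator(f, *args), f_lo, f_hi)
    return zeros, poles


if __name__ == "__main__":
    # Areas in cm^2 and lengths in cm for the unequal-length case
    z, p = zeros_and_poles(0.3, 8.5, 0.2, 7.5)
    print("zeros (Hz):", [round(v) for v in z])
    print("poles (Hz):", [round(v) for v in p])
```

The search starts at 1 Hz because both the numerator and the denominator vanish at 0 Hz. Under the assumed sound speed, the unequal-length case with lengths of 8.5 cm and 7.5 cm should place its two zeros near 2.2 kHz and 4.5 kHz.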

Figure 2-2: Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.1$ cm² and $l_{21} = l_{22} = 8$ cm (horizontal axis: frequency, 0 to 6000 Hz)

Side Channels the Same Lengths. The transfer function, $20\log|U_{out}/U_{in}|$, is plotted for $A_{21} = A_{22} = 0.1$ cm² and $l_{21} = l_{22} = 8$ cm in Figure 2-2. When the two cross-sectional areas, $A_{21}$ and $A_{22}$, and lengths, $l_{21}$ and $l_{22}$, are equal, the transfer function has no interference from the side channels and appears to be that of a simple, lossless tube of length $l_{tot} = l_{21} = l_{22} = 8$ cm.

Varying the cross-sectional areas of the tubes will not affect the locations of the resonance peaks or valleys when the lengths of the side channels are equal, as Equations 2.8 and 2.9 suggest. Some zeros are cancelled by poles at the same location when the ratio of the lengths is an integer. In this case, where the lengths are equal, the zeros created are cancelled by the additional poles and the system appears as a single tube of length 8 cm. As expected, the number of poles of the system up to 6 kHz is 3, using the rough approximation of the number of poles described in Section 2.1.2 with a single tube of length $l = 8$ cm.
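The cancellation in the equal-length case can be made explicit by substituting $l_{21} = l_{22} = l$ into the transfer function of the two side channels. The derivation below is a sketch: it takes $k = 2\pi f/c$, and the value $c \approx 35400$ cm/s used for the numerical pole frequencies is an assumption, since the speed of sound is not stated in this section.

```latex
\begin{align*}
\left.\frac{U_{out}}{U_{in}}\right|_{l_{21}=l_{22}=l}
  &= \frac{(A_{21}+A_{22})\sin(kl)}{(A_{21}+A_{22})\cos(kl)\sin(kl)}
   = \frac{1}{\cos(kl)}, \\
\cos(kl) = 0 \;&\Rightarrow\; f_n = \frac{(2n-1)\,c}{4l}, \qquad n = 1, 2, 3, \dots \\
l = 8\ \text{cm},\ c \approx 35400\ \text{cm/s} \;&\Rightarrow\;
  f_n \approx 1106,\ 3319,\ 5531\ \text{Hz}.
\end{align*}
```

The areas drop out of the ratio entirely, and three poles fall below 6 kHz, in agreement with the count given above.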

Side Channels of Different Lengths. The transfer function, $20\log|U_{out}/U_{in}|$, is plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.2$ cm² and $l_{21} = 8.5$ cm, $l_{22} = 7.5$ cm in Figure 2-3. The 1 cm difference between side channel lengths produces two pole-zero pairs, the first at approximately 2 kHz and the second at 4 kHz. The first zero location seems to be at a frequency corresponding to a single wavelength of the total length of the side channels (in this case 16 cm).

Figure 2-3: Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.2$ cm² and $l_{21} = 8.5$ cm, $l_{22} = 7.5$ cm (horizontal axis: frequency, 0 to 6000 Hz)

The transfer function, $20\log|U_{out}/U_{in}|$, is plotted for $A_{21} = A_{22} = 0.3$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm in Figure 2-4. As the difference between the two side channels increases, the variation from a simple all-pole system is greater. Although the total number of poles present up to 6 kHz is not as expected, bunching may be occurring at higher frequencies and some cancellation of poles and zeros occurs. Since the ratio of the lengths is an integer and the cross-sectional areas are equal, some of the zeros are cancelled by additional poles. A pole-zero pair does appear with the zero at approximately 2100-2300 Hz. Varying the lengths of the side channels greatly alters the shape of the transfer function.

In Figure 2-5, the transfer function, $20\log|U_{out}/U_{in}|$, is plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.1$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm. The total length and length ratios are the same as in Figure 2-4, but the ratio of the cross-sectional areas is different. The change in area ratio produces a great change in the transfer function when the lengths of the side channels are also different, and an additional pole-zero pair appears in the 1000-2000 Hz range. This second, lower pole-zero pair possibly appears because of the change in the ratio of the areas. Even with these four limited cases, the variability in the spectra with this vocal tract configuration is extreme, from a simple all-pole system to a system with two pole-zero pairs.

Figure 2-4: Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.3$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm (horizontal axis: frequency in Hz)

Figure 2-5: Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = 0.3$ cm², $A_{22} = 0.1$ cm² and $l_{21} = 4$ cm, $l_{22} = 12$ cm (horizontal axis: frequency in Hz)

To further demonstrate the variability possible with slight modifications of the vocal tract configuration, Figure 2-6 shows the transfer function for pairings of cross-sectional area ratios and channel lengths, with the sum of the side channel lengths held constant, $l_{21} + l_{22} = 16$ cm. These plots illustrate the extreme variability that can occur. An interesting observation is that all configurations produce a pole-zero pair in the 1.5-3 kHz range, corresponding approximately to one wavelength of a tube of length 16 cm.

Figure 2-6: Transfer function for model of side channels, $20\log|U_{out}/U_{in}|$, plotted for $A_{21}$/$A_{22}$ = .1 cm² to .5 cm² and $l_{21} + l_{22} = 16$ cm (horizontal axis: frequency, 0 to 6000 Hz). Legend: $A_{22}/A_{21} = .33$, $l_{22}/l_{21} = .88$; $A_{22}/A_{21} = .2$, $l_{22}/l_{21} = 3$; $A_{22}/A_{21} = .5$, $l_{22}/l_{21} = 1.46$
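The observation about the 1.5-3 kHz range can be checked against Equation 2.8 for the three area and length ratios listed in the legend of Figure 2-6. The sketch below is illustrative only: the speed of sound (35400 cm/s), the 1 Hz search grid, and the helper names `split_lengths` and `first_zero` are assumptions introduced here, and only the zero of each pole-zero pair is located.

```python
import math

C_SOUND = 35400.0    # cm/s; assumed, not stated in this section
TOTAL_LENGTH = 16.0  # cm; l21 + l22 as in Figure 2-6


def split_lengths(total, length_ratio):
    """Return (l21, l22) with l21 + l22 = total and l22 / l21 = length_ratio."""
    l21 = total / (1.0 + length_ratio)
    return l21, total - l21


def first_zero(area_ratio, l21, l22, f_max=6000.0, step=1.0, c=C_SOUND):
    """Lowest frequency in (0, f_max] satisfying Equation 2.8, or None.

    area_ratio is A22/A21; Equation 2.8 is rewritten as
    sin(k*l22) + area_ratio*sin(k*l21) = 0.
    """
    def g(f):
        k = 2.0 * math.pi * f / c
        return math.sin(k * l22) + area_ratio * math.sin(k * l21)

    lo = step
    while lo + step <= f_max:
        hi = lo + step
        if g(lo) * g(hi) <= 0.0:
            for _ in range(60):  # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        lo = hi
    return None


# (A22/A21, l22/l21) pairs read from the legend of Figure 2-6
LEGEND_CASES = [(0.33, 0.88), (0.2, 3.0), (0.5, 1.46)]

if __name__ == "__main__":
    print("one-wavelength frequency c/16 (Hz):", C_SOUND / TOTAL_LENGTH)
    for area_ratio, length_ratio in LEGEND_CASES:
        l21, l22 = split_lengths(TOTAL_LENGTH, length_ratio)
        print(area_ratio, length_ratio, first_zero(area_ratio, l21, l22))
```

With the assumed sound speed, one wavelength of a 16 cm tube corresponds to about 2.2 kHz, and the lowest zero of each legend configuration lies between 1.5 and 3 kHz.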

2.3 Evaluation of the Entire Model

Figure 2-7: Model of the vocal tract during lateral consonant production

During lateral production, the vocal tract can be modeled in its entirety as two side channels coupled with a simple uniform tube on each end, as shown in Figure 2-7.

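2.3.1 Equations

Solving Equations 2.3 and 2.4 for the input and output volume velocities and pressures, the equations for the additional two sections reduce to

$$
\begin{pmatrix} U_{in1} \\ P_{in1} \end{pmatrix} =
\begin{pmatrix}
\cos(-kl_{1}) & -\dfrac{jA_{1}}{\rho c}\sin(-kl_{1}) \\
-\dfrac{j\rho c}{A_{1}}\sin(-kl_{1}) & \cos(-kl_{1})
\end{pmatrix}
\begin{pmatrix} U_{out1} \\ P_{out1} \end{pmatrix}
\tag{2.10}
$$

and

$$
\begin{pmatrix} U_{in3} \\ P_{in3} \end{pmatrix} =
\begin{pmatrix}
\cos(-kl_{3}) & -\dfrac{jA_{3}}{\rho c}\sin(-kl_{3}) \\
-\dfrac{j\rho c}{A_{3}}\sin(-kl_{3}) & \cos(-kl_{3})
\end{pmatrix}
\begin{pmatrix} U_{out3} \\ P_{out3} \end{pmatrix}
\tag{2.11}
$$

where $l_1$ and $l_3$ are the lengths of each of the systems, and $A_1$ and $A_3$ are the constant cross-sectional areas of the front and back tubes.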

2.3.2 Boundary Conditions

The transfer function, $U_{out}/U_{in}$, of the system is obtained by combining Equations 2.5, 2.6, 2.10 and 2.11. The boundary conditions for the system are determined for each of the sections. The input volume velocity, $U_{in1}$, is the source volume velocity for the system. The output volume velocity of the left tube, $U_{out1}$, is the sum of the two input volume velocities of the side channels, $U_{in21}$ and $U_{in22}$. The output pressure, $P_{out1}$, is the input pressure of the side channels. The input volume velocity of the third section, $U_{in3}$, is the sum of the output volume velocities of the side channels, $U_{out21}$ and $U_{out22}$. The output pressure of the two side channels is the same as the input pressure to the right section, $P_{in3}$. The output pressure, $P_{out3}$, is assumed to be zero because the end of the tube is open, and the output volume velocity, $U_{out3}$, is the output volume velocity of the system, $U_{out}$. The transfer function is

$$
\begin{aligned}
\frac{U_{out}}{U_{in}} ={}& -A_3\left[A_{21}\sin(kl_{22}) + A_{22}\sin(kl_{21})\right] \big/ \\
\big[&-2A_{21}A_{22}\cos(kl_1)\cos(kl_{21})\cos(kl_{22})\sin(kl_3) \\
&+ A_1A_{21}\sin(kl_1)\sin(kl_{22})\cos(kl_{21})\sin(kl_3) \\
&+ A_1A_3\sin(kl_1)\sin(kl_{22})\sin(kl_{21})\cos(kl_3) \\
&- A_{22}A_3\cos(kl_1)\cos(kl_{22})\sin(kl_{21})\cos(kl_3) \\
&+ (A_{21}^2 + A_{22}^2)\cos(kl_1)\sin(kl_{21})\sin(kl_3)\sin(kl_{22}) \\
&+ 2A_{21}A_{22}\cos(kl_1)\sin(kl_3) \\
&- A_{21}A_3\cos(kl_1)\cos(kl_{21})\cos(kl_3)\sin(kl_{22}) \\
&+ A_1A_{22}\sin(kl_1)\cos(kl_{22})\sin(kl_3)\sin(kl_{21})\big]
\end{aligned}
\tag{2.12}
$$
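As a numerical cross-check of the closed-form transfer function of the complete model, the sketch below evaluates it directly and also rebuilds it from lossless-tube transmission matrices, with the two side channels combined in parallel by adding their admittance matrices. This is an illustration under stated assumptions, not the method used in the thesis: the speed of sound (35400 cm/s), the normalization $\rho c = 1$ (which cancels in $U_{out}/U_{in}$), and all function names are introduced here. The closed form in the code uses the squared-area term $(A_{21}^2 + A_{22}^2)$, which is what the matrix derivation yields.

```python
import math

C_SOUND = 35400.0  # cm/s; assumed, not stated in this section


def full_model_tf(f, a1, l1, a21, l21, a22, l22, a3, l3, c=C_SOUND):
    """U_out/U_in of the complete lossless model from the closed form."""
    k = 2.0 * math.pi * f / c
    s1, c1 = math.sin(k * l1), math.cos(k * l1)
    s21, c21 = math.sin(k * l21), math.cos(k * l21)
    s22, c22 = math.sin(k * l22), math.cos(k * l22)
    s3, c3 = math.sin(k * l3), math.cos(k * l3)
    num = -a3 * (a21 * s22 + a22 * s21)
    den = (-2.0 * a21 * a22 * c1 * c21 * c22 * s3
           + a1 * a21 * s1 * s22 * c21 * s3
           + a1 * a3 * s1 * s22 * s21 * c3
           - a22 * a3 * c1 * c22 * s21 * c3
           + (a21 ** 2 + a22 ** 2) * c1 * s21 * s3 * s22
           + 2.0 * a21 * a22 * c1 * s3
           - a21 * a3 * c1 * c21 * c3 * s22
           + a1 * a22 * s1 * c22 * s3 * s21)
    return num / den


def tube(k, area, length):
    """Transmission matrix (A, B, C, D) of a lossless tube acting on (P, U).

    The characteristic impedance is rho*c/area with rho*c normalized to 1.
    """
    z = 1.0 / area
    s, co = math.sin(k * length), math.cos(k * length)
    return (co, 1j * z * s, 1j * s / z, co)


def cascade(m, n):
    """Matrix product m*n of two 2x2 transmission matrices."""
    return (m[0] * n[0] + m[1] * n[2], m[0] * n[1] + m[1] * n[3],
            m[2] * n[0] + m[3] * n[2], m[2] * n[1] + m[3] * n[3])


def parallel(m, n):
    """Two tubes sharing input and output pressures: add admittance matrices."""
    y21 = 1.0 / m[1] + 1.0 / n[1]
    y11 = m[3] / m[1] + n[3] / n[1]
    y22 = -(m[0] / m[1] + n[0] / n[1])
    y12 = -y21  # holds because each tube matrix has unit determinant
    return (-y22 / y21, 1.0 / y21, y12 - y11 * y22 / y21, y11 / y21)


def full_model_tf_matrix(f, a1, l1, a21, l21, a22, l22, a3, l3, c=C_SOUND):
    """Same transfer function from the cascade front tube, side channels, back tube."""
    k = 2.0 * math.pi * f / c
    side = parallel(tube(k, a21, l21), tube(k, a22, l22))
    total = cascade(cascade(tube(k, a1, l1), side), tube(k, a3, l3))
    # With zero output pressure, U_in = D * U_out, so U_out/U_in = 1/D
    return (1.0 / total[3]).real
```

The matrix version is undefined at frequencies where $\sin(kl_{21})$ or $\sin(kl_{22})$ is exactly zero, so it is meant for spot checks away from those points rather than as a replacement for the closed form.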

2.3.3 Solution for Specific Cases of Lengths and Areas

The transfer function, $20\log|U_{out}/U_{in}|$, is solved for various lengths and cross-sectional areas to determine the effects on the frequency spectra. The lengths and areas of the first and third sections are not varied. In all of the following figures, $A_3 = 5$ cm², $l_3 = 10$ cm, $A_1 = 2$ cm², and $l_1 = 1$ cm are kept constant and correspond roughly to measured values gathered by Narayanan et al. [12, 13]. The second formant in the model will not be as low as actually measured because all of the sections are assumed to have constant areas; according to perturbation theory, squeezing the cross-sectional area of section 1 will cause the second resonance peak to shift lower [15]. Also, squeezing section 3 near the middle (to simulate the backed tongue body) will lower the second resonance.

As in the situation with only the side channels, this system will have a zero when the numerator of the transfer function is zero and a pole when the denominator is zero. The zeros of the system are determined by the same equation as for the simple model. Zeros will occur when

$$
\sin(kl_{22}) = -\frac{A_{22}}{A_{21}}\sin(kl_{21}).
\tag{2.13}
$$

The locations of the poles in this more complicated model are different from the pole locations for the simple model since the denominator is more complicated. In some cases when poles cancel zeros in the simple model, cancellation may not occur in the more complicated model. For a general idea of the number and location of the additional zeros, the zeros for the simple model can be solved for using Equation 2.8 of the simple model.

Same Lengths. The transfer function, $20\log|U_{out}/U_{in}|$, is plotted in Figure 2-8 for $A_{21} = A_{22} = 0.2$ cm² and $l_{21} = l_{22} = 8$ cm. The system appears as an all-pole system, as expected. When the side channels are the same length, their effect on the transfer function is that of a single tube with the length of one side channel (not the sum). If the model accounted for losses, the area effects would be visible, but this model only locates the peaks and valleys of the system. As expected, the number of poles of the system up to 6 kHz is 6, using the approximation described in Section 2.1.2.

Figure 2-8: Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.2$ cm² and $l_{21} = l_{22} = 8$ cm (horizontal axis: frequency in Hz)

Same Areas, Different Lengths. The transfer function, $20\log|U_{out}/U_{in}|$, is plotted in Figure 2-9 for $A_{21} = A_{22} = 0.2$ cm², $l_{21} = 10$ cm, and $l_{22} = 8$ cm. The different lengths of the side channels produce two pole-zero pairs, with one in the 1800-2200 Hz range. The number of poles is related to the total length of the system, and using the approximation described in Section 2.1.2 the number of poles is expected to be 9 up to 6 kHz. Assuming the additional length to the system is 8 cm (the average length of the two side branches), the expected number of zeros with the addition of the side branches is between two and three. The number of zeros with two side channels does not depend only on the total length of the side branches but also on the interaction of the side channels described by Equation 2.8. Nevertheless, a rough approximation of the number of zeros can be obtained from the length that the side channels add to the system.

Different Areas, Different Lengths. The transfer function, $20\log|U_{out}/U_{in}|$, is plotted in Figure 2-10 for $A_{21} = 0.2$ cm², $A_{22} = 0.5$ cm² and $l_{21} = 11.5$ cm and $l_{22} = 4.5$ cm. The difference in areas and lengths of the two side channels causes the transfer function to include three pole-zero pairs.

Figure 2-9: Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for $A_{21} = A_{22} = 0.2$ cm², $l_{21} = 10$ cm, and $l_{22} = 8$ cm (horizontal axis: frequency in Hz)

Figure 2-10: Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for [...] (horizontal axis: frequency in Hz)

The transfer function, $20\log|U_{out}/U_{in}|$, is plotted in Figure 2-11 for various values of $A_{21}$ and $A_{22}$ and $l_{21} + l_{22} = 16$ cm. Even though the total length of the side channels, $l_{21} + l_{22}$, remains constant at 16 cm, the variation in the frequency spectra of the transfer function is huge. The length of the side channels is not necessarily 16 cm for all individuals, but the variation does give a general idea of what occurs with different side channel lengths and areas. Pole-zero pairs appear at approximately 1500-2000 Hz and continue to the high frequencies.

Figure 2-11: Transfer function for model of vocal tract, $20\log|U_{out}/U_{in}|$, plotted for various values of $A_{21}$ and $A_{22}$ and $l_{21} + l_{22} = 16$ cm (horizontal axis: frequency, 0 to 6000 Hz). Legend: $A_{22}/A_{21} = 2.5$, $l_{22}/l_{21} = 3$; $A_{22}/A_{21} = 3$, $l_{22}/l_{21} = .88$; $A_{22}/A_{21} = 2$, $l_{22}/l_{21} = .52$

2.4 Justification of the Model

The model suggested here does not account for any acoustic losses including changes in bandwidth or glottal source. The model does give a general idea of the location of resonance peaks due to the addition of the side channels.

2.4.1 Calculation of the Number of Poles and Zeros and Approximate Locations

The complete model will give rise to zeros at the locations specified by Equation 2.8. Since the total number of poles and zeros cannot increase for the system, the transfer function, as a general approximation, will also show the formation of the same number of poles as zeros. The number of zeros expected up to a certain frequency, $f$, is $n_z = 2lf/c$, as described in Section 2.1.2. The exact locations are determined by the denominator of the transfer function. When the lengths are integer multiples of each other, pole and zero cancellations can be expected. The simpler model of the side channels in Figure 2-1 can be used to determine the possible locations of the zeros and the expected number of zeros.
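The counting estimate can be illustrated with a few lines of Python. This sketch assumes the Section 2.1.2 approximation has the form $n \approx 2lf/c$ and that $c = 35400$ cm/s; both are assumptions here, chosen because they reproduce the counts quoted earlier in this chapter (between two and three for an added length of 8 cm, and about six for a total length of 19 cm, up to 6 kHz). The function name is hypothetical.

```python
C_SOUND = 35400.0  # cm/s; assumed, not stated in this section


def expected_count(length_cm, f_hz, c=C_SOUND):
    """Rough number of resonances (or zeros) up to f_hz for a given length.

    Assumes the estimate n = 2*l*f/c; treat the form as an assumption.
    """
    return 2.0 * length_cm * f_hz / c


if __name__ == "__main__":
    # 8 cm: side-channel length; 19 cm: 1 cm + 8 cm + 10 cm for the full model
    for length in (8.0, 19.0):
        print(length, "cm ->", round(expected_count(length, 6000.0), 2))
```

The estimate is only a guide to how many poles or zeros to look for; their exact locations still have to come from the transfer function itself.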

2.5 Limitations of Model

2.5.1 Assumptions

Lengths. The total length of the side branches is held constant at 16 cm, which is greater than the data gathered by Narayanan et al. suggest [12, 13]. However, the examination of natural utterances suggests the formation of a pole-zero pair around the third formant. Shortening the side channel lengths increases the frequency of the first zero above the third formant.

Constant Areas. The assumption of constant areas for the model is not realistic, but it simplifies the calculations and allows some generalizations to be made of the expected effects.

Formant Locations. The second formant calculated by this model does not account for the backed tongue body position, and F2 is not as low as observed. Perturbation theory applied to the back cavity accounts for the lower F2 observed and indicates why the simple model does not exhibit this behavior.

2.6 Conclusions

Although the total length of the side channels is kept constant, the locations of the resonance peaks and valleys are not constant. Small changes in the lengths and cross-sectional areas cause huge variations in the spectra. If the key perceptual cue for the lateral consonant were the location of the peaks, then speakers would have to maintain a certain configuration of the vocal tract that is not very stable. Since similar configurations create extremely different effects, the key perceptual cue cannot be the exact location of the resonance peaks and valleys of the spectrum. Instead, the cues must be more subtle, and possibly the addition of the poles and zeros assists in creating the cues. The constant attributes across the various lengths and cross-sectional areas are low first and second formant resonances and the addition of multiple pole-zero pairs beginning at approximately 1500 Hz.

Chapter 3

Measurements

3.1 Purpose of Measurements - What Makes a /l/?

The lateral consonant is extremely variable depending on the phonetic context and the speaker. Modeling of the vocal tract as a configuration with side channels suggests that the resonance frequencies of the system are susceptible to subtle changes in cross-sectional areas and lengths of the side channels. If the key perceptual cue cannot be the exact location of the resonance peaks and valleys, then what is the cue used by listeners to identify the lateral? Previous research suggests that other changes in the system occur during the lateral production, including changes in the glottal source [1, 2], bandwidth changes, and the addition of pole-zero pairs. Measurements are made on lateral consonants produced by various speakers in order to better understand the changes occurring during the lateral production and to determine what might be potential perceptual cues used to discriminate the lateral consonant from other sounds.

3.2 Method

All recordings were made for normal English speakers with normal hearing. The speakers were recorded in the sound room at the MIT RLE Speech Group lab. The recordings were made onto audio tape or onto DAT tapes. Utterances for male speakers were digitized at a sampling rate of 10 kHz and low-pass filtered at 4.8 kHz, while utterances for female speakers were digitized at a sampling rate of 13 kHz and low-pass filtered at 6.2 kHz. One set of utterances for a male speaker and a female speaker were directly digitized from DAT tapes and downsampled to 10 kHz and 13 kHz respectively using the Sound Design program.

Singleton. The singleton lateral utterances were recorded for six speakers, three female and three male. Three repetitions of six pre-vocalic /l/ utterances were recorded in isolation with an extra word at the end of the list to prevent intonation variations. The speakers were instructed to say the word in a normal tone and to maintain a constant intonation for all words. The word list is given in Table 3.1.

leap loot

let lap

law luck

Table 3.1: Singleton utterances

The lateral consonant is released into vowels at the four extremes of the possible high and low tongue body configurations and two lax vowels at intermediate tongue-body heights. All speakers of the singleton utterances were also recorded for the cluster utterances.

Cluster. The cluster utterances were recorded for eight speakers, four female and four male. Two repetitions of the cluster words spoken in the phrase "Say word again" were recorded. The speakers were instructed to say the list at a comfortable level and pace. The word list is given in Table 3.2. The stop consonants were combined with /r/ and /l/ in combinations that occur in English and released into the vowels /i/ and /a/. Additionally, singleton stop consonants and /r/ were used for utterances releasing into the two vowels.

beat bought bleed block

breed broad keep cop

clean clod crete craw

deep dot dream drop

geese got glean glop

green grog peat pot

plead plod preach prod

team top treat trod

reed rod

Table 3.2: Cluster utterances

3.3 Measurements

Windows, Placements. All measurements of the waveforms were made using a 6.4 ms Hamming window. The release of the liquid consonants was determined by examining the speech waveform and the spectra for the change in formants (particularly the second formant resonance peak) and voicing.

Pre-Emphasis. Pre-emphasis on the spectra was used to ensure a more accurate first formant frequency peak, since the first formant frequency is very low for the lateral consonant. The first harmonic interferes with the measurements, and F1 sometimes appears to be lower than 300 Hz even with pre-emphasis.

Figure 3-1 plots the attenuation in dB of the pre-emphasis filter for a 13 kHz sampling rate. The effect can be roughly thought of as a 6 dB/octave slope up to about 3 kHz. The pre-emphasis effect is removed from measurements of the spectrum amplitudes.
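The shape of such a response can be illustrated with a first-order difference filter, $y[n] = x[n] - a\,x[n-1]$. This is only a sketch: the filter actually used for the measurements is not specified here, and both the filter form and the coefficient `a = 0.95` are hypothetical choices made for illustration.

```python
import math


def preemphasis_gain_db(f, fs=13000.0, a=0.95):
    """Gain in dB at frequency f (Hz) of y[n] = x[n] - a*x[n-1].

    The filter form and the coefficient a are illustrative assumptions.
    """
    w = 2.0 * math.pi * f / fs  # normalized radian frequency
    magnitude = math.sqrt(1.0 + a * a - 2.0 * a * math.cos(w))
    return 20.0 * math.log10(magnitude)


if __name__ == "__main__":
    for f in (250.0, 500.0, 1000.0, 2000.0, 3000.0):
        print(f, "Hz:", round(preemphasis_gain_db(f), 2), "dB")
```

For this assumed filter the gain rises by a little under 6 dB per octave between 500 Hz and 2 kHz at a 13 kHz sampling rate, which is consistent with the rough slope described above.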

3.3.1 Singleton

Measurements of singleton utterances were made at two points in time using a method similar to that described by Stevens and Blumstein [16]. Measurements of the first three formants and the amplitudes were taken 20 ms prior to the release of the lateral and 20 ms after the release into the vowel. A series of spectra obtained with the 6.4 ms Hamming window were averaged over a 12 ms interval, and therefore included at least one full glottal period. Use of this averaging technique was convenient since it did not require careful placement of the window at the beginning of each glottal period. An example spectrogram of a singleton utterance is shown in Figure 3-2.

Figure 3-1: The effect of the pre-emphasis on the spectra

Figure 3-2: Spectrogram of luck for male speaker (256-pt DFT, 6.4 ms Hamming window every 1 ms; horizontal axis: time, 0 to 1000 ms)
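The averaging of short-window spectra described in Section 3.3.1 can be sketched as follows. The fragment is a minimal illustration, not the analysis software used for the measurements: it assumes that windows are placed every 1 ms with their centers spanning the 12 ms interval, that power spectra (rather than magnitudes or dB values) are averaged, and that a 256-point DFT is used as in the spectrogram display. Function names and defaults are hypothetical.

```python
import cmath
import math


def hamming(n):
    """Hamming window of n samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, n_fft):
    """Power at bins 0..n_fft/2 of a zero-padded frame (direct DFT)."""
    spectrum = []
    for b in range(n_fft // 2 + 1):
        acc = 0j
        for i, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * math.pi * b * i / n_fft)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def averaged_spectrum_db(x, fs, center_s, win_s=0.0064, span_s=0.012,
                         hop_s=0.001, n_fft=256):
    """Average power spectrum in dB around time center_s (seconds).

    The caller must ensure the signal x covers the whole analysis interval.
    """
    n_win = int(round(win_s * fs))              # 64 samples at 10 kHz
    hop = int(round(hop_s * fs))                # window advance in samples
    n_frames = int(round(span_s / hop_s)) + 1   # windows spanning the interval
    first = int(round((center_s - span_s / 2.0) * fs)) - n_win // 2
    window = hamming(n_win)
    avg = [0.0] * (n_fft // 2 + 1)
    for m in range(n_frames):
        start = first + m * hop
        frame = [x[start + i] * window[i] for i in range(n_win)]
        for b, p in enumerate(power_spectrum(frame, n_fft)):
            avg[b] += p / n_frames
    return [10.0 * math.log10(p + 1e-12) for p in avg]


if __name__ == "__main__":
    fs = 10000.0
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(1000)]
    spec = averaged_spectrum_db(tone, fs, center_s=0.05)
    peak_bin = max(range(len(spec)), key=lambda b: spec[b])
    print("peak near", peak_bin * fs / 256, "Hz")
```

Because the windows cover more than one glottal period in total, the averaged spectrum does not depend strongly on where the first window falls within the period, which is the convenience noted above.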

3.3.2 Cluster

Several measurements on utterances with clusters were made at different points in the waveform, although not all of them are reported here. A 6.4 ms Hamming window was centered on the initial burst produced by the stop consonant, and the frequencies and amplitudes of the low frequency peak and high frequency peak (greater than 2.5 kHz for men and 3 kHz for women) were measured. If the stop consonant was voiced, additional measurements of the frequencies and amplitudes of the low and high frequency peaks were made 20 ms after the burst and 20 ms prior to voicing onset with the Hamming window, and averaging for 12 ms. Measurements of the first three formant peaks and amplitudes were taken 20 ms after the release of the liquid consonant or 20 ms after voicing onset (20 ms into the vowel if there was no liquid consonant in the utterance) for voiced and voiceless consonants using the 6.4 ms Hamming window and averaging for 12 ms. If the liquid consonant was sustained for more than 20 ms by the speaker, an additional measurement was made 20 ms prior to the release of the liquid using the 6.4 ms Hamming window averaging for 12 ms.

The spectrogram of a voiced labial cluster utterance, bleed, for a male speaker is shown in Figure 3-3 and the spectrogram of a voiceless labial cluster utterance, plead, is shown in Figure 3-4. The lateral is not sustained in the voiceless consonant cluster: it appears that the lateral is being released as voicing onset occurs.

Figure 3-3: Spectrogram of voiced cluster utterance, bleed, for male speaker (256-pt DFT, 6.4 ms Hamming window every 1 ms; horizontal axis: time, 0 to 1000 ms)

Figure 3-4: Spectrogram of voiceless cluster utterance, plead (256-pt DFT, 6.4 ms Hamming window every 1 ms; horizontal axis: time, 0 to 1000 ms)

3.4 Singleton Analysis

Figures 3-5 and 3-6 show the changes in frequencies and amplitudes of the first three formants from the time 20 ms before the release to 20 ms after the release of the consonant. Clearly, the vowel has higher frequencies and amplitudes than the lateral consonant. The changes in frequencies depend on the vowel that follows. Although the change in F1 is small for both leap and loot, which have high vowels, these two words have the greatest average change in amplitude, ΔA1. This change in amplitude is much greater than what would be expected based on a small change in F1. Because the changes in amplitudes are greater than expected from the shifts in formant frequencies, other changes in the source must occur. Possibly changes in the formant bandwidths and the glottal source cause a more abrupt transition and enhance the effects resulting from changes in frequency. This enhancement provides strong perceptual cues that distinguish the lateral consonant from, for example, the glide /w/.

Figure 3-5: Changes in amplitudes of formants from lateral to vowel for singleton utterances (leap, let, lap, law, luck, loot): error bars represent standard deviation of data

word   ΔA1    ΔA2    ΔA3    ΔF1     ΔF2     sd A1   sd A2   sd A3   sd F1   sd F2
leap    9.4    9.0   19.5    15.9   875.9    2.0     3.2     2.0    14.4   108
let     6.4   13.8   17.0   237.1   461.2    3.8     3.8     7      13.8   105
lap     3.9   15.7   20.3   213.9   321.1    2.0     3.0     2.2    29.4    47.8
law     3.7   10.3    8.2   318.7   358.9    4.2     4.4     3.8    49.6    51
luck    4.8   13.5   14.2   248.8   119.6    3.6     4.2     3.2    40.6    50.8
loot   12.7   11.9   15.4    36.1   272.8    3.2     5.0     4.6    30.2    60.6

Table 3.3: Δ's of formant frequencies and amplitudes for singleton utterances; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the right half of the table

Figure 3-6: Changes in formant frequencies from lateral to vowel for singleton utterances (leap, let, lap, law, luck, loot; vertical axis 0 to 800 Hz): error bars represent standard deviation of data

3.5 Analysis of Utterances with Consonant Clusters

Figures 3-7 and 3-8 show the changes in frequencies and amplitudes of the first three formants from the point 20 ms before the release to 20 ms after the release of the consonant. The data include voiced stop consonants and voiceless stop consonants with a duration of at least 20 ms of the liquid. The lateral consonant has greater changes in both amplitudes and frequencies than the retroflex consonant for both vowels, especially changes in the amplitude of the third formant. The change in F1 is greater for /la/ than for /li/, but the change in F2 is greater for /li/ than for /la/.

Figure 3-7: Changes in amplitudes of formants (ΔA1, ΔA2, ΔA3) from liquid to vowel for voiced and some voiceless stop cluster utterances: error bars represent standard deviation of data

Figure 3-8: Changes in formant frequencies from liquid to vowel for voiced and some voiceless stop cluster utterances: error bars represent standard deviation of data

word   ΔA1    ΔA2    ΔA3    ΔF1     ΔF2     sd A1   sd A2   sd A3   sd F1   sd F2
la     0.8    7.5    8.8    242.4   149.7    3.4     9.4     8.2    167     255
li     3.9    2.6   12.0     -0.4   684.7    3.8     7.0    11.6     37.0   558
ra     0.6    3.9    4.8    114.4    41.8    5.8     6.8    14.8    182     148
ri     0.0    1.5    7.2     -3.3   490.4    5.2     9.8    14.0     53.2   274

Table 3.4: Δ's of formant frequencies and amplitudes for voiced and some voiceless stop cluster utterances; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the right half of the table

3.6 Comparisons of Environments

Figures 3-9 and 3-10 compare the data for the laterals as singletons and in clusters. The figures show the changes in frequencies and amplitudes of the first three formants from 20 ms prior to the release of the lateral consonant to 20 ms into the vowel. The data used for the comparisons of cluster and singleton lateral consonant changes are from the same six speakers, three female and three male. The changes in amplitudes and frequencies for the singleton lateral consonant are greater than those for the cluster lateral consonants, with the exception of ΔA3 before /a/ (Table 3.5).

Figure 3-9: Comparison of Δ's of amplitudes of formants (ΔA1, ΔA2, ΔA3) from lateral to vowel for singleton and cluster utterances: error bars represent standard deviation of data

Figure 3-10: Comparison of Δ's of formant frequencies (ΔF1, ΔF2) from lateral to vowel for singleton and cluster utterances (legend: cluster /a/, single /a/, cluster /i/, single /i/): error bars represent standard deviation of data

measurement   cluster /a/   single /a/   cluster /i/   single /i/
ΔA1               0.8           3.7          3.9           9.4
ΔA2               7.5          10.3          2.6           9.0
ΔA3               8.8           8.2         12.0          19.5
ΔF1             242           319           -0.4          15.9
ΔF2             150           359          685           876
sd A1             1.6           2.0          1.8           4.2
sd A2             4.8           3.0          3.4           4.4
sd A3             4.0           2.2          5.8           3.8
sd F1            88.2          29.2         18.6          49.4
sd F2           128            47.8        229            51

Table 3.5: Comparison of singleton and cluster utterances for same speakers; amplitude changes are in dB, frequencies are in Hz, and standard deviations (sd) across speakers and repetitions are given in the lower half of the table
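The comparison in Table 3.5 can be checked mechanically. The short sketch below copies the mean changes from the upper half of the table and lists every measurement for which the singleton change is not greater than the cluster change; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Mean changes from the upper half of Table 3.5 (dB for dA*, Hz for dF*)
TABLE_3_5 = {
    "dA1": {"cluster /a/": 0.8, "single /a/": 3.7, "cluster /i/": 3.9, "single /i/": 9.4},
    "dA2": {"cluster /a/": 7.5, "single /a/": 10.3, "cluster /i/": 2.6, "single /i/": 9.0},
    "dA3": {"cluster /a/": 8.8, "single /a/": 8.2, "cluster /i/": 12.0, "single /i/": 19.5},
    "dF1": {"cluster /a/": 242.0, "single /a/": 319.0, "cluster /i/": -0.4, "single /i/": 15.9},
    "dF2": {"cluster /a/": 150.0, "single /a/": 359.0, "cluster /i/": 685.0, "single /i/": 876.0},
}


def exceptions(table):
    """Return (measurement, vowel) pairs where singleton <= cluster."""
    found = []
    for measurement, row in table.items():
        for vowel in ("/a/", "/i/"):
            if row["single " + vowel] <= row["cluster " + vowel]:
                found.append((measurement, vowel))
    return found


if __name__ == "__main__":
    print(exceptions(TABLE_3_5))
```

Given the standard deviations in the lower half of the table, a difference as small as the one in ΔA3 before /a/ (8.8 dB versus 8.2 dB) is well within the spread of the data.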
