Diphthongization as a cue for the automatic identification of British English dialects
Emmanuel Ferragne, François Pellegrino
Laboratoire Dynamique Du Langage, UMR CNRS 5596
Emmanuel.Ferragne@univ-lyon2.fr
François.Pellegrino@univ-lyon2.fr

Introduction
The vowels in the words deeds and food have been phonologically analyzed as monophthongs. However, formant stability in these vowels varies across dialects in the British Isles. Standard Southern British English (sse) is known to exhibit rather diphthongized realizations of deeds and food, whereas Scottish Highlands English (shl) has true monophthongs (see Fig. 1 and Fig. 2).
Goal: Can the degree of monophthong diphthongization alone constitute a reliable cue to dialect classification?
Comparison of classification based on formant trajectories alone with classification performed by a trained phonetician.
Caution: Duration is the most obvious spurious factor here:
[Figure: box plots of vowel duration (ms) by dialect (shl vs. sse), one panel for deeds and one for food; t-test: p < 0.01 in both panels]
ROC curves caption (linear discriminant analysis): % correct identification and Fisher's exact test (H0: no association between actual dialect and dialect membership predicted by linear discriminant analysis), plus the formant with the highest correlation with the discriminant function.
ROC curves caption (trained phonetician): Fisher's exact test (H0: no association between the phonetician's response and the stimulus).
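The captions above pair a percentage of correct identifications with Fisher's exact test on the association between actual and predicted dialect. A minimal sketch of that computation for a 2x2 table follows. The poster does not say whether the test was one-sided or two-sided, so the two-sided version here is an assumption, and the function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def percent_correct(a, b, c, d):
    """Percent correct for the table [[a, b], [c, d]] with hits on the diagonal."""
    return 100.0 * (a + d) / (a + b + c + d)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: actual dialect (e.g. shl, sse); columns: predicted dialect.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical confusion counts, not taken from the poster.
    table = (15, 1, 1, 15)
    print(f"{percent_correct(*table):.1f}% correct")
    print(f"p = {fisher_exact_two_sided(*table):.3g}")
```

Under H0 the row and column margins are treated as fixed, which is why only the top-left cell needs to be enumerated.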
Figure 1: Broad-band spectrogram (300 Hz filters; 4000 Hz displayed) of the vowel in deeds spoken by a male speaker from the Scottish Highlands (F1, F2 and F3 labelled).
Figure 2: Broad-band spectrogram (300 Hz filters; 4000 Hz displayed) of the vowel in deeds spoken by a male speaker of Standard Southern British English (F1, F2 and F3 labelled).
Acoustic measurements
(Central frequency for the first three formants in Bark, measured with the Praat program)
Three measures of diphthongization
1. dif: difference between the value at 80% of vowel duration and the value at 20% of duration
2. sd: standard deviation of 9 values extracted from each formant: every 10% of the duration
3. ∆: 9 values extracted from each formant (same as above)
F_k: frequency of a given formant (F1, F2 or F3) at point k (n = 9); d: duration between measurements: 1/10 of total vowel duration
Classification: linear discriminant analysis
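The three measures listed above, which serve as inputs to the classification, can be sketched as follows for a single formant track. Several points are assumptions rather than details given on the poster: the track is taken to hold the 9 values sampled at 10%, 20%, ..., 90% of the vowel; sd is computed as the sample standard deviation; and ∆ is read as the mean of the successive differences divided by d (signed, not absolute). The function names and the example track are hypothetical.

```python
from statistics import mean, stdev


def dif(track):
    """Value at 80% of the vowel minus value at 20%.

    `track` holds 9 formant values (Bark) at 10%, 20%, ..., 90% of the
    vowel, so index 1 is the 20% point and index 7 is the 80% point.
    """
    return track[7] - track[1]


def sd(track):
    """Sample standard deviation of the 9 values (assumed estimator)."""
    return stdev(track)


def delta(track, vowel_duration_ms):
    """Mean rate of change between successive points (Bark per ms).

    d is one tenth of the total vowel duration, as stated on the poster.
    Using signed differences is an assumption.
    """
    d = vowel_duration_ms / 10.0
    steps = [(later - earlier) / d for earlier, later in zip(track, track[1:])]
    return mean(steps)


if __name__ == "__main__":
    # Hypothetical F2 track in Bark for one 180 ms vowel token.
    f2_track = [13.1, 13.3, 13.6, 13.8, 14.0, 14.1, 14.2, 14.2, 14.1]
    print(dif(f2_track), sd(f2_track), delta(f2_track, 180.0))
```

A perfectly flat track gives 0 on all three measures, which matches the idea that a true monophthong shows no formant movement.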
[Figure: ROC curves (sensitivity vs. 1 - specificity) for linear discriminant analysis; eight panels: dif, sd, ∆ (each on F1, F2, F3) and duration, for deeds and for food]
Values printed on the panels (% correct identification; Fisher's exact test; formant with highest correlation with the discriminant function, where given):
First group of four panels: 78.1%, p < 0.001, F1; 93.8%, p < 0.001; 71.9%, p = 0.037, F1; 62.5%, p = 0.137, F2
Second group of four panels: 93.8%, p < 0.001, F2; 81.3%, p < 0.001, F1; 90.6%, p < 0.001, F2; 71.9%, p = 0.037

[Figure: ROC curves (sensitivity vs. 1 - specificity) for the trained phonetician; six panels, one per vowel and stimulus type]
deeds nat: p < 0.001
deeds nd: p = 0.016
deeds syn: p = 0.216
food nat: p = 0.144
food nd: p = 0.074
food syn: p = 0.299
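The ROC panels in this section plot sensitivity against 1 - specificity. The poster does not describe how the curves were computed, so the following is only a generic sketch: it sweeps a threshold over per-token scores (for instance discriminant scores) and collects the resulting points. The function names, scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for every score threshold.

    `labels` uses 1 for the target dialect and 0 for the other one; a token
    is called positive when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical scores and dialect labels for eight tokens.
    scores = [2.1, 1.7, 1.2, 0.4, 0.3, -0.5, -1.1, -1.8]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    curve = roc_points(scores, labels)
    print(curve)
    print(area_under_curve(curve))
```

An area of 0.5 corresponds to chance-level discrimination and an area of 1.0 to perfect separation of the two dialects.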
Conclusion
In the perceptual experiment, the phonetician managed to correctly classify above chance level only the deeds vowel, specifically in the nat condition. Although diphthongization must have helped, it seems that duration was the main cue used in this task. The poor results for food suggest a ceiling effect, perhaps due to lack of within-dialect homogeneity.
The best classification score with linear discriminant analysis for deeds using formants is achieved with dif, and the F1 trajectory is the most relevant dimension. However, duration constitutes the most reliable cue. As for food, sd is the best metric, with F2 showing the highest correlation with the discriminant function. Duration here only plays a marginal role.

Formula for ∆:
$$\Delta = \frac{1}{n-1} \sum_{k=1}^{n-1} \frac{F_{k+1} - F_k}{d}$$
Perceptual experiment
Identification task: an expert native English phonetician was asked to listen to the vocalic portion of deeds and food and tell whether the stimulus was uttered by a speaker of the Scottish Highlands or a speaker of some other British English dialect (sse) known to diphthongize these vowels to a certain extent.
Three types of stimuli:
natural vowels (nat): the vowels of deeds and food were segmented (boundary placed at the inception of the formant structure and at the end of it), extracted and normalized for amplitude; 15 tokens for each vowel
neutralized duration (nd): mean duration for the vowel in deeds and for the vowel in food was computed over the vowel tokens from the two dialects (mean deeds: 180 ms; mean food: 106 ms); the duration of each deeds vowel was modified (using PSOLA) to equal 180 ms, and the duration of each food vowel to equal 106 ms; amplitude normalized; 15 tokens for each vowel
resynthesized vowels (syn): vowel resampled: 8000 Hz; lengthened (PSOLA): deeds: 275 ms; food: 175 ms; LPC: 25 ms window, 5 ms steps, 8 coefficients, pre-emphasis above 50 Hz; flat F0: men: 120 Hz; women: 200 Hz; amplitude normalized; 15 tokens for each vowel
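The timing figures in the stimulus descriptions above imply some simple arithmetic, sketched here: the time-scaling factor PSOLA has to apply to bring a token to a target duration, and the LPC window and step sizes in samples at the 8000 Hz sampling rate. The function names are hypothetical, the 90 ms token is an invented example, and the frame count assumes only windows that fit entirely inside the signal are analysed, which a given analysis program may handle differently.

```python
def psola_time_scale(original_ms, target_ms):
    """Factor by which a token's duration must be scaled to reach the target."""
    return target_ms / original_ms


def ms_to_samples(duration_ms, sample_rate_hz=8000):
    """Number of samples covered by a duration at the given sampling rate."""
    return round(sample_rate_hz * duration_ms / 1000.0)


def lpc_frame_count(duration_ms, window_ms=25.0, step_ms=5.0):
    """Number of full analysis windows that fit in a signal (assumed framing)."""
    if duration_ms < window_ms:
        return 0
    return int((duration_ms - window_ms) // step_ms) + 1


if __name__ == "__main__":
    # A hypothetical 90 ms food token stretched to the 106 ms mean.
    print(psola_time_scale(90.0, 106.0))
    # Window and step sizes from the poster, in samples at 8000 Hz.
    print(ms_to_samples(25.0), ms_to_samples(5.0))
    # Full LPC windows in the lengthened deeds (275 ms) and food (175 ms) vowels.
    print(lpc_frame_count(275.0), lpc_frame_count(175.0))
```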
[Figure: resynthesis diagram showing the LPC spectrum (frequency axis 0 to 4000 Hz) and the flattened F0 (men: 120 Hz; women: 200 Hz)]