• Aucun résultat trouvé

Sp eec h sounds

N/A
N/A
Protected

Academic year: 2022

Partager "Sp eec h sounds"

Copied!
3
0
0

Texte intégral

(1)

Speech recognition (briefl y)

Chapter15,Section6

Chapter15,Section61

Outline

♦Speechasprobabilisticinference

♦Speechsounds

♦Wordpronunciation

♦Wordsequences

Chapter15,Section62

Sp eec h as probabilistic inference

It’snoteasytowreckanicebeach

Speechsignalsarenoisy,variable,ambiguous

Whatisthemostlikelywordsequence,giventhespeechsignal?I.e.,chooseWordstomaximizeP(Words|signal)

UseBayes’rule:

P(Words|signal)=αP(signal|Words)P(Words)

I.e.,decomposesintoacousticmodel+languagemodel

Wordsarethehiddenstatesequence,signalistheobservationsequence

Chapter15,Section63

Phones

Allhumanspeechiscomposedfrom40-50phones,determinedbytheconfigurationofarticulators(lips,teeth,tongue,vocalcords,airflow)

Formanintermediatelevelofhiddenstatesbetweenwordsandsignal⇒acousticmodel=pronunciationmodel+phonemodel

ARPAbetdesignedforAmericanEnglish

[iy]beat[b]bet[p]pet[ih]bit[ch]Chet[r]rat[ey]bet[d]debt[s]set[ao]bought[hh]hat[th]thick[ow]boat[hv]high[dh]that[er]Bert[l]let[w]wet[ix]roses[ng]sing[en]button...

E.g.,“ceiling”is[siylihng]/[siylixng]/[siylen]

Chapter15,Section64

Sp eec h sounds

Rawsignalisthemicrophonedisplacementasafunctionoftime;processedintooverlapping30msframes,eachdescribedbyfeatures

Analog acoustic signal:

Sampled, quantized digital signal:

Frames with features: 10 15 38

52 47 82 22 63 24

89 94 11 10 12 73

Framefeaturesaretypicallyformants—peaksinthepowerspectrum

Chapter15,Section65

Phone mo dels

FramefeaturesinP(features|phone)summarizedby–anintegerin[0...255](usingvectorquantization);or–theparametersofamixtureofGaussians

Three-statephones:eachphonehasthreephases(Onset,Mid,End)E.g.,[t]hassilentOnset,explosiveMid,hissingEnd⇒P(features|phone,phase)

Triphonecontext:eachphonebecomesn2distinctphones,dependingonthephonestoitsleftandrightE.g.,[t]in“star”iswritten[t(s,aa)](differentfrom“tar”!)

Triphonesusefulforhandlingcoarticulationeffects:thearticulatorshaveinertiaandcannotswitchinstantaneouslybetweenpositionsE.g.,[t]in“eighth”hastongueagainstfrontteeth

Chapter15,Section66

(2)

Phone mo del example

Phone HMM for [m]:

0.1 0.90.3

0.6 0.4

C1: 0.5

C2: 0.2

C3: 0.3 C3: 0.2C4: 0.7

C5: 0.1 C4: 0.1C6: 0.5

C7: 0.4 Output probabilities for the phone HMM:

Onset:Mid:End: FINAL 0.7MidEndOnset

Chapter15,Section67

W ord pron unciation mo dels

Eachwordisdescribedasadistributionoverphonesequences

DistributionrepresentedasanHMMtransitionmodel

0.5

0.5 0.20.8 [m] [ey]

[ow][t]

[aa] [t]

[ah] [ow]

1.0 1.0

1.0 1.01.0

P([towmeytow]|“tomato”)=P([towmaatow]|“tomato”)=0.1P([tahmeytow]|“tomato”)=P([tahmaatow]|“tomato”)=0.4

Structureiscreatedmanually,transitionprobabilitieslearnedfromdata

Chapter15,Section68

Isolated w ords

Phonemodels+wordmodelsfixlikelihoodP(e1:t|word)forisolatedword P(word|e1:t)=αP(e1:t|word)P(word)

PriorprobabilityP(word)obtainedsimplybycountingwordfrequencies

P(e1:t|word)canbecomputedrecursively:define

`1:t=P(Xt,e1:t)

andusetherecursiveupdate

`1:t+1=Forward(`1:t,et+1) andthenP(e1:t|word)=

Σ

xt`1:t(xt)

Isolated-worddictationsystemswithtrainingreach95–99%accuracy

Chapter15,Section69

Con tin uous sp eec h

Notjustasequenceofisolated-wordrecognitionproblems!–Adjacentwordshighlycorrelated–Sequenceofmostlikelywords6=mostlikelysequenceofwords–Segmentation:therearefewgapsinspeech–Cross-wordcoarticulation—e.g.,“nextthing”

Continuousspeechsystemsmanage60–80%accuracyonagoodday

Chapter15,Section610

Language mo del

Priorprobabilityofawordsequenceisgivenbychainrule:

P(w1···wn)= nY

i=1 P(wi|w1···wi1)

Bigrammodel:

P(wi|w1···wi1)≈P(wi|wi1)

Trainbycountingallwordpairsinalargetextcorpus

Moresophisticatedmodels(trigrams,grammars,etc.)helpalittlebit

Chapter15,Section611

Com bined HMM

Statesofthecombinedlanguage+word+phonemodelarelabelledbythewordwe’rein+thephoneinthatword+thephonestateinthatphone

Viterbialgorithmfindsthemostlikelyphonestatesequence

Doessegmentationbyconsideringallpossiblewordsequencesandboundaries

Doesn’talwaysgivethemostlikelywordsequencebecauseeachwordsequenceisthesumovermanystatesequences

JelinekinventedAin1969awaytofindmostlikelywordsequencewhere“stepcost”is−logP(wi|wi1)

Chapter15,Section612

(3)

DBNs for sp eec h recognition

articulatorstongue, lips P(OBS | 2) = 1end-of-word observation

deterministic, fixed

stochastic, learned

deterministic, fixed phonemeindextransition

phoneme 010

o P(OBS | not 2) = 0

11122

nnn 0

o

observationstochastic, learned aabbuurraustochastic, learned

Alsoeasytoaddvariablesfor,e.g.,gender,accent,speed.ZweigandRussell(1998)showupto40%errorreductionoverHMMs

Chapter15,Section613

Summary

Sincethemid-1970s,speechrecognitionhasbeenformulatedasprobabilisticinference

Evidence=speechsignal,hiddenvariables=wordandphonesequences

“Context”effects(coarticulationetc.)arehandledbyaugmentingstate

Variabilityinhumanspeech(speed,timbre,etc.,etc.)andbackgroundnoisemakecontinuousspeechrecognitioninrealsettingsanopenproblem

Chapter15,Section614

Références

Documents relatifs

The dispersion of the linguistic carriers of concepts in concrete languages is undirected. We cannot predict how a concept in form of the linguistic applications develops or

The Greek language is rich in terms of words, which describe powerful relations and rulers of the world: Associated compound words in the Greek language with meanings, which

The scientists working out the cognitive-discursive paradigm use the methods and the data of the sociolinguistic, psycholinguistic, philosophical studies in the

«Эти факты, товарищи судьи, разумеется, сейчас принадлежат уже в значительной мере истории, но они проливают полный свет на вопрос,

Несимметричная энантиосемия с разной сочетаемостью значений (на примере глагола abhängen) выявила в процессе анализа (при наличии прочих

Trendelburg 2 analizează aserţiunea lui Aristotel cu referire la cele zece categorii ale realităţii şi ale individului uman, subliniind faptul că

Анализ текста показал, что выбор речевых сигналов скрытого воздействия адресанта на его адресата может быть обнаружен, описан, подсчитан и представлен в

– Distribution of the species of the genus Heligmosomoides, belon- ging to the “polygyrus line” modified from Asa- kawa (1988) from China and