From Zen to Aum

Computational Linguistics

Gérard Huet

Chalmers University, May 21st, 2004


The Zen toolkit - Generic technology

A few specific applicative techniques:

• Local processing of focused data
• Sharing
• Lexical trees
• Differential words
• Finite transducers as lexicon morphisms
• Search by resumption coroutines
• Multiset ordering convergence


Automata Mista - AuM

We represent finite-state automata by a mixed structure: a deterministic skeleton decorated by non-deterministic transitions.

The first component is a forest of lexical trees, used as covering trees of the state transition graph. The remaining transitions are represented as annotations stating that on a certain input (a word, possibly empty, which allows ε-transitions), the automaton goes to a state designated by a virtual address.

There are two kinds of addresses, local and global. A global address is given by an integer (indexing into the forest array) and a word. A local address has the same structure, but acts as a differential word. Its first component indexes into an array representing the access path in the current tree (necessary because of sharing).


Differential words

    type delta = (int * word);

A differential word is a notation for retrieving a word w' from another word w sharing a common prefix. It denotes the minimal path connecting the two words in a tree, as a sequence of ups and downs: if d = (n, u), we go up n times and then down along word u.

We compute the difference between w and w' as the differential word diff w w' = (|w1|, w2), where w = p.w1 and w' = p.w2, with p the maximal common prefix.

The converse of diff : word -> word -> delta is patch : delta -> word -> word. That is, w' may be retrieved from w and d = diff w w' as w' = patch d w.
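The definitions of diff and patch above can be sketched as follows. This is a minimal illustration in standard OCaml syntax, not the toolkit's own code; it assumes words are lists of letters in reading order, and the choice of `char` for letters and the helper `take` are hypothetical.

```ocaml
(* Assumption: letters are chars and words are lists in reading order. *)
type letter = char
type word = letter list
type delta = int * word

(* diff w w' = (|w1|, w2) where w = p.w1 and w' = p.w2,
   p being the maximal common prefix. *)
let rec diff (w : word) (w' : word) : delta =
  match w, w' with
  | a :: r, b :: r' when a = b -> diff r r'
  | _ -> (List.length w, w')

(* Hypothetical helper: the first k elements of a list. *)
let rec take k = function
  | x :: r when k > 0 -> x :: take (k - 1) r
  | _ -> []

(* patch (n, u) w: go up n letters from w, then down along u. *)
let patch ((n, u) : delta) (w : word) : word =
  let keep = List.length w - n in
  if keep < 0 then invalid_arg "patch: cannot go up past the root";
  take keep w @ u
```

With these definitions, `patch (diff w w') w` gives back `w'` for any two words, which is the converse property stated on the slide.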


The automaton structure

    type input = word;

    type delta = (int * word)
    and address = [ Global of delta | Local of delta ];

    type auto = [ State of (bool * deter * choices) ]
    and deter = list (letter * auto)
    and choices = list (input * address);

    type automaton = (array auto * delta);

    type backtrack = (input * delta * choices)
    and resumption = list backtrack; (* coroutine resumptions *)
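As an illustration of how the two kinds of virtual addresses might be interpreted, here is a sketch in standard OCaml syntax. The functions `walk` and `resolve` are hypothetical names, and the reading of a local address (n, u) as "the n-th state of the access stack, then down along u" is an assumption drawn from the description of addresses, not a definition given on the slides.

```ocaml
type letter = int
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list

(* Follow the deterministic skeleton from state s along a word. *)
let rec walk (State (_, det, _) as s) = function
  | [] -> Some s
  | c :: rest ->
      (match List.assoc_opt c det with
       | Some s' -> walk s' rest
       | None -> None)

(* Assumption: stack is the access path [sn; ...; s0], current state first.
   A global address indexes the forest; a local one indexes the stack. *)
let resolve (forest : auto array) (stack : auto list) = function
  | Global (k, w) ->
      if k < 0 || k >= Array.length forest then None
      else walk forest.(k) w
  | Local (n, w) ->
      (match List.nth_opt stack n with
       | Some s -> walk s w
       | None -> None)
```

Returning an option keeps dangling addresses visible to the caller instead of raising an exception during the search.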


Completeness

Every non-deterministic automaton (possibly with ε-transitions) may be represented as a flat aum (with empty deterministic structure).

Every deterministic automaton may be represented as an aum whose choice annotations State(b, [], [([], address)]) do not give rise to backtracking.

Every aum has a minimal representation, obtained by maximal sharing. N.B. Sharing the local virtual addresses does not necessarily correspond to equivalence by bisimulation.
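The first claim can be made concrete with a small sketch in standard OCaml syntax. The function `flat_aum` and the edge-list encoding of the source automaton are hypothetical: each state becomes a tree with no deterministic transitions, and every edge becomes a choice pointing to a global address with an empty word.

```ocaml
type letter = int
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list

(* Assumption: the source automaton has states 0 .. n-1 and is given as
   labelled edges (src, input word, dst); an empty input word stands for
   an epsilon-transition. *)
let flat_aum n (edges : (int * word * int) list) (accepting : int list)
    : auto array =
  Array.init n (fun q ->
      let out =
        List.filter_map
          (fun (src, inp, dst) ->
            if src = q then Some (inp, Global (dst, [])) else None)
          edges
      in
      (* Empty deterministic structure: all transitions are choices. *)
      State (List.mem q accepting, [], out))
```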


The transducer structure

    type input = word and output = word;

    type delta = (int * word)
    and address = [ Global of delta | Local of delta ];

    type trans = [ State of (bool * deter * choices) ]
    and deter = list (letter * trans)
    and choices = list (input * output * address);

    type transducer = (array trans * delta);

    type backtrack = (input * output * delta * choices)
    and resumption = list backtrack; (* coroutine resumptions *)


[Figure: an AuM drawn as a forest of dags numbered 1 to k, showing the access stack, the current dag, and the current word a1 ... an.]


Memorisation of the current access

The access stack [sn; sn-1; ... s0] is necessary to interpret local virtual addresses. It may be convenient to store as well the current access word word = [an; ... a1], stacked and unstacked along the local accesses. We may thus distinguish two output constructors: Absolute of word and Relative of delta. In the latter case, the output is computed by patch applied to word.

Applications:

• Inflected forms dictionary used as lemmatizer (regular plural: δ = (1, ['s']))
• Unglue (δ = (0, []))
• Segment (δ = (0, u))
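The two output constructors can be sketched as follows, in standard OCaml syntax. The function `emit` and the helper `take` are hypothetical names, letters are assumed to be chars, and the current word is assumed to be kept in reading order; with the stacked representation [an; ... a1] one would drop letters from the head instead.

```ocaml
type letter = char
type word = letter list
type delta = int * word
type output = Absolute of word | Relative of delta

(* Hypothetical helper: the first k elements of a list. *)
let rec take k = function
  | x :: r when k > 0 -> x :: take (k - 1) r
  | _ -> []

(* patch (n, u) w: drop the last n letters of w, then append u. *)
let patch ((n, u) : delta) (w : word) : word =
  take (List.length w - n) w @ u

(* An absolute output is returned as is; a relative output is
   computed by patching the current access word. *)
let emit (current : word) = function
  | Absolute w -> w
  | Relative d -> patch d current
```

A relative output stores only a small delta per entry, so entries that relate to their access word in the same way carry identical annotations and can be shared.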


Modular aums

An aum is given by a pair in (array auto * delta).

We make them modular by making the global addresses relocatable, and possibly interpreting success states by continuations. Continuations are implemented as ε-transitions, i.e. extra choices with empty input.

Now it is easy to compile regular expressions into aums, as follows:

• The base case is any aum; its size is the size of its array.
• If A = (arrayA, deltaA) is of size a and B = (arrayB, deltaB) is of size b, A·B is obtained by relocating B by a, continuing A by a + deltaB, starting at deltaA; it is of size a + b.
• If A = (arrayA, deltaA) is of size a and B = (arrayB, deltaB) is of size b, A+B is obtained by relocating B by a, starting at a + b + 1, where we put State(False, [], [([], deltaA); ([], a + deltaB)]); it is of size a + b + 1.
• If A = (arrayA, deltaA) is of size a, then A* is obtained by continuing A by deltaA and making its starting node accepting; it is of size a.

These transformations ought to be effected before sharing.
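The relocation, concatenation and union steps might look as follows in standard OCaml syntax. This is a sketch under several assumptions, none of which come from the slides: the names `relocate`, `continue_by`, `concat` and `union` are hypothetical; a continued success state is made non-accepting; "a + deltaB" is read as shifting the integer component of deltaB by a; and arrays are indexed from 0, so the new entry state of the union sits at index a + b where the slide writes a + b + 1.

```ocaml
type letter = int
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list
type automaton = auto array * delta

(* Shift every global address by k; local addresses are left untouched. *)
let rec relocate k (State (b, det, ch)) =
  let shift = function
    | (inp, Global (i, w)) -> (inp, Global (i + k, w))
    | other -> other
  in
  State (b, List.map (fun (c, s) -> (c, relocate k s)) det, List.map shift ch)

(* Replace success by an epsilon choice to the continuation address. *)
let rec continue_by (cont : delta) (State (b, det, ch)) =
  let det' = List.map (fun (c, s) -> (c, continue_by cont s)) det in
  State (false, det', if b then ch @ [ ([], Global cont) ] else ch)

(* A.B: relocate B by a, continue A by a + deltaB, start at deltaA. *)
let concat ((arr_a, start_a) : automaton) ((arr_b, (kb, wb)) : automaton)
    : automaton =
  let a = Array.length arr_a in
  ( Array.append
      (Array.map (continue_by (a + kb, wb)) arr_a)
      (Array.map (relocate a) arr_b),
    start_a )

(* A+B: relocate B by a, add a fresh entry state with two epsilon choices. *)
let union ((arr_a, start_a) : automaton) ((arr_b, (kb, wb)) : automaton)
    : automaton =
  let a = Array.length arr_a and b = Array.length arr_b in
  let entry =
    State (false, [], [ ([], Global start_a); ([], Global (a + kb, wb)) ])
  in
  (Array.concat [ arr_a; Array.map (relocate a) arr_b; [| entry |] ], (a + b, []))
```

The star construction is left out of the sketch because making the starting node accepting requires updating a node inside a tree, which depends on how the start address is resolved.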


Conclusion

Automata mista offer an elegant applicative solution to many finite-state processing problems, typically the treatment of lexicon representation, phonology, morphology and segmentation in computational linguistics. The deterministic spanning tree of their state space is then naturally the dictionary of inflected forms of words, which is thus placed at the center of the computer treatment of language.
