
Computational Linguistics

From Zen to Aum

Gérard Huet

Chalmers University, May 21st, 2004


The Zen toolkit - Generic technology

A few specific applicative techniques:

• Local processing of focused data

• Sharing

• Lexical trees

• Differential words

• Finite transducers as lexicon morphisms

• Search by resumption coroutines

• Multiset ordering convergence
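As an illustration of the lexical trees listed above, here is a minimal sketch in standard OCaml (the slides use the revised syntax). Letters are assumed to be chars, and the names `trie`, `enter` and `mem` are illustrative: each node carries a bit telling whether its access word belongs to the lexicon, plus an association list of children indexed by letter.

```ocaml
(* Illustrative assumption: letters are chars, words are lists of letters. *)
type letter = char
type word = letter list

(* A node: acceptance bit, and the forest of its children keyed by letter. *)
type trie = Trie of bool * (letter * trie) list

let empty = Trie (false, [])

(* Insert w, rebuilding only the nodes along its access path. *)
let rec enter (Trie (b, forest)) (w : word) : trie =
  match w with
  | [] -> Trie (true, forest)
  | c :: rest ->
    let child = match List.assoc_opt c forest with Some t -> t | None -> empty in
    Trie (b, (c, enter child rest) :: List.remove_assoc c forest)

(* Membership: follow the letters, then read the acceptance bit. *)
let rec mem (w : word) (Trie (b, forest)) : bool =
  match w with
  | [] -> b
  | c :: rest ->
    (match List.assoc_opt c forest with
     | Some child -> mem rest child
     | None -> false)

let of_string s = List.init (String.length s) (String.get s)

let lexicon =
  List.fold_left (fun t s -> enter t (of_string s)) empty ["cat"; "cats"; "car"]
```

Since `enter` copies only the nodes on the inserted path, the old and the new lexicon share every other subtree, which keeps this applicative representation cheap to update.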


Automata Mista - AuM

We represent finite-state automata by a mixed structure: a deterministic skeleton decorated by non-deterministic transitions.

The first component is a forest of lexical trees, used as covering trees of the state transition graph. The remaining transitions are represented as annotations stating that on a certain input (a word, possibly empty, allowing ε-transitions), the automaton goes to a state designated by a virtual address.

There are two kinds of addresses, local and global. A global address is given by an integer (indexing into the forest array) and a word. A local address has the same structure, but acts as a differential word. Its first component indexes into an array representing the access path in the current tree (necessary because of sharing).


Differential words

type delta = (int * word);

A differential word is a notation for retrieving a word w' from another word w with which it shares a common prefix. It denotes the minimal path connecting the two words in a tree, as a sequence of ups and downs: if d = (n, u), we go up n times and then down along the word u.

We compute the difference between w and w' as the differential word diff w w' = (|w1|, w2), where w = p.w1 and w' = p.w2, p being the maximal common prefix.

The converse of diff : word -> word -> delta is

patch : delta -> word -> word: w' may be retrieved from w and d = diff w w' as w' = patch d w.
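Both functions can be sketched directly from these definitions. The sketch is in standard OCaml with chars as letters (an assumption made for readability): `diff` walks down the maximal common prefix and returns the length of what remains of w together with what remains of w', while `patch (n, u) w` drops the last n letters of w and appends u. For instance, writing words as strings, diff "cats" "car" = (2, "r") and patch (2, "r") "cats" = "car".

```ocaml
type letter = char
type word = letter list
type delta = int * word

(* diff w w' = (|w1|, w2) where w = p.w1, w' = p.w2, p the maximal common prefix *)
let rec diff (w : word) (w' : word) : delta =
  match w, w' with
  | c :: r, c' :: r' when c = c' -> diff r r'
  | _ -> (List.length w, w')

(* patch (n, u) w: go up n letters from the end of w, then down along u *)
let patch ((n, u) : delta) (w : word) : word =
  let rec drop k l =
    if k <= 0 then l else match l with [] -> [] | _ :: t -> drop (k - 1) t in
  List.rev_append (drop n (List.rev w)) u

let of_string s = List.init (String.length s) (String.get s)
```

By construction patch (diff w w') w = w' for any two words.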


The automaton structure

type input = word;

type delta = (int * word)
and address = [ Global of delta | Local of delta ];

type auto = [ State of (bool * deter * choices) ]
and deter = list (letter * auto)
and choices = list (input * address);

type automaton = (array auto * delta);

type backtrack = (input * delta * choices)
and resumption = list backtrack; (* coroutine resumptions *)
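These declarations translate directly into standard OCaml. The sketch below adds a recognizer in the coroutine style suggested by the `resumption` type: at each state the deterministic step is tried first, while the state's choices are suspended on a resumption stack and resumed on failure. It is an illustration, not the toolkit's code: letters are chars, a position is a (tree index, access word) pair re-accessed from the root of its tree at each jump rather than an explicit access stack, and `accepts`, `access` and `strip_prefix` are hypothetical names. A depth-first search of this kind does not terminate on cycles of ε-transitions.

```ocaml
(* Standard-OCaml transcription of the declarations above; chars as letters. *)
type letter = char
type word = letter list
type input = word
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (input * address) list
type automaton = auto array * delta
type backtrack = input * delta * choices
type resumption = backtrack list

(* patch (n, u) w: drop the last n letters of w, then append u. *)
let patch ((n, u) : delta) (w : word) : word =
  let rec drop k l =
    if k <= 0 then l else match l with [] -> [] | _ :: t -> drop (k - 1) t in
  List.rev_append (drop n (List.rev w)) u

(* Some rest if w = u @ rest, None otherwise. *)
let rec strip_prefix (u : word) (w : word) : word option =
  match u, w with
  | [], _ -> Some w
  | a :: u', b :: w' when a = b -> strip_prefix u' w'
  | _ -> None

(* Follow the deterministic skeleton from a node along an access word. *)
let rec access (State (_, det, _) as node) (w : word) : auto option =
  match w with
  | [] -> Some node
  | c :: rest ->
    (match List.assoc_opt c det with
     | Some child -> access child rest
     | None -> None)

let accepts ((forest, start) : automaton) (input : input) : bool =
  let node_at (i, path) =
    if i < 0 || i >= Array.length forest then None else access forest.(i) path in
  (* Global addresses are absolute; local ones are differential words
     applied to the access word of the current position. *)
  let resolve (i, path) = function
    | Global d -> d
    | Local d -> (i, patch d path) in
  let rec run input ((i, path) as pos) (State (b, det, choices)) (res : resumption) =
    if input = [] && b then true
    else
      (* Suspend this state's choices, then try the deterministic step. *)
      let res = if choices = [] then res else (input, pos, choices) :: res in
      match input with
      | c :: rest ->
        (match List.assoc_opt c det with
         | Some child -> run rest (i, path @ [c]) child res
         | None -> resume res)
      | [] -> resume res
  (* On failure, pop the most recent resumption and try its next choice. *)
  and resume (res : resumption) =
    match res with
    | [] -> false
    | (_, _, []) :: older -> resume older
    | (input, pos, (u, addr) :: others) :: older ->
      let res = (input, pos, others) :: older in
      (match strip_prefix u input with
       | None -> resume res
       | Some rest ->
         let target = resolve pos addr in
         (match node_at target with
          | Some node -> run rest target node res
          | None -> resume res))
  in
  match node_at start with
  | Some node -> run input start node []
  | None -> false

(* Example: tree 0 spells "ab"; at its root, input "c" jumps locally to node
   "a"; at "ab", an empty-input choice jumps globally to tree 1, spelling "!". *)
let tail = State (false, [('!', State (true, [], []))], [])
let node_ab = State (true, [], [([], Global (1, []))])
let node_a = State (false, [('b', node_ab)], [])
let root = State (false, [('a', node_a)], [(['c'], Local (0, ['a']))])
let example : automaton = ([| root; tail |], (0, []))
let of_string s = List.init (String.length s) (String.get s)
```

With this example, `accepts example (of_string "cb!")` follows the local jump, the skeleton edge on 'b', then the global jump into tree 1.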


Completeness

Every non-deterministic automaton (possibly with ε-transitions) may be represented as a flat aum (with an empty deterministic structure).
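This claim can be made concrete with a small construction (a sketch, not taken from the talk): given a hypothetical NFA record whose transitions are labelled by words, the empty word standing for an ε-transition, every state becomes a one-node tree with no deterministic transitions, and every transition becomes a choice towards the global address (q', []).

```ocaml
type letter = char
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list
type automaton = auto array * delta

(* Hypothetical NFA: states 0 .. size-1, transitions labelled by words,
   the empty word standing for an epsilon-transition. *)
type nfa = { size : int; start : int; finals : int list; edges : (int * word * int) list }

(* One single-node tree per state, with an empty deterministic part; each
   transition q --u--> q' becomes the choice (u, Global (q', [])). *)
let flat_of_nfa (m : nfa) : automaton =
  let state q =
    let choices =
      List.filter_map
        (fun (src, u, dst) -> if src = q then Some (u, Global (dst, [])) else None)
        m.edges in
    State (List.mem q m.finals, [], choices) in
  (Array.init m.size state, (m.start, []))

(* Example: a+ written with an epsilon-transition back from the final state. *)
let example = { size = 2; start = 0; finals = [1]; edges = [ (0, ['a'], 1); (1, [], 0) ] }
```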

Every deterministic automaton may be represented as an aum whose choice annotations, of the form State (b, [], [([], address)]), do not give rise to backtracking.
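One way to obtain such a representation is sketched below, under assumptions of ours (a single tree, and a DFA given by a hypothetical `next` function over a finite alphabet): a depth-first search lays a spanning tree over the DFA, and each transition reaching an already visited state becomes a stub State (b, [], [([], Global (0, p))]), where p is the access word of that state in the tree and b its acceptance bit. A stub offers exactly one empty-input jump, so it never creates a genuine alternative.

```ocaml
type letter = char
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list

(* Depth-first spanning tree of a DFA. A transition to an already visited
   state q' becomes a stub State (final q', [], [([], Global (0, p))]), p being
   the access word of q' in the tree. *)
let aum_of_dfa ~(alphabet : letter list) ~(next : int -> letter -> int option)
    ~(final : int -> bool) (start : int) : auto array * delta =
  let paths : (int, word) Hashtbl.t = Hashtbl.create 16 in
  let rec build q path =
    Hashtbl.replace paths q path;   (* record q before exploring it *)
    let edge c =
      match next q c with
      | None -> None
      | Some q' ->
        (match Hashtbl.find_opt paths q' with
         | Some p -> Some (c, State (final q', [], [([], Global (0, p))]))
         | None -> Some (c, build q' (path @ [c]))) in
    State (final q, List.filter_map edge alphabet, []) in
  ([| build start [] |], (0, []))

(* Example: the DFA for (ab)*, with state 0 initial and accepting. *)
let next q c = match q, c with 0, 'a' -> Some 1 | 1, 'b' -> Some 0 | _ -> None
let forest, start = aum_of_dfa ~alphabet:['a'; 'b'] ~next ~final:(fun q -> q = 0) 0
```

For (ab)*, the back edge on 'b' from state 1 to state 0 becomes the only stub, pointing to the root address (0, []).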

Every aum has a minimal representation, obtained by maximal sharing. N.B. Sharing the local virtual addresses does not necessarily correspond to equivalence by bisimulation.
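Maximal sharing can be sketched by bottom-up hash-consing, a standard technique assumed here rather than described on the slide: each node is rebuilt from its already shared children, then replaced by the first structurally equal node recorded so far in a memo table. Nodes are compared syntactically, local addresses included; since a local address is resolved against the access path, identical nodes at different places need not behave alike.

```ocaml
type letter = char
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list

(* Bottom-up hash-consing: children are shared first, then the node itself is
   replaced by the first structurally equal node recorded in the memo table. *)
let share (t : auto) : auto =
  let memo : (auto, auto) Hashtbl.t = Hashtbl.create 64 in
  let rec walk (State (b, det, ch)) =
    let node = State (b, List.map (fun (c, child) -> (c, walk child)) det, ch) in
    match Hashtbl.find_opt memo node with
    | Some shared -> shared
    | None -> Hashtbl.add memo node node; node
  in
  walk t

(* Two structurally equal subtrees, under 'a' and under 'c'. *)
let example =
  State (false,
         [ ('a', State (false, [('b', State (true, [], []))], []));
           ('c', State (false, [('b', State (true, [], []))], [])) ],
         [])
```

After `share example`, the subtrees under 'a' and 'c' are one physical node, so the tree has become a dag.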


The transducer structure

type input = word and output = word;

type delta = (int * word)
and address = [ Global of delta | Local of delta ];

type trans = [ State of (bool * deter * choices) ]
and deter = list (letter * trans)
and choices = list (input * output * address);

type transducer = (array trans * delta);

type backtrack = (input * output * delta * choices)
and resumption = list backtrack; (* coroutine resumptions *)


[Figure: the AuM forest, the access stack a1 ... an, the current dag and the current access word]


Memorisation of the current access

The access stack [s_n; s_{n-1}; ...; s_0] is necessary to interpret local virtual addresses. It may be convenient to store the current access word word = [a_n; ...; a_1] as well, stacking and unstacking it along the local accesses. We may thus distinguish two output constructors:

Absolute of word and Relative of delta. In the latter case, the output is computed by applying patch to word.

Applications:

• Inflected forms dictionary used as a lemmatizer (regular plural: δ = (1, ['s']))

• Unglue (δ = (0, []))

• Segment (δ = (0, u))
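A sketch of the two output constructors, with a hypothetical `output_of` function and chars as letters. The access word is kept in the reversed form [a_n; ...; a_1] described above, so it is put back in reading order before `patch` is applied; under this definition of patch, δ = (0, []) simply returns the word read so far. The value `access_cats` and the delta (1, []) mentioned below are our own examples.

```ocaml
type letter = char
type word = letter list
type delta = int * word

(* The two output constructors. *)
type output = Absolute of word | Relative of delta

(* patch (n, u) w: drop the last n letters of w, then append u. *)
let patch ((n, u) : delta) (w : word) : word =
  let rec drop k l =
    if k <= 0 then l else match l with [] -> [] | _ :: t -> drop (k - 1) t in
  List.rev_append (drop n (List.rev w)) u

(* The current access word is kept reversed, [an; ...; a1], so that pushing
   and popping a letter is constant-time; it is put back in reading order
   before a relative output is patched onto it. *)
let output_of (o : output) (rev_access : word) : word =
  match o with
  | Absolute w -> w
  | Relative d -> patch d (List.rev rev_access)

let of_string s = List.init (String.length s) (String.get s)
let access_cats = List.rev (of_string "cats")
```

For instance, `output_of (Relative (1, [])) access_cats` yields the word "cat".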


Modular aums

An aum is given by a pair of type (array auto * delta).

We make aums modular by making the global addresses relocatable, and possibly interpreting success states by continuations. Continuations are implemented as ε-transitions, i.e. extra choices with empty input.

It is then easy to compile regular expressions into aums, as follows:

• The base case is any aum, its size being the size of its array.

• If A = (arrayA, deltaA) is of size a and B = (arrayB, deltaB) is of size b, A·B is obtained by relocating B by a and continuing A by a + deltaB; it starts at deltaA and has size a + b.

• If A = (arrayA, deltaA) is of size a and B = (arrayB, deltaB) is of size b, A + B is obtained by relocating B by a and starting at a + b + 1, where we put State (False, [], [([], deltaA); ([], a + deltaB)]); it has size a + b + 1.

• If A = (arrayA, deltaA) is of size a, then A∗ is obtained by continuing A by deltaA and making its starting node accepting; it has size a.

These transformations ought to be performed before sharing.
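The four constructions can be sketched on a standard-OCaml version of the types. This is an illustration under stated assumptions, not the author's compiler: `relocate`, `continue_by`, `concat`, `union`, `star` and `single` are hypothetical names, continuing a state is taken to mean clearing its acceptance bit and adding an empty-input choice, and array indices are 0-based, so the fresh fork state of A + B sits at index a + b (the slide's a + b + 1 reads as a 1-based position). Two caveats hold for the star construction as stated: marking the start node accepting is safe when no transition leads back into the start node, but otherwise it can admit words outside A∗; and when A already accepts the empty word, the continuation creates an ε-loop on which a naive depth-first recognizer would not terminate.

```ocaml
type letter = char
type word = letter list
type delta = int * word
type address = Global of delta | Local of delta
type auto = State of bool * deter * choices
and deter = (letter * auto) list
and choices = (word * address) list
type automaton = auto array * delta

(* Rebuild a tree bottom-up, applying f to every node. *)
let rec map_tree f (State (b, det, ch)) =
  f (State (b, List.map (fun (c, t) -> (c, map_tree f t)) det, ch))

let shift k ((n, w) : delta) : delta = (n + k, w)

(* Relocation shifts global addresses only: local ones are relative. *)
let relocate k =
  map_tree (fun (State (b, det, ch)) ->
      State (b, det, List.map (function
          | (u, Global d) -> (u, Global (shift k d))
          | other -> other) ch))

(* Continuing by d: every accepting state becomes non-accepting and gets an
   extra choice with empty input towards the global address d. *)
let continue_by (d : delta) =
  map_tree (fun (State (b, det, ch) as s) ->
      if b then State (false, det, ch @ [([], Global d)]) else s)

(* Make the node reached by access word w accepting. *)
let rec set_accepting w (State (b, det, ch)) =
  match w with
  | [] -> State (true, det, ch)
  | c :: rest ->
    let step (c', t) = if c' = c then (c', set_accepting rest t) else (c', t) in
    State (b, List.map step det, ch)

(* A.B: continue A by a + deltaB, relocate B by a, start at deltaA. *)
let concat ((arr_a, da) : automaton) ((arr_b, db) : automaton) : automaton =
  let a = Array.length arr_a in
  (Array.append (Array.map (continue_by (shift a db)) arr_a)
     (Array.map (relocate a) arr_b), da)

(* A+B: relocate B by a and add a fork state at index a + b (0-based). *)
let union ((arr_a, da) : automaton) ((arr_b, db) : automaton) : automaton =
  let a = Array.length arr_a and b = Array.length arr_b in
  let fork = State (false, [], [([], Global da); ([], Global (shift a db))]) in
  (Array.concat [arr_a; Array.map (relocate a) arr_b; [| fork |]], (a + b, []))

(* A*: continue A by deltaA, then make the start node accepting. *)
let star ((arr_a, (i, w)) : automaton) : automaton =
  let arr = Array.map (continue_by (i, w)) arr_a in
  arr.(i) <- set_accepting w arr.(i);
  (arr, (i, w))

(* Base case: a one-tree aum recognizing a single letter. *)
let single c : automaton = ([| State (false, [(c, State (true, [], []))], []) |], (0, []))
```

In `union (concat (single 'a') (single 'b')) (concat (single 'a') (single 'b'))`, the global address (1, []) inside the second copy is relocated to (3, []), while the fork at index 4 jumps to (0, []) and (2, []).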


Conclusion

Automata mista offer an elegant applicative solution to many finite-state processing problems, typically lexicon representation, phonology, morphology and segmentation in computational linguistics. The deterministic spanning tree of their state space is then naturally the dictionary of inflected forms, which is thus placed at the center of the computational treatment of language.
