
Combining multiple resources to build reliable wordnets


HAL Id: inria-00614706

https://hal.inria.fr/inria-00614706

Submitted on 15 Aug 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Darja Fišer, Benoît Sagot

To cite this version:

Darja Fišer, Benoît Sagot. Combining multiple resources to build reliable wordnets. TSD 2008 - Text Speech and Dialogue, 2008, Brno, Czech Republic. ⟨inria-00614706⟩

Darja Fišer¹, Benoît Sagot²

1. Fac. of Arts, Univ. of Ljubljana, Aškerčeva 2, 1000 Ljubljana, Slovenia
2. Alpage, INRIA / Paris 7, 30 rue du Ch. des rentiers, 75013 Paris, France

darja.fiser@guest.arnes.si, benoit.sagot@inria.fr

Abstract. This paper compares automatically generated sets of synonyms in French and Slovene wordnets with respect to the resources used in the construction process. Polysemous words were disambiguated via a five-language word-alignment of the SEE-ERA.NET parallel corpus, a subcorpus of the JRC Acquis. The extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. On the other hand, a bilingual approach sufficed to acquire equivalents for monosemous words. Bilingual lexicons were extracted from different resources, including Wikipedia, Wiktionary and the EUROVOC thesaurus. A representative sample of the generated synsets was evaluated against the gold standards.

1 Introduction

The first wordnet was developed for English at Princeton University (PWN). Over time it has become one of the most valuable resources in applications for natural language understanding and interpretation, which initiated the development of wordnets for many other languages apart from English [1,2]. Currently, wordnets for more than 50 languages are registered with the Global WordNet Association (http://www.globalwordnet.org/). While it is true that manual construction of each wordnet is the most reliable and produces the best results as far as linguistic soundness and accuracy are concerned, such an endeavour is highly time-consuming and expensive. This is why alternative, semi- or fully automatic approaches have been proposed. By taking advantage of the existing resources, they facilitate faster and easier development of a wordnet [3,4].

Apart from the knowledge acquisition bottleneck, another major problem in the wordnet community is the availability of the developed wordnets. Currently, only a handful of them are freely available (Arabic, Hebrew, Irish and Princeton). For example, although a wordnet for French has been created within the EuroWordNet (EWN) project [1], the resource has not been widely used, mainly due to licensing issues. In addition, there has been no follow-up project to further extend and improve the core French WordNet since the EWN project ended [5]. This issue was taken into account in the two recent wordnet development projects presented in this paper, the results of which will be automatically constructed (but later also manually checked) broad-coverage open-source wordnets for French (WOLF,

[...] in the next section. Section 3 presents the two wordnet development projects. Section 4 presents and evaluates the created resources with a focus on a source-by-source evaluation, and the last section gives conclusions and perspectives.

2 Related work

The relevant literature reports on several techniques used to build semantic lexicons, most of which can be divided into two approaches. Contrary to the merge approach, according to which a wordnet for a certain language is first created based on monolingual resources and then mapped to other wordnets, we have opted for the expand approach [1]. This model takes a fixed set of synsets from Princeton WordNet (PWN) and translates them into the target language, preserving the structure of the original wordnet. The cost of the expand model is that the resulting wordnets are biased by the PWN. However, due to its greater simplicity, the expand model has been adopted in a number of projects, such as the BalkaNet [2] and MultiWordNet [6], as well as EWN [1].

Research teams adopting the latter approach took advantage of a wide range of resources at their disposal, including machine readable bilingual and monolingual dictionaries, taxonomies, ontologies and others. For the construction of WOLF and SloWNet, we have leveraged three different publicly available types of resources: the JRC-Acquis parallel corpus, Wikipedia (and other related wiki resources) and other types of bilingual resources. Equivalents for monosemous literals that do not require sense disambiguation were extracted from bilingual resources. Roughly 82% of literals found in PWN are monosemous; however, most of them are not in the core vocabulary. On the other hand, the parallel corpus was used to obtain semantically relevant information from translations so as to be able to handle polysemous literals. The idea that semantic insights can be derived from the translation relation has been explored by [7–9]. The approach has also yielded promising results in an earlier smaller-scale experiment to obtain synsets for Slovene wordnet [10].

3 Approach

This section briefly presents the approach used to construct a wordnet automatically. For a more detailed description of the approach, see [11].

In the align approach we used the SEE-ERA.NET corpus (project ICT 10503 RP), a 1.5-million-word sentence-aligned subcorpus of JRC-Acquis [12] in eight languages. Apart from French and Slovene, we used English, Romanian, Czech and Bulgarian. We used different tools to POS-tag and lemmatize the corpus before word-aligning it with Uplug [13]. This allowed us to build five multilingual lexicons that include French and four multilingual lexicons that include Slovene. They contain between 49,356 (Fr-Ro-Cz-Bg-En) and 59,020 entries (Fr-Cz-Bg-En). The next step was to assign the appropriate synset id to each entry of

[...] corresponding BalkaNet wordnet [2]. Since all these wordnets share the same synset ids as PWN 2.0, the intersection of all the found synset ids is computed. The intersection of all possible senses in each language is likely to output the correct one, which can be assigned to the French or Slovene literal. Applied to the above-mentioned multilingual lexicons, this technique allowed us to build several sets of (French or Slovene) synsets (see Table 2 for quantitative data). Because tagging, lemmatization and alignment are not perfect, synsets created in this way do inherit some of these errors. However, the value of this approach lies in the fact that they cover polysemous literals from the core vocabulary, which the translation approach cannot handle (see Section 4).
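The intersection step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each existing wordnet is available as a dictionary from lemma to a set of synset ids. The function name, the toy entries and the ids s1 to s3 are hypothetical and are not taken from the paper or from any wordnet release.

```python
def intersect_senses(aligned_words, wordnets):
    """Return the synset ids shared by all aligned words of one lexicon entry.

    aligned_words: {language: lemma} for the non-target languages.
    wordnets:      {language: {lemma: set of synset ids}} (assumed layout).
    """
    candidate_sets = []
    for language, lemma in aligned_words.items():
        senses = wordnets.get(language, {}).get(lemma)
        if not senses:
            # A lemma missing from its wordnet blocks the intersection,
            # so no synset can be proposed for this entry.
            return set()
        candidate_sets.append(set(senses))
    if not candidate_sets:
        return set()
    return set.intersection(*candidate_sets)


# Toy data with made-up synset ids (hypothetical, for illustration only).
TOY_WORDNETS = {
    "en": {"examination": {"s1", "s2"}},
    "cs": {"zkouška": {"s2", "s3"}},
}

entry = {"en": "examination", "cs": "zkouška"}
shared = intersect_senses(entry, TOY_WORDNETS)
print(shared)  # ids that would be assigned to the aligned target-language literal
```

The more languages take part in an entry, the smaller the intersection tends to be, which is one way to read the trade-off between the number and the reliability of the synsets obtained from the different language combinations.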

For the translation approach, applied on monosemous literals from the PWN 2.0, we used the following bilingual resources:

- Wikipedia (http://www.wikipedia.org), a multilingual collaborative encyclopaedia. We extracted bilingual Fr-En and Sl-En lexicons thanks to inter-wiki links that relate articles on the same topic in different languages.¹
- The French, Slovene and English Wiktionaries (http://www.wiktionary.org), lexical companions to Wikipedia, which contain translations into other languages.
- The Wikispecies (http://species.wikimedia.org), a taxonomy of living species with translations of Latin standard names into vernacular languages.
- Eurovoc descriptors (http://europa.eu/eurovoc), a multilingual thesaurus used for classification of EU documents.
- For Slovene, we used a large-coverage electronic bilingual (English-Slovene) dictionary (over 130,000 entries).
- Finally, we created trivial translations by retaining all numeric literals (such as 1,000 or 3.14159...) and all Latin taxonomic terms (extracted from the TreeOfLife project www.tolweb.org).

Because they are created by translation of monosemous literals, these synsets will on the one hand be very reliable (see Table 3), but at the same time mostly concern non-core vocabulary (see Table 1).
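As a rough illustration of the translation approach, the following Python sketch keeps only literals with exactly one sense and looks them up in a bilingual lexicon. The data layout, the function name and the toy entries are assumptions made for this example; they do not reproduce the actual resources or code used for WOLF and SloWNet.

```python
def translate_monosemous(pwn_senses, bilingual_lexicon):
    """Build target-language synsets from monosemous English literals.

    pwn_senses:        {English literal: set of synset ids} (assumed layout).
    bilingual_lexicon: {English literal: set of target-language literals}.
    Returns {synset id: set of target-language literals}.
    """
    synsets = {}
    for literal, synset_ids in pwn_senses.items():
        if len(synset_ids) != 1:
            continue  # polysemous literals would need sense disambiguation
        (synset_id,) = synset_ids
        for translation in bilingual_lexicon.get(literal, ()):
            synsets.setdefault(synset_id, set()).add(translation)
    return synsets


# Toy data with made-up synset ids (hypothetical).
TOY_SENSES = {
    "document": {"s10"},             # monosemous in this toy example
    "examination": {"s1", "s2"},     # polysemous, therefore skipped
}
TOY_LEXICON = {
    "document": {"dokument"},
    "examination": {"preiskava"},
}

result = translate_monosemous(TOY_SENSES, TOY_LEXICON)
print(result)
```

No disambiguation is needed here because a monosemous literal leaves only one candidate synset, which is why this route is simple but limited to literals with a single sense.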

Synsets obtained from both approaches were merged, while preserving information on the source of each piece of information. This enabled us to perform a simple heuristic filtering according to the reliability of each source, on the diversity of sources that assign a given literal to a given synset, and on frequency information (for the sources from the align approach).
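The paper does not spell out the filtering heuristic, so the following Python sketch is only one possible reading of it. The reliability weights, the threshold, the minimum frequency and the scoring rule (summing the weights of the distinct sources that support a pair) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical per-source reliability weights (not taken from the paper).
RELIABILITY = {"wikipedia": 0.9, "sl-cz-en": 0.5, "sl-ro-cz-en": 0.6}


def merge(candidates):
    """Group candidates by (literal, synset id), keeping their provenance.

    candidates: iterable of (literal, synset_id, source, frequency), where
    frequency is an alignment count, or None for translation-based sources.
    """
    merged = defaultdict(dict)
    for literal, synset_id, source, frequency in candidates:
        merged[(literal, synset_id)][source] = frequency
    return dict(merged)


def passes_filter(evidence, threshold=0.8, min_frequency=2):
    """Assumed heuristic: sum the weights of sufficiently frequent sources."""
    score = 0.0
    for source, frequency in evidence.items():
        if frequency is not None and frequency < min_frequency:
            continue  # alignment seen too rarely to count as evidence
        score += RELIABILITY.get(source, 0.0)
    return score >= threshold


# Toy candidates with made-up literals and synset ids.
candidates = [
    ("lit_a", "s1", "wikipedia", None),
    ("lit_b", "s2", "sl-cz-en", 2),
    ("lit_b", "s2", "sl-ro-cz-en", 3),
    ("lit_c", "s3", "sl-cz-en", 1),
]
merged = merge(candidates)
kept = {pair for pair, evidence in merged.items() if passes_filter(evidence)}
print(sorted(kept))
```

In this sketch a pair proposed by a single highly weighted source survives, as does a pair supported by several weaker sources, while a pair backed only by a rare alignment is dropped.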

¹ These lexicons have 307,256 entries for French and 27,667 for Slovene. The difference in size is substantial and will also lead to very different number of the generated [...]

4.1 Global evaluation

We compared the merged Slovene and French wordnets to PWN, French EWN and a manually created sample of Slovene WordNet, called ManSloWNet². Although we are aware of the fact that these resources are not perfect, they were considered as gold standard for our evaluation procedure because they were by far the best resources of such kind we could obtain.

WOLF currently contains 32,351 synsets that include 38,001 unique literals. This figure is much greater than the number of synsets in French EWN (22,121 synsets could be mapped into PWN 2.0 synsets). This is directly related to the high number of monosemous PWN literals in non-core synsets (119,528 out of 145,627) that the translation approach was able to handle adequately. Moreover, French EWN has only nominal and verbal synsets, whereas WOLF includes adjectival and adverbial synsets as well. Figures for SloWNet are similar: 29,108 synsets that include 45,694 literals (to be compared with the 4,868 synsets of ManSloWNet). However, without the En-Sl dictionary that was used for Slovene, the figures would have been much lower.

In order to evaluate the coverage of the generated wordnets, we used the BalkaNet Basic Concept Sets [2]. Basic synsets are grouped into three BCS categories, BCS1 being the most fundamental set of senses. The results for the automatically constructed wordnets are compared to the gold standards (see Table 1). They show that both WOLF and SloWNet have a reasonable coverage of BCS senses. They also show that our approach still does not come close to PWN, which was the backbone of our experiment. However, the generated wordnets are considerably richer than the only other wordnets that exist for French and Slovene, especially for non-BCS synsets. Moreover, although the same approach was used, and despite the use of a bilingual dictionary, SloWNet is smaller than WOLF. This is mainly because French Wikipedia is considerably larger than the Slovene one and thus yields many more monosemous synsets, which are not always found in the En-Sl bilingual dictionary.

The align approach yielded a relatively low number of synsets compared to bilingual resources, mostly because it relies on an intersection operation among several languages: if some synsets were missing in any of the existing wordnets used for comparison, there was no match among the languages and the synset could not be generated. Interesting as well is the nature of synsets that were generated from the different sources. Basically, the align approach that handled all kinds of words resulted predominantly in core synsets from the BCS categories. On the other hand, the bilingual resources that tackled only the monosemous expressions provided us with much more specific synsets outside the core vocabulary. The align approach worked only on single words, which is why all MWEs in the resulting wordnets come from bilingual resources.

² This subset of the Slovene WordNet contains all synsets from BCS1 and 2 (approx. 5,000), which were automatically translated from Serbian, its closest relative in the [...]

Automatically generated French synsets (WOLF)

wordnet    PWN 2.0    align            transl           merged (WOLF)     French EWN
BCS1         1,218      791  64.9%       175  14.4%        870  71.4%      1,211  99.4%
BCS2         3,471    1,309  37.7%       523  15.1%      1,668  48.0%      3,022  87.1%
BCS3         3,827      824  21.5%     1,100  28.7%      1,801  47.1%      2,304  60.2%
non-BCS    106,908    2,844   2.7%    25,566  23.9%     28,012  26.2%     15,584  14.6%
total      115,424    5,768   5.0%    27,364  23.7%     32,351  28.0%     22,121  19.2%

Automatically generated Slovene synsets (SloWNet)

wordnet    PWN 2.0    align            transl           merged (SloWNet)  ManSloWNet
BCS1         1,218      618  50.7%       181  14.9%        714  58.6%      1,218  100%
BCS2         3,471      896  25.8%       606  17.4%      1,361  39.2%      3,469  99.9%
BCS3         3,827      577  15.1%     1,128  29.5%      1,611  42.1%        180   4.7%
non-BCS    106,908    1,603   1.5%    24,116  22.6%     25,422  23.8%          1   0.0%
total      115,424    3,694   3.2%    26,031  22.6%     29,108  25.2%      4,868   4.2%

Table 1. WOLF and SloWNet synsets. Percentages are given compared to PWN 2.0.
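The percentages in Table 1 are synset counts divided by the corresponding PWN 2.0 counts. A short Python sketch of this arithmetic, using the totals and the BCS1 row from the table (the function name is an illustrative choice, not something defined in the paper):

```python
def coverage(generated: int, pwn_count: int) -> float:
    """Share of PWN 2.0 synsets covered, as a percentage with one decimal."""
    return round(100 * generated / pwn_count, 1)


PWN_TOTAL = 115_424  # total number of PWN 2.0 synsets in Table 1

wolf_total = coverage(32_351, PWN_TOTAL)     # merged WOLF, all synsets
slownet_total = coverage(29_108, PWN_TOTAL)  # merged SloWNet, all synsets
wolf_bcs1 = coverage(870, 1_218)             # merged WOLF, BCS1 only

print(wolf_total, slownet_total, wolf_bcs1)
```

Recomputing a few cells this way is a quick consistency check on a reconstructed table; small differences in the last digit can arise from rounding in the original figures.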

4.2 Source-by-source evaluation

From a qualitative point of view, we were interested in how reliable the various sources, such as the different language combinations in the align approach, wiki sources and other thesauri, were for the creation of synsets. This is why the rest of the evaluation is performed on each individual source, and the reliability scores obtained will be used to generate confidence measures for the rest of the generated synsets from that source which we were unable to evaluate automatically (because they are missing in the gold standards). We restricted this manual evaluation to nominal synsets. Verbal synsets are more difficult to handle automatically for many reasons: higher polysemy of frequent verbs, differences in linguistic systems in dealing with phrasal verbs, light verb constructions and others. These synsets, as well as adjectival and adverbial synsets, will be evaluated carefully in the future. For each source, we checked whether a given literal in the generated wordnets is assigned the appropriate synset id according to the gold standards. We considered only those literals that are both in the gold standard and in the evaluated resource. A random sample of 100 (literal, synset) pairs present in the acquired resource but absent in the gold standard were inspected by hand and classified into the following categories (see Table 3):

- the literal is an appropriate expression of the concept represented by that synset id but is missing from the gold standard (absent in GS but correct); as mentioned before, the gold standards we used for automatic evaluation are not perfect and complete, which is why a given literal that was automatically assigned to a particular synset can be a legitimate literal missing in the gold standard rather than an error; for example, the French literal document and the Slovene literal dokument were correctly added in the synset corresponding to PWN literal document: this synset was absent from French [...]

- the literal is not appropriate for that synset but is semantically very close to it, its hypernym or its hyponym (closely related); such cases can be considered as correct if more coarse-grained sense granularity is sufficient for a given application; for example, it might suffice to treat words such as ekipa (team) and skupina (group) as synonyms in a particular HLT task;
- the literal is neither appropriate nor semantically related to the synset in question because it results from wrong sense disambiguation, wrong word alignment or wrong lemmatization (wrong).

Source             # of (lit, synset id) pairs   Present in GS   Synset not in GS   Discrepancy w.r.t. GS
WOLF
Fr-Cz-En                  1760                      61.7%            7.5%              30.8%
Fr-Cz-Bg-En               1092                      67.8%            4.9%              27.4%
Fr-Ro-En                  2002                      64.7%            8.1%              27.2%
Fr-Ro-Cz-En               1206                      70.6%            5.4%              24.0%
Fr-Ro-Cz-Bg-En             796                      75.5%            3.3%              21.2%
Wikipedia                  368                      94.0%            0.3%               5.7%
Fr Wiktionary              577                      69.8%            1.0%              29.1%
En Wiktionary              365                      88.5%            -                 11.5%
Wikispecies                 21                      90.5%            4.8%               4.8%
EUROVOC descr.              69                      67.6%            -                 32.3%
SloWNet
Sl-Cz-En                  2084                      53.4%           10.9%              35.6%
Sl-Cz-Bg-En               1383                      59.3%            6.6%              34.1%
Sl-Ro-Cz-En               1589                      57.7%            8.0%              34.3%
Sl-Ro-Cz-Bg-En            1101                      61.0%            5.1%              33.9%

Table 2. Evaluation of WOLF and SloWNet w.r.t. corresponding gold standard (GS) wordnets (French EWN and ManSloWNet). Results on the translation approach for Slovene are not shown, because they are not statistically significant (not enough data).

The last category contains real errors in the generated wordnets. Many of them (around 30% in Slovene data) are related to insufficient sense disambiguation at the stage of comparing wordnets in other languages. For example, the word examination can mean a medical check-up. In this case, the correct Slovene translation is preiskava. But when the same English word is used for a school exam, it should be translated as preverjanje znanja, not as preiskava. However, the latter was aligned twice with the English word examination and with the Czech word zkouška, whose meanings include school exam. This leads to a non-empty intersection of synset ids in the Sl-Cz-En source, which assigns the school exam synset to preiskava. Many errors are also the consequence of wrong word alignment of the corpus. This happened a lot in cases where the order of constituents in noun phrases in one language is substantially different from the order in another language. For example, the English compound member state(head) is always translated in the opposite order as država(head) članica in Slovene and état(head) membre in French, and is thus likely to be misaligned.

Source             Present in GS   Absent in GS but correct   Source prec.   Closely related   Wrong
WOLF
Fr-Cz-En              61.7%              13.8%                   75.5%           10.9%          13.6%
Fr-Cz-Bg-En           67.8%              12.4%                   80.1%            9.2%          10.7%
Fr-Ro-En              64.7%              15.4%                   80.1%            8.1%          11.8%
Fr-Ro-Cz-En           70.6%              13.3%                   84.0%            8.4%           7.6%
Fr-Ro-Cz-Bg-En        75.5%              13.2%                   88.7%            6.8%           4.5%
Wikipedia             94.0%               4.1%                   98.1%            0.8%           1.1%
Fr Wiktionary         69.8%              12.2%                   82.0%           10.7%           7.2%
En Wiktionary         88.5%               6.5%                   95.0%            4.0%           1.1%
Wikispecies           90.5%               -                      90.5%            -              9.1%
EUROVOC descr.        67.6%               8.1%                   75.7%           16.2%           8.1%
SloWNet
Sl-Cz-En              53.4%               7.6%                   61.0%            4.7%          34.3%
Sl-Cz-Bg-En           59.3%               6.8%                   66.1%            4.2%          29.7%
Sl-Ro-Cz-En           57.7%               7.5%                   65.2%            3.8%          31.0%
Sl-Ro-Cz-Bg-En        61.0%               7.3%                   68.4%            4.0%          27.6%

Table 3. Manual evaluation of WOLF and SloWNet and precision of BCS synsets according to the source used for generation. Figures in italics are to be considered carefully, given the low number of (literal, synset id) pairs.
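Judging from the figures in Table 3, the source precision column appears to be the sum of the share of pairs present in the gold standard and the share absent from it but judged correct; the paper does not state this formula explicitly, so it is treated here as an assumption. A short Python sketch checks it on three rows copied from the table (function and variable names are illustrative):

```python
# (present in GS, absent in GS but correct, closely related, wrong), in percent,
# copied from three rows of Table 3.
ROWS = {
    "Fr-Cz-En": (61.7, 13.8, 10.9, 13.6),
    "Wikipedia": (94.0, 4.1, 0.8, 1.1),
    "Sl-Cz-En": (53.4, 7.6, 4.7, 34.3),
}


def source_precision(present_in_gs: float, absent_but_correct: float) -> float:
    """Assumed definition: confirmed pairs plus pairs judged correct by hand."""
    return round(present_in_gs + absent_but_correct, 1)


precisions = {
    source: source_precision(row[0], row[1]) for source, row in ROWS.items()
}
# The four categories should cover all pairs, i.e. sum to about 100%.
row_sums = {source: round(sum(row), 1) for source, row in ROWS.items()}

print(precisions)
print(row_sums)
```

For some other rows of the table the sum differs from the printed precision by 0.1, which is consistent with the table entries having been rounded independently.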

The third source of errors is lemmatization problems, which are much more common in Slovene than in French because the Slovene tagger was trained on a smaller corpus. If a strange lemma is guessed by the lemmatization algorithm for an unknown word form, it will most likely be filtered out by the following stages in our synset generation procedure. However, if a word is assigned a wrong but legitimate lemma, it will be treated as a possible synonym for a certain concept by our algorithm and therefore appear in the wrong synset. For example, if the word form vode (singular genitive form of the lemma water) is wrongly lemmatized as vod (Eng. platoon), it will be placed in all the water synsets, which is a serious error that reduces the usability of the resource. In French, some expressions with plural canonical forms, such as affaires ((one's) stuff), got lemmatized into singular (affaire, Eng. affair, deal, case), which is inappropriate for that synset.

5 Conclusion

This paper has presented the two new lexico-semantic resources (wordnets) that were created automatically and are freely available for reuse and extension. The results obtained show that the approach taken is promising and should be exploited further as it yields a network of wide-coverage and quite reliable synsets that can be used in many HLT applications. Some issues are still outstanding, however, such as the gaps in the hierarchy and word sense errors.

[...] sources in a real life setting and is being carried out. Both wordnets could be further extended by mapping polysemous Wikipedia entries to PWN with a WSD approach similar to [15]. Next, lexico-syntactic patterns could be used to extract semantically related words from either the corpus [16] or Wikipedia [17]. Moreover, Wiktionaries start handling polysemy to some extent, including by differentiating translations according to senses defined by short glosses.

Referen es

1. Vossen, P. (ed.): EuroWordNet: a multilingual database with lexical semantic networks for European Languages. Kluwer, Dordrecht (1999)

2. Tufiş, D.: Balkanet – design and development of a multilingual balkan wordnet. Romanian Journal of Information Science and Technology 7(1–2) (2000)

3. Farreres, X., Rigau, G., Rodríguez, H.: Using WordNet for building WordNets. In: Proceedings of COLING-ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, Canada (1998)

4. Barbu, E., Mititelu, V.B.: Automatic building of Wordnets. In: Proceedings of RANLP'05, Borovets, Bulgaria (2006)

5. Jacquin, C., Desmontils, E., Monceaux, L.: French EuroWordNet Lexical Database Improvements. In: Proc. of CICLing'07 (LNCS 4394). (2007)

6. Pianta, E., Bentivogli, L., Girardi, C.: Multiwordnet: developing an aligned multilingual database. In: Proc. of the 1st Global WordNet Conf., Mysore, India (2002)

7. Resnik, P., Yarowsky, D.: A perspective on word sense disambiguation methods and their evaluation. In: ACL SIGLEX Workshop Tagging Text with Lexical Semantics: Why, What, and How?, Washington, D.C., United States (1997)

8. Ide, N., Erjavec, T., Tufiş, D.: Sense discrimination with parallel corpora. In: Proc. of ACL'02 Workshop on Word Sense Disambiguation. (2002)

9. Diab, M.: The feasibility of bootstrapping an arabic wordnet leveraging parallel corpora and an english wordnet. In: Proc. of the Arabic Language Technologies and Resources. (2004)

10. Fišer, D.: Leveraging parallel corpora and existing wordnets for automatic construction of the Slovene Wordnet. In: Proc. of L&TC'07, Poznań, Poland (2007)

11. Fišer, D., Sagot, B.: Building a free French wordnet from multilingual resources. In: Proc. of Ontolex'08. (2008) (to appear).

12. Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufiş, D., Varga, D.: The JRC Acquis: A multilingual aligned parallel corpus with 20+ languages. In: Proc. of LREC'06. (2006)

13. Tiedemann, J.: Combining clues for word alignment. In: Proc. of EACL'03, Budapest, Hungary (2003)

14. Erjavec, T., Fišer, D.: Building Slovene WordNet. In: Proc. of LREC'06, Genoa, Italy (2006)

15. Ruiz-Casado, M., Alfonseca, E., Castells, P.: Automatic assignment of wikipedia encyclopedic entries to wordnet synsets. In: Proc. of Advances in Web Intelligence (LNAI 3528). (2005)

16. Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: Proc. of COLING 1992, Nantes, France (1992)

17. Ruiz-Casado, M., Alfonseca, E., Castells, P.: Automatic extraction of semantic relationships for wordnet by means of pattern learning from wikipedia. In: Proc. of NLDB 2005 (LNCS 3513), Alicante, Spain (2005)
