• Aucun résultat trouvé

The Answer Validation System ProdicosAV

N/A
N/A
Protected

Academic year: 2022

Partager "The Answer Validation System ProdicosAV"

Copied!
9
0
0

Texte intégral

(1)

ChristineJaquin,LauraMoneaux,EmmanuelDesmontils

UniversitédeNantes,LaboratoireLINA,2ruedelaHoussinière,BP92208,44322Nantesedex03,FRANCE

hristine.jaquin,laura.moneaux,emmanuel.desmontilsuniv-nantes.fr

Abstrat

In this paper, we present the ProdiosAV answer validation system whih was de-

veloped by the TALN team from the LINA institute. The system is omposed of a

questionanalysismodule,arankingpassagemodule,ananswerextrationmoduleand

ananswervalidationmodule.

Keywords

Questionanswering,TemporalValidation,AnswerValidation

1 Introdution

Inthis paper, wepresent theProdiosAV systemwhih wasdevelopedbytheTALNteam from

the LINA institute and whih partiipated to the AnswerValidation Exerie for Frenh. This

system is based on the PRODICOS System whih partiipated two years ago to the Question

AnsweringCLEF evaluation ampaign for Frenh. TheProdiosAV systemis omposed offour

modules whose someofthem ome from thePRODICOS system. We presenttheadaptationof

these four modules for the AVE 2008 ampaign and the ProdiosAV results for the AVE 2008

ampaign.

2 Overview of the system arhiteture

TheProdiosAVsystemisdividedintofour parts:

questionanalysismodule;

passagerankingmodule (rankspassagesaordingtotheirabilityto ontaintheanswer);

answerextration module (extrats the andidate answersfrom passages and ranksthem

aordingtotheresultsprovidedbythepreviousmodule).

validationmodule(omparestheanswersgivenbythepreviousmodulewiththoseproposed

bythetestdataandtakesthedeisioniftheanswerisseletedorvalidated orrejeted)

Wepresent,inthenextsetions,thevariousmoduleswhihbelongtotheProdiosAVsystem.

3 Question analysis module

Thequestionanalysis module aims to extratrelevantfeatures from questions that will makeit

possibleto guidethepassagerankingandtheanswersearh. Weextratmanyfeaturesfrom the

questions[7℄:

questiontype

(2)

questionfous

answertype

strategy

Therstandmainfeaturewhihomesfromthequestionanalysisisthequestiontype. Twenty

questiontypesaredened whih orrespond toasimplied syntatiformof thequestion 1

(for

examplethe type QuiVerbeGN).The questiontypewill notonly help to determine the strategy

to perform an answer searh but also it will make it possible to selet rules to extrat other

important features from questions (answer type, question fous). The answer type may be a

namedentity(Person,Loation-State,Loation-City,Organization...),oranumerialentity(Date,

Length,Weight,Finanial-Amount...). Fordetermininganswertype,semantiknowledgeoming

from EuroWordnet Thesaurus[1℄ is used. Listsof wordsare build whih are hyponymsof some

predenedwordswhihareonsideredlikeategoriesandareusedinordertogeneratetheanswer

type[4℄. Thequestionfousorrespondstoawordoragroupofwordinvolvedintothequestion.

Itisgenerallyloatedlosetotheanswerwithinthepassageswhihmayontaintheanswer. The

strategy is ariteriause to searh therightanswer. It is determined aordingto the question

fousandthequestiontype. Thestrategiesavailableareeitherannamedentitystrategy,eithera

numerialentitystrategy,eitheranaronymdenitionstrategyorapattern-basedstrategy. Other

featuresareextratedfromthequestionsinordertoimprovethepassagerankingproess(named

entities,nounphrasesanddates). Forexample,forthequeryQuandAbagelarddeParisest-ilné

? ,"AbagelarddeParis"isonsideredasasingleentity.

For example, if the question is Quand Abagelard de Paris est-il né ?, the analysis of this

questionis:

1. Questiontype: QUAND

2. Answertype: DATE

3. QuestionFous: Abagelardde Paris

4. Strategy: NumerialEntity

5. NamedEntity: "AbagelarddeParis"

4 Passage ranking module

Therole ofthis module isto rankthepassageaording totheirabilityto ontaintheanswerto

thequestion. The strategyused relieson adensity measure. [6℄ madeaquantitativeevaluation

of passageretrieval algorithms for question answering and they showed that systems based on

density measure perform better thanthe ones based onother tehniques. The densitymeasure

approah lies ona soring funtion based onhow lose keywords appear to eah other. In our

evaluation,weonlyuseadensity measureto rankthegivenpassageaordingto seletedwords

fromthequerybutnottoextratpassagefromlargedoument.

Foreahquestion akindof request isbuilt aordingto thedata generated by thequestion

analysis step. The request is omposed of a ombination of elements suh as question fous,

named entities, prinipal verbs, ommon nouns, adjetives, dates and other numerial entities.

These elements are also weighted aording to their importane for determining the potential

answer. Theweightof eah elementdepends on thequestiontypes. Forexample foraquestion

typeequals todate,theoeientassoiateswithadateelementisgreaterthantheonelinked

withaprinipal verbelementoeient. Thedensitymeasure usedisbasedontheonefrom [5℄

butsomeadjustmentweremade.

Forallquerypassage,letm bethenumberofquerytermsbelongingtothepassageandletk

bethenumberofwordsbelongingtothepassage.wgt(

qw i

)istheweightofquerywordi,wgt(

dw i

)

1

(3)

isthedistanebeetweendoumentwordj andthequestionfousquestionFous.

score passage = score 1 + score 2

(1)

score 1 = sum m i=1 wgt ( qw i )

(2)

score 2 = P k − 1

j=1

wgt(dw j )+wgt(questionF ocus) α ∗ dist(j,questionF ocus) 2

k − 1 ∗ m

(3)

Themainadjustmentsmadein omparaisonwith[5℄istheintrodutionofthequestionfous

inthealulusandtheonsiderationofthewholepassageinsteadofonlyseletedsenteneslinked

byanaphora. Infat, [5℄do notonsider the questionfous asanimportant elementbut asan

ordinaryelement. Inourappliation,thedensityisomputedbymainly takingintoaountthe

distane betweenthequestionfousandtheotherquerytermsin thepassage. Itmustbenoted

thatbyexperimentthealphavalueisassignedto0.5. Anotherdiereneisthat[5℄omputethe

score passage

oeientforallsentenesandtheyonlygathertwosentenesintoasamepassageif

forexamplethe seondsenteneontainsananaphoraof anoun belonging tothe rstsentene.

The aim of their system is to provide passage whih probably ontains the answer. But, our

appliation is of dierent nature. The density measure is used in order to rank given passage

aordingtotheirabilitytoontainananswertoaquestion. So,wedonotworkatsentenelevel

butat passagelevel.

5 Answer extration module

Afterthequestionanalysisandthepassageranking,wehavetoextratanswersorrespondingto

questions. Tothis end,weuseontheonehand elementsomingfrom thequestionanalysislike,

forinstane,thequestion'sategory,thestrategytouseit,thenumberofanswers,andsoonand,

ontheotherhand,alistofpassagesevaluatedbyourpreviousstepaordingtothisquestion[4℄.

Thismoduleisdividedintofoursteps:

1. aordingtothequestion'sstrategy,theonveniententityextrationmodule isseleted,

2. andidateanswersaredetetedandseletedbythepreviousseletedmodule,

3. answersareevaluated andtheanswer(s)withthehighesttrustoeientis(are)kept,

4. passageswhere eahanswerhasbeenfoundarealsoassoiatedtotheseletedanswer.

Thequestion'sanalysisangive4groupsofategorieswhihorrespondto4possiblestrate-

gies: numerialentities extration,namedentitiesextration,aronymdenitionsextrationand

pattern-basedextration(thedefault one).

5.1 Numerial entities extration

Forloating numerialentities,weuseaset of dediatedregular expressions. These expressions

makeit possibletothe systemtoextrat numerialinformationnamely: dates, duration,times,

periods,ages,nanialamounts,lengths,weights,numbersandratios. ItusestheMUC(Message

UnderstandingConferene)ategories("TIMEX"and"NUMEX")toannotatetexts.

Inoursystem, wenotedthat referenestodate andtime areslightlyexploitedand theom-

parison of dates is often ompliated. We thus tried to improvethe reognition of dates, their

standardizationandtheirexploitation. Therststageistoloatethereferenestodateandtime.

Numerialvalues, integers,real, and literalones are annotated. Textual elements(days,month,

(4)

programmes de télévision et reçoit duCSA l' autorisation d' émettre sur le satellite TDF 1 en

avril1989. Afterthelabellingphase,thistextbeomes:

<duree type="date">

<mod-pre type="eq">En</mod-pre> <mois type="car">mars</mois>

<annee type="num">1989</annee>

</duree> , La <mois type="car">Sept</mois> devient la Société européenne de programmes de télévision et reçoit du CSA l’ autorisation d’ émettre sur le satellite TDF 1

<duree type="date">

<mod-pre type="eq">en</mod-pre> <mois type="car">avril</mois>

<annee type="num">1989</annee>

</duree> . Elle commence à diffuser ses programmes

<date type="lin">

<mod-pre type="eq">le</mod-pre> <no-jour type="num">30</no-jour> <mois type="car">mai</mois>

<annee type="num">1989</annee>

</date> ;

Onthis result, wean make asome remarks. First of all, are not regardedas "dates" only

thereferenesinludingaday,amonthandayear. Intheontraryase,thisrefereneislabelled

"duration" (durée in frenh). Moreover, we also labelled the artile preeding this referene.

This artile givesinformation onerning the"diretion of time" ompared to the objet of the

sentene. Oursystemoftemporallabellingstillhassomeimperfetions. Inthisexample,itlabels

sept asbeingSeptemberwhereasit orresponds ratherto the integer 7 (the nameof the

televisionhannel; sept meansseven in frenh). This isprodued bytwodierentproesses.

First of all, wehave asystem allowingto reognize the numerialvalues in form literal. Then,

themonthswhosenameislongareoftenshortened. Also,weparameterizedthesystemsothatit

reognizes September butalso sept. or sept. Consequently sept indiates aninteger

butalsoSeptember.

Tofailitate theomparison ofdate,wehose toalulate theelementsofdate(andhour)in

theISO8601form. Forthat,wehavealulated thenumerialvaluesoftheyears,themonths

and the days. Then, we builtthe ISO form of the date. We made the samething with hours.

With thepreeding example,weobtainthen:

<duree type="date" iso8601="1989-03">

<mod-pre type="eq">En</mod-pre> <mois type="car" val="03">mars</mois>

<annee type="num" val="1989">1989</annee>

</duree> , La <mois type="car" val="07">Sept</mois> devient la Société européenne de programmes de télévision et reçoit du CSA l’ autorisation d’ émettre sur le satellite TDF 1

<duree type="date" iso8601="1989-04">

<mod-pre type="eq">en</mod-pre> <mois type="car" val="04">avril</mois>

<annee type="num" val="1989">1989</annee>

</duree> . Elle commence à diffuser ses programmes

<date type="lin" iso8601="1989-05-30">

<mod-pre type="eq">le</mod-pre> <no-jour type="num" val="30">30</no-jour>

<mois type="car" val="05">mai</mois>

<annee type="num" val="1989">1989</annee>

</date> ;

Togoalittlefurther,wealsosoughttoombinedateandhour. Forexample,thedate lundi

17janvier199413h31 ( MondayJanuary17, 199413h31)willbeannotatedinthefollowing

way:

<date-time type="lin" iso8601="1994-01-17T13:31">

<date type="lin" iso8601="1994-01-17">

lundi <no-jour type="num" val="17">17</no-jour> <mois type="car" val="01">janvier</mois>

<annee type="num" val="1994">1994</annee>

</date>

<time type="lin" iso8601="T13:31">

<mod-pre type="eq">à</mod-pre> <heure type="num" val="13">13</heure> h

(5)

<minute type="num" val="31">31</minute>

</time>

</date-time>

Thissystemoflabellingfuntionsbutisto beimproved. Theobjetiveisbettertoloatethe

temporal elementsinthequestionsaswellasin theseletedpassages. Unfortunately,onerning

thisAVE2008evaluation,fewquestionssuggestedwerebasedonthetemporalaspet. Moreover,

theproessingwasundoubtedlyinomplete. It thus didnotproduesigniantresults.

5.2 Named entities extration

For loatingnamed entities,NEMESIS tool[3℄is used. It wasdeveloped byourresearh team.

NemesisisaFrenhpropernamereognizerforlarge-saleinformationextration,whosespeia-

tionshavebeenelaboratedthroughorpusinvestigationbothintermsofreferentialategoriesand

graphialstrutures. Thegraphialriteriaareusedto identifypropernamesandthereferential

lassiationtoategorizethem. Thesystemisalassialone:itisrule-basedandusesspeialized

lexions without any linguisti preproessing. Its originalityonsists on amodular arhiteture

whihinludesalearningproess.

5.3 Aronym denition extration

Foraronym'sdenitionsearh,weuseatooldevelopedbyE.Morin[8℄basedonregularexpres-

sions. It detetsaronymsandlinksthemto theirdenition (ifitexists).

5.4 Pattern-based answer extration

Forthepattern-basedanswerextrationproess,wedevelopedourowntool.

Aordingtoquestionategories,syntatipatternsweredenedinordertoextratanswer(s).

Thesepatternsarebasedonthequestionfousandmakesitpossibletothesystemtoextratthe

answer. Patterns aresortedaordingto theirpriority,i.e. answersextrated byapattern with

an higher priority are onsidered as better answers than theones extrated by patterns with a

lowerpriority.

Asa result,for agivenquestion, patterns assoiated with thequestionategoryare applied

toallpassages. Thus,weobtainaset ofandidateanswersforthisquestion. Patterns(syntati

patterns) are basedon the noun phrasethat ontainsthe fous of the question. Therefore, the

rststeponsistsinonlyseletingpassageswhihouldontaintheanswerandwhihontainthe

fousofthequestion.

5.5 Answer seletion

Whentheanswertype wasdetermined bythequestionanalysis step,the proessextrats, from

thelistofpassagesprovidedbythepreviousstep,theandidateanswers. Namedentities,aronym

denitionsornumerialentitieslosesttothequestionfous(ifthislastisdeteted)aresupported.

Indeed,insuhases,theanswerisoftensituatedlosetothequestionfous.

Theanswerseletionproessdependsonthequestionategory. Fornumerialentities,named

entities and aronym denitions, the rightansweris the one withthe best frequeny. This fre-

quenyis weightedaordingto severalheuristis suh as: the distane (in words) betweenthis

answerand thequestionfous,the presenein thesenteneof named entities ordates from the

question, et. For answersextrated by thepattern-based seletion, twostrategies areused a-

ordingtothequestionategory:

theseletionoftherstseletedanswerobtainedbytherstappliablepattern,

theseletionofthemostfrequentanswer(the andidateanswerfrequeny).

(6)

oneobtainedbytherstappliablepattern(patternssortedaordingto theironveniene)and

intotherstpassage(sortedbythepassageseletionstepaordingtotheironveniene).

Nevertheless,for denitional questions suh asthequestionQui est BorisBeker? ("Who

isBorisBeker?")ortherstquestionQu'estequ'Atlantis? ("What isAtlantis?"),wenoted

thatthebetterstrategyistheandidatephrasefrequeny.

Indeed, for this question ategorywhere the number of question'sterms is low, thepassage

seletionstepdoesnotmakeitpossibletothesystemtoseletwithpreisionpassagesontaining

theanswer. Therefore,thefrequeny-basedstrategygenerallyseletstherightanswer.

6 Validation module

Thevalidationmodule isdividedintotwosteps:

atemporalvalidation

aanswervalidation : omparisonbetweenAVEanswerand PRODICOSanswer

6.1 Temporal validation

Therst step of validation module aims to ompare thetemporal elements of thequestionand

the temporal elementsof the question'spassages. Thetemporal elements are alulatedby the

numerialentitiesextration module,presentedin setion5.1.

Foreahpassage,atemporaloeientisalulated(equal to-2,-1,0,1,2):

Ifthere isnotemporalelementinthequestion: sore=0(nothinganbesaid)

Ifthere aretemporalelementsin thequestion:

Ifthereisnotemporalelementinthepassage: sore=0

Iftherearetemporalelementsbuthighlyontraditory(notsameyearin thequestion

andin thepassage): sore=-2

Iftherearetemporalelementsbut ontraditory(yearnotspeiedin thequestionor

thepassage): sore=-1

Ifthere aretemporal elementsbutnotompleted (sameyear,butdaysormonthsnot

speiedinthequestionorthepassage): sore=1

Ifthereareexatlysametemporalselementsinthequestionandthepassage: sore=

2

Fortherst time, our goalis to usethe temporal oeientto hoose thebest passages for

thequestion: thepassageswhihhaveatemporaloeientequalto1or2areseleted.

IntheAVE 2008task,ourtemporalvalidationmoduleobtainsthefollowingresults:

temporaloeient=2: 9passages

temporaloeient=1: 7passages

temporaloeient=-2: 1passages

temporaloeient=0: 182passages

There are only 17 questions where there are temporal elements in the question and in the

passages for this question. So, thetemporal validation in the AVE 2008 ampaign doesn't give

enoughinformationsto hoosethebest passagestondtheanswer.

(7)

Answertype Numberofanswer

Validated 53

Unknown 20

Rejeted 126

Total 199

6.2 Answer validation

Foreahquestion,ourProdiossystemreturnsananswer. Theanswervalidationaimstoompare,

foreah question,theProdiosanswerwiththeanswerofeah passage.

Theanswervalidationisdividedinseveralsteps:

IftheProdiosansweristhesameanswerthatthepassage'sanwswer: thepassage'sanswer

isvalidated

If theProdiosanswerinludedin thepassage'sanswer: thepassage'sansweris validated

but theondeneoeientisdereased

TheProdiosanswerisnotompletlyinludedinpassage'sanswer(dependsonthenumber

of present words of this answer) : the passage's answer is validated but the ondene

oeientismorethandereased

Otherwisepassage'sanswerisnotvalidated

Ifthereisonlyonepassage'sanswer,thispassage'sansweristheSELECTEDanswer;otherwise

theProdiosAV systemhoose therst answerwhih havethebest ondene oeientasthe

SELECTEDanswer.

7 Results Analysis

Frenh Answer Validation Exerise onsists of 108 questions whih an eah get one or more

andidate answers. There are 199 andidate answers. In this ampaign, systemsare evaluated

in twoways. On theonehand,in orderto evaluate systems,preisionand reallare alulated

aording to all question answers. On the other hand, the seond group of measures aims at

omparingQuestionAnsweringsystemsperformanewiththepotentialgainthatthepartiipant

Answer Validation systems ould add to them. So, the eieny is measured aordingto the

answerthatthesystemprovidesto anuser'squestion.

7.1 System analysis at answers level

All answers(199) given by the system areanalysed aordingto human judgment. It is worth

notingthatthe"unknown"valuegivenbyahumanexperttoananswerisnottakenintoaount

in theevaluation. The humanevaluation results aregivenin table 1. A rst evaluation of the

resultsobtained byProdiosAV are given in table 2. 43 answerswere validated byProdiosAV

andamongthem24arevalidatedtoobyhumanjudges. 136answerswererejetedbyoursystem

and among them 109 are rejeted tooby human experts. Our system obtainsa preision rate

equalto 0.56andareallrateequalto0.46.

Wemade an other evaluation onerning the type of questionand resultsobtained (table 3

andtable4).

Fordenitional question, the test set ontains 46 answers for 29 questions. For 16 answers

(8)

Validated 43 24

Rejeted 136 109

Table2: ProdiosAVevaluation

Validated Validated Validated by

byProdiosAV by human ProdiosAV andhuman

Denition 16 22 11

Numerialentity 7 8 3

Namedentity 22 16 8

Otherqueries 6 7 2

Table3: ProdiosAVevaluation

Rejeted Rejeted

by ProdiosAV by human

Denition 30 15

Numerialentity 21 20

Namedentity 47 47

Otherqueries 50 44

Table4: ProdiosAVevaluation

thesystempreisionishighbutitsreallisworse. Indeed,only16answerswerevalidatedbyour

system while 23 of them should havebeen. The problems ome mainly from questionanalysis

problem(forexamplequestion188: "Vasa"taggedasverb),frompatternextrationproblems(for

examplequestion159: "JaneAusten(16déembre1775,Steventon,Hampshire-18juillet1817,

Winhester)estuneérivain")orfromaronymextrationproblems(themeaningofthearonym

istranslatedinthepassagewhatimpliesthataronym'slettersareompletelyindependantofit).

Fornumerialentityquestion,thetestsetontains28answersfor16questionsandfornamed

entityquestion,thetest setontains69answersfor 36questions. Only11answersof29answers

validatedbyProdiosAVorrespondtothehumanjudgment. Andthesystemreallisalsoworse,

indeedonly11answerswerevalidatedbyoursystemwhile24ofthemshouldhavebeen. Forother

questions,thetest setontains56answersfor27questions. Thesystempreisionandrappelare

equivalent: only2answersof6answersvalidatedbyProdiosAV andof7answersvalidated by

thehumanjudgment. Theproblems omemainly from theseletion oftheandidate passages:

thequestionfousdoesn't detetin alot ofpassages orfrom questionanalysis problemorfrom

referene'sabsene.

7.2 System analysis at questions level

The seond group of measures aims at omparing QA systems performane with the potential

gain that the partiipant Answer Validation systems ould add to them. The test set inludes

108 questions. Human experts nd aresponse for 52 of them. Among them, our system gives

23responseswhiharewellvalidated. Humanexpertsgivesanegativeresponsefor56questions,

amongthem,oursystemgives38negativeresponses. Theqa-auray rateobtainedis22%. The

maximum that the system should haveobtained is 48% (aording to the number of validated

response thehumansfound). Theqa-rej-auray rateobtainedis35%. Themaximumthat the

system should haveobtained is 52%(aording to the numberof rejeted response the humans

found). Our systemobtainsanotsatisfatoryestimated-qa-performane rateequalto 29%. The

(9)

onehasbeenfound.

8 Conlusion and Prospets

Theresultsarenotsatisfatory,beauseweonlyreover24orretanswerson53orretanswers.

The problems ome mainly from question analysis problem (speialy the word's labeling), from

pattern extration problem (speialy the absene of semantis and oreferene). We an also

improveour validation module by taking into aountall the answersproposed by the various

strategiesofextrationmoduleandnotonlythebest. Weanalsotakeintoaounttheanother

externalinformationsasthepassage'sdate,et.

Referenes

[1℄ VossenP.: EuroWordNet: AMultilingualDatabasewithLexialSemanti,editorNetworks

PiekVossen,universityofAmsterdam, 1998.

[2℄ MoneauxL.: Adaptationduniveaud'analysedesinterventionsdansundialogue-appliation

àunsystèmedequestion-réponse,Theseeninformatique,ParisSud,ORSAY,LIMSI(2003)

[3℄ Fourour,N.: Identiationetatégorisationautomatiquesdesentitésnomméesdanslestextes

français,Theseeninformatique,Nantes,LINA(2004)

[4℄ DesmontilsE., JaquinC.,MoneauxL.: QuestionTypesSpeiation fortheUseof Spe-

ialized Patternsin ProdiosSystem,in CPeters,F.CGey,JGonzalo,G.JJones,MKluk,

B Magnini, H Müllern et M De Ruke eds. Proeedings of Aessing Multilingual Reposito-

ries, 7th workshop of the Cross Language Evaluation Forum, CLEF 2006, Revised seleted

papers,volume4730deLeturesNotesofComputerSienes(LNCS), Springer-VerlagBerlin

Heidelberg,pp.280-289,2007.

[5℄ LeeG.G.,SeoJ.,LeeS.,JungH., ChoB.,LeeC.,KwakB.,ChaJ.,KimD.,AnJ.,KimH.,

Kim K.: "SiteQ: EngineeringHigh Performane QA SystemUsing Lexio-SemantiPattern

MathingandShallowNLP". Proeedingsoftenth Text REtrievalConferene(TREC 2001),

2001.

[6℄ S. TellexS., KatzB., LinJ., Fernandes A., MartonG.: "Quantitativeevaluation ofpassage

retrievalalgorithmsforquestionanswering",InSIGIRonfereneonResearhanddevelopment

in informationretrieval,pages4147.ACM Press,2003.

[7℄ L. Moneaux, C. Jaquin, E. Desmontils : The query answering systemProdios, in C Pe-

ters, F.C Gey, J Gonzalo, G.J Jones, M Kluk, B Magnini, H Müllern et M De Rukeeds.

ProeedingsofAessingMultilingualRepositories,6thworkshopoftheCrossLanguageEval-

uationForum,CLEF2005,Revisedseletedpapers,Vienna,Austria,september2005,volume

4022deLeturesNotesofComputerSienes(LNCS),Springer-VerlagBerlinHeidelberg,pp.

527?534,2006

[8℄ E. Morin, "Extration de liens sémantiques entre termes ápartir de orpus de textes teh-

niques", PhD Thesis, Université de Nantes, LINA, De'eembre 1999. http://www.sienes.univ-

nantes.fr/info/perso/permanents/morin/artile/morin-these99.pdf

Références

Documents relatifs

In the framework of this thesis, it was required to develop and evaluate methods for extracting named entities in texts on the topic of information security using artificial

Our system learns a CRF model from the training data, based on a rich feature set that combines external knowledge sources with information gathered from the EMR text.. We discuss

The propose system has obtained better results than a baseline system that always accepts all answers (return YES in all pairs), what means that the entailment between entities can

We report the evaluation results for the extraction of the linked named entities on two sections of two yearbooks: the CAPITAL and CONSTITUTION sections of the French Desfoss´ e

These are LSTM recurrent neural networks, the CRF classifier and language models based on Word Embeddings and Flair Embeddings.. 3.1

The semantic similarity of words is exploited by these methods, providing vectors close in space when their related words are close in meaning.. These vectors are usually calculated

It proves that the generated entity embedding contains valid semantic information, which can be used to compare the similarity with the context of mentions, so as to complete the

We present a supervised Clinical named Entity Recognition system that can detect the named entities from a French medical data, using an extensive list of features, with an F1 Score