FIDJI in ResPubliQA 2009


Xavier Tannier
CNRS-LIMSI
University Paris-Sud 11
xtannier@limsi.fr

Véronique Moriceau
CNRS-LIMSI
University Paris-Sud 11
moriceau@limsi.fr

Abstract

This paper presents FIDJI results in ResPubliQA 2009. FIDJI (Finding In Documents Justifications and Inferences) is an open-domain question-answering system for French. The main goal is to validate answers by checking that all the information given in the question is retrieved in the supporting texts.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages, Query Languages

General Terms

Measurement, Performance, Experimentation

Keywords

Question answering, Questions beyond factoids

1 Introduction

This paper presents FIDJI's results in ResPubliQA 2009 for French. In this task, systems receive 500 independent questions in natural language as input, and must return one paragraph containing the answer from the document collection. Neither an exact answer nor multiple responses are required. The document collection is JRC-Acquis, a corpus of EU documentation.

2 FIDJI

FIDJI¹ (Finding In Documents Justifications and Inferences) is an open-domain question-answering system for French. The main goal is to validate answers by checking that all the information given in the question is retrieved in the supporting texts. Our answer validation approach assumes that the different entities of the question can be retrieved, properly connected, either in a sentence, in a passage or in multiple documents. We designed the system so that no particular linguistic-oriented pre-processing is needed.

The document collection is indexed by the search engine Lucene [2]. First, the system submits the keywords of the question to Lucene: the first 100 documents are then processed (syntactic analysis and named entity tagging). Among these documents, FIDJI looks for sentences containing the most syntactic relations of the question. Finally, answers are extracted from these sentences and the answer type, when specified in the question, is validated. Figure 1 presents the architecture of FIDJI, and more details can be found in [4, 3]. The next sections summarize the way FIDJI extracts answers and focus on ResPubliQA specificities.

¹ This work has been partially financed by OSEO under the Quaero program.

2.1 Syntactic analysis

FIDJI has to detect syntactic implications between questions and passages containing the answers. Our system relies on syntactic analysis provided by XIP, which is used to parse both the questions and the documents from which answers are extracted.

XIP [1] is a robust parser for French and English which provides dependency relations and named entity recognition. The dependency relations provided by XIP which are used by FIDJI are mainly: SUBJ (subject), OBJ (object), PREPOBJ (prepositional group), NMOD (noun modifier), VMOD (verb modifier), COORDITEMS (coordinated elements), CONNECT (connector introducing clause).

The named entities (NE) are tagged using a set of 8 types: person, organization, location, date (defined by XIP), as well as nationality, number, duration, age (that we added). XIP's lieu (location) can be made more specific (country, region, continent...). We also added features to allow for more precise types. For example, for number, we added the following features: length, speed, weight, money, physics, so that "0.55 euro" in "a French stamp costs 0.55 euro" can be tagged as a NE and extracted as an answer to "What is the price of a French stamp?". Other

Question analysis consists in identifying:

- The syntactic dependencies given by XIP;
- The keywords submitted to Lucene (words tagged as noun, verb, adjective or adverb by XIP);
- The question type:
    - Factoid (concerning a fact, typically who, when, where questions),
    - Definition (What is...),
    - Boolean (expecting a yes/no answer),
    - List (expecting an answer composed of a list of items),
    - Complex questions (why and how questions).
- The expected type(s): NE type and/or (specific) answer type.
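
As an illustration of the keyword identification step, the sketch below keeps only content words from a tagged question. It is a minimal sketch and not FIDJI's implementation: the tag names (NOUN, VERB, ADJ, ADV), the function select_keywords and the hand-tagged tokens are assumptions made for this example, and XIP's actual tagset may differ.

```python
# Hypothetical part-of-speech tags for content words (XIP's real tag names may differ).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def select_keywords(tagged_tokens):
    """Return the lemmas tagged as noun, verb, adjective or adverb, without duplicates."""
    keywords = []
    for lemma, tag in tagged_tokens:
        if tag in CONTENT_TAGS and lemma not in keywords:
            keywords.append(lemma)
    return keywords


# Hand-tagged lemmas loosely based on question 0021 (illustrative only).
tagged_question = [
    ("encourager", "VERB"),
    ("le", "DET"),
    ("production", "NOUN"),
    ("de", "PREP"),
    ("graine", "NOUN"),
    ("de", "PREP"),
    ("ver", "NOUN"),
    ("à", "PREP"),
    ("soie", "NOUN"),
]

keywords = select_keywords(tagged_question)
print(keywords)
```

The resulting list is what would be sent to the search engine as a query; how the terms are combined in the query is not specified in the text.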

The answer to be extracted is represented by a variable (ANSWER) introduced in the dependency relations. The slot noted 'ANSWER' is expected to be instantiated by a word, argument of some dependencies of the parsed sentences. This word represents the answer to the question (see Section 2.2). The question type is mainly determined on the basis of the dependency relations given by the parser. For example:

0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du 2 avril 1990 ?
(Between which countries is the Framework Agreement for trade and economic cooperation of 2 April 1990?)

Syntactic dependencies and NE tagging:
ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)  VMOD(conclure, ANSWER)
PREPOBJ(ANSWER, entre)                  ATTRIBUT(conclure, accord-cadre)
DATE(2 avril 1990)                      LIEU[PAYS](ANSWER)

Question type: list
Expected type: location (state)

0021 - Comment encourage-t-on la production de graines de vers à soie ?
(How is interest in producing silkworm eggs increased?)

Syntactic dependencies and NE tagging:
ATTRIBUT_DE(graine, vers)               ATTRIBUT_DE(production, graine)
DEEPOBJ(encourager, production)         NMOD(vers, soie)
TOPIC(encourager)

Question type: complex
Expected type:

ResPubliQA answer format is different from traditional QA campaigns. First, answers are not focused, short parts of texts, but full paragraphs that must contain the answer. Second, passages are not indefinite parts of texts of limited length; they must be predefined paragraphs identified in the collection by XML tags <p>.

Although the answers to submit to the campaign are full paragraphs, our system is designed to hunt down short answers. For most questions, typically factoid questions, it is still relevant to find short answers, and then to return a paragraph containing the best answer. This is not the case for 'how' or 'why' questions, where no short answer may be retrieved.

FIDJI usually works at sentence level. To comply with the ResPubliQA-specific rules, we chose to work at paragraph level. This consisted in specifying that sentence separators were <p> XML tags in the collection, rather than usual end-of-sentence markers.
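
The paragraph-level segmentation described above can be pictured with the following minimal sketch, which treats each <p> element as one analysis unit. The regular expression, the function name split_paragraphs and the assumption that <p> elements are not nested are ours; the actual collection format and FIDJI's own segmenter may differ.

```python
import re

# Assumption: paragraphs are flat <p>...</p> elements, possibly with attributes.
P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)


def split_paragraphs(document_text):
    """Return the text of each <p> element, with internal whitespace normalized."""
    return [" ".join(body.split()) for body in P_TAG.findall(document_text)]


sample = """<p>COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed
(66/401/EEC)</p>
<p>This Directive shall apply to fodder plant seed marketed within the Community,
irrespective of the use for which the seed as grown is intended.</p>"""

units = split_paragraphs(sample)
for index, unit in enumerate(units, start=1):
    print(index, unit)
```

Each returned unit then plays the role that a sentence plays in the usual sentence-level setting.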

Once candidate documents are selected by the search engine and analyzed by the parser, the system compares the document paragraphs with question analysis, in order to:

- Extract candidate answers or select a relevant paragraph;
- Give a score to each answer, so that final answers can be ranked.

2.2.1 Factoid questions

Within selected documents, candidate paragraphs are those containing the most dependencies from the question. Once these paragraphs are selected, two cases can occur:

1. Question dependencies with an 'ANSWER' slot are found in the sentence. In this case, the lemma instantiating this slot is the head of the answer. The full answer is composed of the head and its basic modifiers (for a noun phrase: noun complements, adjectives, determiners and coordinated elements; for a verbal phrase: verb complements, subject and object). The NE type and answer type of this answer, if any, are checked. Answer type can be validated by different syntactic relations in the text: definition ("The French Prime minister, Pierre Bérégovoy"), attributNN ("Pierre Bérégovoy is the French Prime minister"), and sometimes attribut_de ("la maladie de Parkinson", Parkinson's disease, literally "the disease of Parkinson").

2. The 'ANSWER' slot does not unify with any word of the passage. In this case, the elements having an appropriate NE type and/or answer type are selected in the sentence. This is done in order to counterbalance the many parsing errors (or paraphrases). Often, the sentence contains the answer but syntactic dependencies alone do not lead to it.

If no possible short answer is found, the paragraph is still considered as a candidate answer. But in any case, a paragraph containing an extracted short answer will be preferred if it exists.
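
The two cases and the paragraph-only fallback can be sketched as follows. This is a simplified illustration under our own assumptions, not FIDJI's code: dependencies are modelled as (relation, arguments) pairs over lemmas, unification is plain equality on every argument except the ANSWER slot, and the names unify_answer and candidate_answers are hypothetical. Modifier expansion and answer type checking are left out.

```python
ANSWER = "ANSWER"


def unify_answer(question_deps, passage_deps):
    """Case 1: return the passage lemmas that fill the ANSWER slot of a question dependency."""
    heads = []
    for q_rel, q_args in question_deps:
        if ANSWER not in q_args:
            continue
        slot = q_args.index(ANSWER)
        for p_rel, p_args in passage_deps:
            if p_rel != q_rel or len(p_args) != len(q_args):
                continue
            # Every argument other than the slot must be the same lemma.
            others_match = all(
                q == p
                for i, (q, p) in enumerate(zip(q_args, p_args))
                if i != slot
            )
            if others_match and p_args[slot] not in heads:
                heads.append(p_args[slot])
    return heads


def candidate_answers(question_deps, passage_deps, passage_entities, expected_ne):
    """Return (case, answers): unified heads, NE-based fallback, or the bare paragraph."""
    heads = unify_answer(question_deps, passage_deps)
    if heads:
        return "unified", heads
    if expected_ne is not None:
        # Case 2: fall back on entities of the expected NE type.
        fallback = [lemma for lemma, ne in passage_entities if ne == expected_ne]
        if fallback:
            return "ne_fallback", fallback
    return "paragraph_only", []


# Case 1, with dependencies taken from question 0026 and its passage.
q26 = [("DEFINITION", (ANSWER, "monnaie")), ("NMOD", ("état", "membre"))]
p26 = [("NMOD", ("état", "membre")), ("DEFINITION", ("euro", "monnaie"))]
case1 = candidate_answers(q26, p26, [], None)

# Case 2, with a hypothetical incomplete parse of the passage for question 0015.
q15 = [("VMOD", ("conclure", ANSWER)), ("PREPOBJ", (ANSWER, "entre"))]
p15 = [("ATTRIBUT", ("conclure", "accord-cadre"))]
entities15 = [("république argentin", "LIEU[PAYS]"), ("2 avril 1990", "DATE")]
case2 = candidate_answers(q15, p15, entities15, "LIEU[PAYS]")

print(case1)
print(case2)
```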

Example 1.

0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du 2 avril 1990 ?
(Between which countries is the Framework Agreement for trade and economic cooperation of 2 April 1990?)

Syntactic dependencies and NE tagging:
ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)  ATTRIBUT(conclure, accord-cadre)
VMOD(conclure, ANSWER)                  PREPOBJ(ANSWER, entre)
DATE[DATEABS](2 avril 1990)             LIEU[PAYS](ANSWER)

Question type: list
Expected type: location (state)

The following passage is selected because it contains the dependencies of the question:

Passage: un accord-cadre de coopération commerciale et économique entre la Communauté économique européenne et la République argentine (3) a été conclu le 2 avril 1990 ;
(Considering the Framework Agreement for trade and economic cooperation between the European Economic Community and the Argentine Republic of 2 April 1990;)

ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
ATTRIBUT_DE(accord-cadre, coopération)  ATTRIBUT(conclure, accord-cadre)
NMOD(coopération, communauté économique européen)
PREPOBJ(communauté économique européen, entre)
COORDITEMS(communauté économique européen, république argentin)
LIEU[PAYS](république argentin)
DATE(2 avril 1990)
ORG(communauté économique européen)

The slot 'ANSWER' is instantiated by communauté économique européenne. As the question type is 'list', the elements of the list have to be found in a 'COORDITEMS' dependency: so, the answers are communauté économique européenne and république argentine. Finally, the expected answer type is validated: the selected answer is tagged as a location (state).
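
The list step of Example 1 can be reproduced with a short sketch: starting from the lemma that instantiates 'ANSWER', the answer is extended with the items coordinated with it. The tuple representation and the function name expand_list_answer are assumptions made for this illustration; only the dependencies themselves come from the example.

```python
def expand_list_answer(head, passage_deps):
    """Extend a list answer with every item coordinated with its head (COORDITEMS)."""
    answers = [head]
    for relation, args in passage_deps:
        if relation == "COORDITEMS" and head in args:
            for item in args:
                if item not in answers:
                    answers.append(item)
    return answers


# Passage dependencies from Example 1 (lemmatized forms as given by the parser).
passage_deps = [
    ("NMOD", ("coopération", "communauté économique européen")),
    ("PREPOBJ", ("communauté économique européen", "entre")),
    ("COORDITEMS", ("communauté économique européen", "république argentin")),
]

# Head obtained by unifying PREPOBJ(ANSWER, entre) with the passage.
head = "communauté économique européen"
answers = expand_list_answer(head, passage_deps)
print(answers)
```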

Example 2.

0026 - Quel est le nom de la monnaie des états membres depuis le 1er janvier 1999 ?
(What is the name of the member states' currency from 1 January 1999?)

Syntactic dependencies and NE tagging:
ATTRIBUT_DE(monnaie, état)              NMOD(état, membre)
PREPOBJ(1er janvier 1999, depuis)       DEFINITION(ANSWER, monnaie)
DATE(1er janvier 1999)

Question type: definition
Expected type:

The following passage is selected because it contains all the dependencies of the question:

Passage: considérant que le règlement (CE) n° 974/98 du Conseil du 3 mai 1998 concernant l'introduction de l'euro (3) prévoit à son article 2 que, à compter du 1er janvier 1999, la monnaie des États membres participants est l'euro ;
(Whereas Council Regulation (EC) No 974/98 of 3 May 1998 on the introduction of the euro (3), provides in Article 2 that from 1 January 1999 the currency of the participating Member States shall be the euro)

ATTRIBUTADJ(membre, participant)        ATTRIBUT_DE(monnaie, état)
NMOD(état, membre)                      PREPOBJ(1er janvier 1999, à compter de)
DEFINITION(euro, monnaie)               DATE(1er janvier 1999)
...

Complex questions ('how', 'why', etc.) do not expect any short answer. For these kinds of questions, the system behaves more like a passage retrieval system. The paragraphs sharing the most syntactic dependencies with the question are selected. Among them, the best-ranked is the one that is returned first by Lucene. For example:

0155 - Pourquoi convient-il de revoir l'architecture du réseau Animo ?
(Why should the structure of an ANIMO network be revised?)

Syntactic dependencies and NE tagging:
VMOD(convenir, revoir)                  DEEPOBJ(revoir, architecture)
ATTRIBUT_DE(architecture, réseau)       NMOD(réseau, animo)

Question type: complex (why)
Expected type:

The following passage is selected because all the dependencies of the question are found in the passage:

Passage: considérant que, à la suite de différents travaux effectués dans le cadre communautaire, notamment lors d'études et de séminaires, il convient de revoir l'architecture du réseau Animo afin de procéder à la mise en place d'un système vétérinaire intégrant les différentes applications informatisées ;
(Whereas, as a result of the work carried out at Community level in the course of studies and seminars, the structure of the ANIMO network should be revised so that a veterinary system integrating the various computer applications can be introduced;)

DEEPSUBJ(convenir, il)                  VMOD(convenir, revoir)
DEEPOBJ(revoir, architecture)           ATTRIBUT_DE(architecture, réseau)
NMOD(réseau, animo)
PREPOBJ(procéder, afin de)              VMOD(procéder, mise)
PREPOBJ(mise, à)                        NMOD(mise, place)
...
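
The selection rule for complex questions (most shared dependencies, ties broken by search-engine rank) can be sketched as below. The data layout, the function name select_paragraph and the ranks and competing paragraphs are invented for the illustration; only the dependencies of question 0155 come from the example above.

```python
def select_paragraph(question_deps, paragraphs):
    """Pick the paragraph sharing the most dependencies with the question.

    paragraphs is a list of (search_rank, label, deps) tuples, where rank 1 is the
    document returned first by the search engine. Ties go to the lowest rank.
    """
    if not paragraphs:
        return None
    wanted = set(question_deps)

    def shared(deps):
        return len(wanted & set(deps))

    best = max(shared(deps) for _, _, deps in paragraphs)
    tied = [p for p in paragraphs if shared(p[2]) == best]
    return min(tied, key=lambda p: p[0])


question_0155 = [
    ("VMOD", "convenir", "revoir"),
    ("DEEPOBJ", "revoir", "architecture"),
    ("ATTRIBUT_DE", "architecture", "réseau"),
    ("NMOD", "réseau", "animo"),
]

# Hypothetical candidates: two paragraphs contain all four dependencies.
paragraphs = [
    (1, "title paragraph", question_0155[2:]),
    (2, "recital", question_0155 + [("DEEPSUBJ", "convenir", "il")]),
    (5, "later recital", list(question_0155)),
]

selected = select_paragraph(question_0155, paragraphs)
print(selected[1])
```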

2.3 Scoring

FIDJI's scores are not composed of a single value, but of a list of different values and flags. The criteria are listed below, in decreasing order of importance:

- Presence of an extracted short answer (as stated above, a paragraph containing an extracted short answer is preferred if it exists).
- Named entity value (appropriate NE value or not; only for factoid questions).
- Keyword rate (between 0 and 1, the rate of question major keywords present in the passage: proper names, answer type and numbers).
- Answer type value (appropriate answer type or not; only for factoid questions).
- Frequency weighting (number of extracted occurrences of this answer; only for factoid questions).
- Document ranking (best rank of a document containing the answer, as returned by the search engine).
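
One plausible reading of a score made of "a list of values and flags" in decreasing order of importance is a lexicographic comparison, sketched below. The text does not state how the criteria are actually combined, so the strict lexicographic ordering, the Candidate class and its field names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    paragraph_id: str
    has_short_answer: bool   # a short answer was extracted from the paragraph
    ne_ok: bool              # the answer has the expected named entity type
    keyword_rate: float      # share of major question keywords found, in [0, 1]
    answer_type_ok: bool     # the expected answer type was validated
    frequency: int           # number of extracted occurrences of the answer
    best_doc_rank: int       # best search-engine rank of a supporting document (1 = first)


def sort_key(candidate):
    """Build a tuple ordered by decreasing importance; larger tuples rank higher."""
    return (
        candidate.has_short_answer,
        candidate.ne_ok,
        candidate.keyword_rate,
        candidate.answer_type_ok,
        candidate.frequency,
        -candidate.best_doc_rank,  # a lower rank number is better
    )


def rank(candidates):
    return sorted(candidates, key=sort_key, reverse=True)


candidates = [
    Candidate("p1", False, False, 1.0, False, 0, 1),
    Candidate("p2", True, True, 0.5, True, 2, 4),
    Candidate("p3", True, True, 0.5, True, 2, 7),
]

ranking = [c.paragraph_id for c in rank(candidates)]
print(ranking)
```

With this ordering, a later criterion only matters when all earlier criteria are equal, which is why p1 ranks last despite its perfect keyword rate.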

We present the results by question type in Table 1. Only one answer per question was allowed, so the values simply correspond to the rate of correct answers for each question type.

Question type    Number of questions    Correct answers
Factoid          116                    36.2%
Definition       101                    15.8%
List             37                     16.2%
"How"            76                     22.4%
"Why"            170                    40%
TOTAL            500                    30.4%

Table 1: FIDJI results by question types.

Results are lower than in former campaigns, especially concerning factoid and definition questions.

A careful look at the results shows that, in these particular documents, using syntactic dependencies as the main clue to choose paragraph candidates is not always a good way to find a relevant passage. This is especially true for complex questions, but not only for them. Indeed, selecting the paragraph containing the most question dependencies often leads to the introduction of the document or to a very general paragraph containing poor information.

For example:

0006 - What is the scope of the council directive on the trading of fodder seeds?

is answered by

<p>COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed (66/401/EEC)</p>

containing many dependencies but answering nothing, while a good result was later in the same document, but with an anaphora:

<p>This Directive shall apply to fodder plant seed marketed within the Community, irrespective of the use for which the seed as grown is intended.</p>

Dependency relations are still useful to find the right document, but often fail to point to the correct paragraph.

Also, the JRC-Acquis corpus uses a different register of language than usual corpora such as the Web or newspapers. Question as well as document analyses suffered from the specific expressions and structures used by French texts, especially for definitions. Definitions, quite easy to detect in newspaper corpora, have been poorly recognized for this evaluation.

4 Conclusion

We presented in this article our participation in the ResPubliQA 2009 campaign in French. We adapted our syntactic-based QA system FIDJI in order to produce a single long answer in the form of JRC-Acquis tagged paragraphs. Results showed that syntactic analysis should be used in different manners according to the type of tasks and questions. A careful look at our system's

References

[1] Salah Aït-Mokhtar and Jean-Pierre Chanod. Incremental finite-state parsing. In Proceedings of the fifth conference on Applied natural language processing, pages 72-79, Washington, DC, USA, 1997. Morgan Kaufmann Publishers Inc., San Francisco, California, USA.

[2] Erik Hatcher and Otis Gospodnetić. Lucene in Action. Manning, 2004.

[3] Véronique Moriceau and Xavier Tannier. Étude de l'apport de la syntaxe dans un système de question-réponse. In Actes de la Conférence Traitement Automatique des Langues Naturelles (TALN 2009, poster), Senlis, France, jun 2009.

[4] Véronique Moriceau, Xavier Tannier, and Brigitte Grau. Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents. In Proceedings of workshop on COnférence en Recherche d'Information et Applications, CORIA, Presqu'île de Giens, France, 2009.
