Xavier Tannier
CNRS-LIMSI
University Paris-Sud 11
xtannier@limsi.fr
Véronique Moriceau
CNRS-LIMSI
University Paris-Sud 11
moriceau@limsi.fr
Abstract
This paper presents FIDJI results in ResPubliQA 2009. FIDJI (Finding In Documents
Justifications and Inferences) is an open-domain question-answering system for French.
The main goal is to validate answers by checking that all the information given in the
question is retrieved in the supporting texts.
Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information
Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database
Management]: Languages - Query Languages
General Terms
Measurement, Performance, Experimentation
Keywords
Question answering, Questions beyond factoids
1 Introduction
This paper presents FIDJI's results in ResPubliQA 2009 for French. In this task, systems receive
500 independent questions in natural language as input, and must return one paragraph containing
the answer from the document collection. Neither an exact answer nor multiple responses are required.
The document collection is JRC-Acquis, which covers EU documentation.
2 FIDJI
FIDJI (1) (Finding In Documents Justifications and Inferences) is an open-domain question-answering
system for French. The main goal is to validate answers by checking that all the information given
in the question is retrieved in the supporting texts. Our answer validation approach assumes that
the different entities of the question can be retrieved, properly connected, either in a sentence, in a
passage or in multiple documents. We designed the system so that no particular linguistic-oriented
pre-processing is needed.
The document collection is indexed by the search engine Lucene [2]. First, the system submits
the keywords of the question to Lucene: the first 100 documents are then processed (syntactic
analysis and named entity tagging). Among these documents, FIDJI looks for sentences containing
the most syntactic relations of the question. Finally, answers are extracted from these sentences
and the answer type, when specified in the question, is validated. Figure 1 presents the architecture
of FIDJI and more details can be found in [4, 3]. The next sections summarize the way FIDJI extracts
answers and focus on ResPubliQA specificities.
(1) This work has been partially financed by OSEO under the Quaero program.
2.1 Syntactic analysis
FIDJI has to detect syntactic implications between questions and passages containing the answers.
Our system relies on syntactic analysis provided by XIP, which is used to parse both the questions
and the documents from which answers are extracted.
XIP [1] is a robust parser for French and English which provides dependency relations and
named entity recognition. The dependency relations provided by XIP which are used by FIDJI are
mainly: SUBJ (subject), OBJ (object), PREPOBJ (prepositional group), NMOD (noun modifier),
VMOD (verb modifier), COORDITEMS (coordinated elements) and CONNECT (connector introducing
a clause).
The named entities (NE) are tagged using a set of 8 types: person, organization, location,
date (defined by XIP), as well as nationality, number, duration, age (that we added). XIP's lieu
(location) can be made more specific (country, region, continent...). We also added features to
allow for more precise types. For example, for number, we added the following features: length,
speed, weight, money, physics, so that "0.55 euro" in "a French stamp costs 0.55 euro" can be
tagged as a NE and extracted as an answer to "What is the price of a French stamp?". Other
Question analysis consists in identifying:
• The syntactic dependencies given by XIP;
• The keywords submitted to Lucene (words tagged as noun, verb, adjective or adverb by XIP);
• The question type:
    Factoid (concerning a fact, typically who, when, where questions),
    Definition (What is...),
    Boolean (expecting a yes/no answer),
    List (expecting an answer composed of a list of items),
    Complex questions (why and how questions).
• The expected type(s): NE type and/or (specific) answer type.
The answer to be extracted is represented by a variable (ANSWER) introduced in the dependency
relations. The slot noted 'ANSWER' is expected to be instantiated by a word, argument of
some dependencies of the parsed sentences. This word represents the answer to the question (see
Section 2.2). The question type is mainly determined on the basis of the dependency relations
given by the parser. For example:
0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du
2 avril 1990 ?
(Between which countries is the Framework Agreement for trade and economic cooperation of 2
April 1990?)
• Syntactic dependencies and NE tagging:
    ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
    ATTRIBUT_DE(accord-cadre, coopération)  VMOD(conclure, ANSWER)
    PREPOBJ(ANSWER, entre)                  ATTRIBUT(conclure, accord-cadre)
    DATE(2 avril 1990)                      LIEU[PAYS](ANSWER)
• Question type: list
• Expected type: location (state)
0021 - Comment encourage-t-on la production de graines de vers à soie ?
(How is interest in producing silkworm eggs increased?)
• Syntactic dependencies and NE tagging:
    ATTRIBUT_DE(graine, vers)               ATTRIBUT_DE(production, graine)
    DEEPOBJ(encourager, production)         NMOD(vers, soie)
    TOPIC(encourager)
• Question type: complex
• Expected type: ∅
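As an illustration only, the dependency notation used in these examples can be held in a small data structure. The sketch below is not FIDJI's internal representation or XIP's output format: the `Dependency` type, the `parse_dependency` and `answer_constraints` helpers and the textual grammar they accept are assumptions modelled on the examples above.

```python
import re
from typing import NamedTuple, Tuple


class Dependency(NamedTuple):
    relation: str            # e.g. "VMOD" or "LIEU[PAYS]"
    args: Tuple[str, ...]    # lemmas, or the ANSWER placeholder


# Assumed notation: RELATION(arg1, arg2) or RELATION[FEATURE](arg).
_DEP_RE = re.compile(r"^\s*([A-Z_]+(?:\[[A-Z_]+\])?)\((.*)\)\s*$")


def parse_dependency(text: str) -> Dependency:
    """Turn a string such as 'VMOD(conclure, ANSWER)' into a Dependency."""
    match = _DEP_RE.match(text)
    if match is None:
        raise ValueError(f"not a dependency: {text!r}")
    relation, raw_args = match.groups()
    return Dependency(relation, tuple(a.strip() for a in raw_args.split(",")))


def answer_constraints(deps):
    """Keep only the dependencies that mention the ANSWER slot."""
    return [d for d in deps if "ANSWER" in d.args]


# Question 0015, copied from the example above.
QUESTION_0015 = [parse_dependency(s) for s in (
    "ATTRIBUTADJ(coopération, commercial)",
    "ATTRIBUTADJ(coopération, économique)",
    "ATTRIBUT_DE(accord-cadre, coopération)",
    "VMOD(conclure, ANSWER)",
    "PREPOBJ(ANSWER, entre)",
    "ATTRIBUT(conclure, accord-cadre)",
    "DATE(2 avril 1990)",
    "LIEU[PAYS](ANSWER)",
)]

for dep in answer_constraints(QUESTION_0015):
    print(dep.relation, dep.args)
```

For question 0015, the constraints that mention the slot are the VMOD, PREPOBJ and LIEU[PAYS] relations, which is what a later matching step would try to instantiate.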
The ResPubliQA answer format is different from traditional QA campaigns. First, answers are not
focused, short parts of texts, but full paragraphs that must contain the answer. Second, passages
are not indefinite parts of texts of limited length; they must be predefined paragraphs identified
in the collection by XML tags <p>.
Although answers to submit to the campaign are full paragraphs, our system is designed to
hunt down short answers. For most questions, typically factoid questions, it is still relevant to find
short answers, and then to return a paragraph containing the best answer. This is not the case of
'how' or 'why' questions, where no short answer may be retrieved.
FIDJI usually works at sentence level. For the aim of ResPubliQA specific rules, we chose to
work at paragraph level. This consisted in specifying that sentence separators were <p> XML tags
in the collection, rather than usual end-of-sentence markers.
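A minimal sketch of this change of segmentation unit, using Python's standard `xml.etree.ElementTree`. The sample document, its element names other than `<p>`, and the `paragraphs` helper are assumptions for illustration; the real JRC-Acquis files may use namespaces and a richer structure that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document: only the <p> elements matter here.
SAMPLE = """<text><body>
<p n="1">COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed
(66/401/EEC)</p>
<p n="2">First sentence. Second sentence.</p>
</body></text>"""


def paragraphs(xml_text):
    """Return one processing unit per <p> element, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("p")]


for unit in paragraphs(SAMPLE):
    print(repr(unit))
```

The second paragraph stays a single unit although it contains two sentences, which is the behaviour described above: the `<p>` boundary, not the full stop, delimits what is compared with the question.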
Once candidate documents are selected by the search engine and analyzed by the parser, the
system compares the document paragraphs with question analysis, in order to:
• Extract candidate answers or select a relevant paragraph;
• Give a score to each answer, so that final answers can be ranked.
2.2.1 Factoid questions
Within selected documents, candidate paragraphs are those containing the most dependencies
from the question. Once these paragraphs are selected, two cases can occur:
1. Question dependencies with an 'ANSWER' slot are found in the sentence. In this case, the
lemma instantiating this slot is the head of the answer. The full answer is composed of the
head and its basic modifiers (for a noun phrase: noun complements, adjectives, determiners
and coordinated elements; for a verbal phrase: verb complements, subject and object).
The NE type and answer type of this answer, if any, are checked. Answer type can be
validated by different syntactic relations in the text: definition ("The French Prime minister,
Pierre Bérégovoy"), attributNN ("Pierre Bérégovoy is the French Prime minister"), and
sometimes attribut_de ("la maladie de Parkinson", Parkinson's disease, literally "the disease
of Parkinson").
2. The 'ANSWER' slot does not unify with any word of the passage. In this case, the elements
having an appropriate NE type and/or answer type are selected in the sentence. This is done
in order to counterbalance the many parsing errors (or paraphrases). Often, the sentence
contains the answer but syntactic dependencies alone do not lead to it.
If no possible short answer is found, the paragraph is still considered as a candidate answer.
But in any case, a paragraph containing an extracted short answer will be preferred if it exists.
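The two cases above can be sketched as follows. This is a simplified illustration, not FIDJI's actual matching procedure: dependencies are assumed to be `(relation, args)` tuples, named entities are assumed to be given as a lemma-to-type mapping, and the function names are hypothetical.

```python
from collections import Counter

ANSWER = "ANSWER"


def instantiate_answer(question_deps, passage_deps):
    """Count, per lemma, how many ANSWER-bearing question dependencies it satisfies."""
    votes = Counter()
    for rel, args in question_deps:
        if ANSWER not in args:
            continue
        slot = args.index(ANSWER)
        for p_rel, p_args in passage_deps:
            if p_rel != rel or len(p_args) != len(args):
                continue
            # Every argument except the slot must be identical.
            if all(a == b for i, (a, b) in enumerate(zip(args, p_args)) if i != slot):
                votes[p_args[slot]] += 1
    return votes


def find_answer_heads(question_deps, passage_deps, expected_ne, passage_entities):
    """Case 1: unify the ANSWER slot. Case 2: fall back on the expected NE type."""
    votes = instantiate_answer(question_deps, passage_deps)
    if votes:
        best = max(votes.values())
        return sorted(l for l, v in votes.items() if v == best), "dependency"
    fallback = sorted(l for l, ne in passage_entities.items() if ne == expected_ne)
    return fallback, "ne_type"


question = [
    ("VMOD", ("conclure", ANSWER)),
    ("PREPOBJ", (ANSWER, "entre")),
    ("ATTRIBUT", ("conclure", "accord-cadre")),
]
passage = [
    ("ATTRIBUT", ("conclure", "accord-cadre")),
    ("PREPOBJ", ("communauté économique européen", "entre")),
]
entities = {"république argentin": "LIEU[PAYS]", "2 avril 1990": "DATE"}

print(find_answer_heads(question, passage, "LIEU[PAYS]", entities))
print(find_answer_heads(question, passage[:1], "LIEU[PAYS]", entities))
```

In the first call the slot unifies through the PREPOBJ relation; in the second, where that relation is missing from the passage, the sketch falls back on the elements carrying the expected NE type.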
Example 1.
0015 - Entre quels pays a été conclu l'accord-cadre de coopération commerciale et économique du
2 avril 1990 ?
(Between which countries is the Framework Agreement for trade and economic cooperation of 2
April 1990?)
• Syntactic dependencies and NE tagging:
    ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
    ATTRIBUT_DE(accord-cadre, coopération)  ATTRIBUT(conclure, accord-cadre)
    VMOD(conclure, ANSWER)                  PREPOBJ(ANSWER, entre)
    DATE[DATEABS](2 avril 1990)             LIEU[PAYS](ANSWER)
• Question type: list
• Expected type: location (state)
The following passage is selected because it contains the dependencies of the question:
Passage: un accord-cadre de coopération commerciale et économique entre la Communauté économique
européenne et la République argentine (3) a été conclu le 2 avril 1990 ;
(Considering the Framework Agreement for trade and economic cooperation between the European
Economic Community and the Argentine Republic of 2 April 1990;)
    ATTRIBUTADJ(coopération, commercial)    ATTRIBUTADJ(coopération, économique)
    ATTRIBUT_DE(accord-cadre, coopération)  ATTRIBUT(conclure, accord-cadre)
    NMOD(coopération, communauté économique européen)
    PREPOBJ(communauté économique européen, entre)
    COORDITEMS(communauté économique européen, république argentin)
    LIEU[PAYS](république argentin)
    DATE(2 avril 1990)
    ORG(communauté économique européen)
The slot 'ANSWER' is instantiated by communauté économique européenne. As the question
type is 'list', the elements of the list have to be found in a 'COORDITEMS' dependency: so, the
answers are communauté économique européenne and république argentine. Finally, the expected
answer type is validated: the selected answer is tagged as a location (state).
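The list-expansion step of Example 1 can be sketched as below. The `(relation, args)` tuple representation and the `expand_list_answer` name are assumptions for illustration; following coordination transitively is a design choice of this sketch, not something the text specifies.

```python
def expand_list_answer(head, passage_deps):
    """Collect the head and every lemma coordinated with it through COORDITEMS."""
    items = [head]
    changed = True
    while changed:
        changed = False
        for relation, args in passage_deps:
            if relation != "COORDITEMS":
                continue
            # If one member of the coordination is already an answer item,
            # the other members become answer items too.
            if any(arg in items for arg in args):
                for arg in args:
                    if arg not in items:
                        items.append(arg)
                        changed = True
    return items


# Relevant dependencies of the passage in Example 1.
passage_deps = [
    ("NMOD", ("coopération", "communauté économique européen")),
    ("PREPOBJ", ("communauté économique européen", "entre")),
    ("COORDITEMS", ("communauté économique européen", "république argentin")),
]

print(expand_list_answer("communauté économique européen", passage_deps))
```

Starting from the lemma that instantiates the slot, the sketch returns both coordinated elements, matching the two answers given in the example.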
Example 2.
0026 - Quel est le nom de la monnaie des états membres depuis le 1er janvier 1999 ?
(What is the name of the member states' currency from 1 January 1999?)
• Syntactic dependencies and NE tagging:
    ATTRIBUT_DE(monnaie, état)              NMOD(état, membre)
    PREPOBJ(1er janvier 1999, depuis)       DEFINITION(ANSWER, monnaie)
    DATE(1er janvier 1999)
• Question type: definition
• Expected type: ∅
The following passage is selected because it contains all the dependencies of the question:
Passage: considérant que le règlement (CE) n° 974/98 du Conseil du 3 mai 1998 concernant
l'introduction de l'euro (3) prévoit à son article 2 que, à compter du 1er janvier 1999, la monnaie
des États membres participants est l'euro ;
(Whereas Council Regulation (EC) No 974/98 of 3 May 1998 on the introduction of the euro (3),
provides in Article 2 that from 1 January 1999 the currency of the participating Member States
shall be the euro)
    ATTRIBUTADJ(membre, participant)        ATTRIBUT_DE(monnaie, état)
    NMOD(état, membre)                      PREPOBJ(1er janvier 1999, à compter de)
    DEFINITION(euro, monnaie)               DATE(1er janvier 1999)
    ...
Complex questions ('how', 'why', etc.) do not expect any short answer. On these kinds of questions,
the system behaves more as a passage retrieval system. The paragraphs containing the most
syntactic dependencies in common with the question are selected. Among them, the best-ranked
is the one that is returned first by Lucene. For example:
0155 - Pourquoi convient-il de revoir l'architecture du réseau Animo ?
(Why should the structure of an ANIMO network be revised?)
• Syntactic dependencies and NE tagging:
    VMOD(convenir, revoir)                  DEEPOBJ(revoir, architecture)
    ATTRIBUT_DE(architecture, réseau)       NMOD(réseau, animo)
• Question type: complex (why)
• Expected type: ∅
The following passage is selected because all the dependencies of the question are found in the
passage:
Passage: considérant que, à la suite de différents travaux effectués dans le cadre communautaire,
notamment lors d'études et de séminaires, il convient de revoir l'architecture du réseau Animo
afin de procéder à la mise en place d'un système vétérinaire intégrant les différentes applications
informatisées ;
(Whereas, as a result of the work carried out at Community level in the course of studies and
seminars, the structure of the ANIMO network should be revised so that a veterinary system
integrating the various computer applications can be introduced;)
    DEEPSUBJ(convenir, il)                  VMOD(convenir, revoir)
    DEEPOBJ(revoir, architecture)           ATTRIBUT_DE(architecture, réseau)
    NMOD(réseau, animo)
    PREPOBJ(procéder, afin de)              VMOD(procéder, mise)
    PREPOBJ(mise, à)                        NMOD(mise, place)
    ...
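A minimal sketch of this selection rule for complex questions: maximize the number of shared dependencies, then break ties with the search engine rank. The candidate tuples, the paragraph identifiers and the ranks below are invented for illustration, and `select_paragraph` is a hypothetical helper, not part of FIDJI.

```python
def select_paragraph(question_deps, candidates):
    """Pick the paragraph sharing the most dependencies with the question.

    candidates: list of (lucene_rank, paragraph_id, deps), where a lower
    rank means the document was returned earlier by the search engine.
    Ties on the dependency count are broken by the best (lowest) rank.
    """
    question = set(question_deps)

    def key(candidate):
        rank, _paragraph_id, deps = candidate
        return (-len(question & set(deps)), rank)

    return min(candidates, key=key)[1]


# Dependencies of question 0155, as (relation, args) tuples.
question_0155 = [
    ("VMOD", ("convenir", "revoir")),
    ("DEEPOBJ", ("revoir", "architecture")),
    ("ATTRIBUT_DE", ("architecture", "réseau")),
    ("NMOD", ("réseau", "animo")),
]

# Invented candidates: ranks and identifiers are placeholders.
candidates = [
    (1, "doc-a/p2", question_0155[2:]),                              # 2 shared
    (3, "doc-b/p7", question_0155 + [("DEEPSUBJ", ("convenir", "il"))]),  # 4 shared
    (5, "doc-c/p1", list(question_0155)),                            # 4 shared
]

print(select_paragraph(question_0155, candidates))
```

The paragraph from the top-ranked document loses because it shares only two dependencies; of the two paragraphs sharing all four, the one from the better-ranked document is kept. The function raises `ValueError` on an empty candidate list, which a caller would need to handle.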
2.3 Scoring
FIDJI's scores are not composed of a single value, but of a list of different values and flags. The
criteria are listed below, and are presented in decreasing order of importance:
• As we said, a paragraph containing an extracted short answer will be preferred if it exists.
• Named entity value (appropriate NE value or not, only for factoid questions).
• Keyword rate (between 0 and 1, the rate of question major keywords present in the passage:
  proper names, answer type and numbers).
• Answer type value (appropriate answer type or not, only for factoid questions).
• Frequency weighting (number of extracted occurrences of this answer, only for factoid
  questions).
• Document ranking (best rank of a document containing the answer, as returned by the search
  engine).

We present the results in Table 1 by types of questions. Only one answer per question was allowed,
so the values simply correspond to the rate of correct answers for each question type.
Question type    Number of questions    Correct answers
Factoid          116                    36.2%
Definition       101                    15.8%
List             37                     16.2%
"How"            76                     22.4%
"Why"            170                    40%
TOTAL            500                    30.4%
Table 1: FIDJI results by question types.
Results are lower than former campaigns' scores, especially concerning factoid and definition
questions.
Looking carefully at the results shows that, in these particular documents, using syntactic
dependencies as the main clue to choose paragraph candidates is not always a good way to find
a relevant passage. This is especially true for complex questions, but not only for them. Indeed,
the selection of the paragraph containing the most question dependencies often leads to the
introduction of the document or to a very general paragraph containing poor information.
For example:
0006 - What is the scope of the council directive on the trading of fodder seeds?
is answered by
<p>COUNCIL DIRECTIVE of 14 June 1966 on the marketing of fodder plant seed
(66/401/EEC)</p>
containing many dependencies but answering nothing, while a good result was later in the
same document, but with an anaphora:
<p>This Directive shall apply to fodder plant seed marketed within the Community, irrespective
of the use for which the seed as grown is intended.</p>
Dependency relations are still useful to find the right document, but often fail to point to
the correct paragraph.
Also, the JRC-Acquis corpus uses a different register of language than usual corpora such as Web
or newspapers. Question as well as document analyses suffered from the specific expressions and
structures used by French texts, especially for definitions. Definitions, quite easy to detect in
newspaper corpora, have been poorly recognized for this evaluation.
4 Conclusion
We presented in this article our participation in the ResPubliQA 2009 campaign in French. We
adapted our syntactic-based QA system FIDJI in order to produce a single long answer in the
form of JRC-Acquis tagged paragraphs. Results showed that syntactic analysis should be used in
different manners according to the type of tasks and questions. A careful look at our system's
[1] Salah Aït-Mokhtar and Jean-Pierre Chanod. Incremental finite-state parsing. In Proceedings
of the fifth conference on Applied natural language processing, pages 72-79, Washington, DC,
USA, 1997. Morgan Kaufmann Publishers Inc., San Francisco, California, USA.
[2] Erik Hatcher and Otis Gospodnetić. Lucene in Action. Manning, 2004.
[3] Véronique Moriceau and Xavier Tannier. Étude de l'apport de la syntaxe dans un système de
question-réponse. In Actes de la Conférence Traitement Automatique des Langues Naturelles
(TALN 2009, poster), Senlis, France, June 2009.
[4] Véronique Moriceau, Xavier Tannier, and Brigitte Grau. Utilisation de la syntaxe pour valider
les réponses à des questions par plusieurs documents. In Proceedings of workshop on COnférence
en Recherche d'Information et Applications, CORIA, Presqu'île de Giens, France, 2009.