• Aucun résultat trouvé

The LIMSI Participation in the QAst Track

N/A
N/A
Protected

Academic year: 2022

Partager "The LIMSI Participation in the QAst Track"

Copied!
9
0
0

Texte intégral

(1)

SophieRosset,OlivierGalibert,GillesAdda,EriBilinski

SpokenLanguageProessingGroup,LIMSI-CNRS,B.P.133,91403Orsayedex,Frane

{firstname.lastname}limsi.fr

Abstrat

Inthispaper,wepresenttwetwodierentquestion-answeringsystemsonspeehtran-

sriptswhih partiipatedtotheQAst2007evaluation. These twosystemsare based

on aomplete andmulti-levelanalysis ofbothqueriesanddouments. Therstsys-

tem uses handrafted rules for small text fragments (snippet) seletion and answer

extration. Theseondonereplaesthehandraftingwithanautomatiallygenerated

researh desriptor. A sore based on those desriptors is used to selet douments

and snippets. Theextrationand soringofandidateanswersisbasedonproximity

measurementswithintheresearhdesriptorelementsandanumberofseondaryfa-

tors. The evaluation results are ranged from 17%to 39% asauray depending on

thetasks.

Categories and Subjet Desriptors

H.3[InformationStorageandRetrieval℄: H.3.1ContentAnalysisandIndexing;H.3.3Information

SearhandRetrieval;H.3.4SystemsandSoftware

General Terms

Measurement,Performane,Experimentation

Keywords

Questionanswering,speehtransriptionsofmeetingand letures

1 Introdution

In the QA and Information Retrieval domains progress has been demonstrated via evaluation

ampaigns for both open domain and limited domains [1, 2, 3℄. In these evaluations systems

arepresentedwithindependentquestions andshould provideoneanswerextrated fromtextual

data toeah question. Reently, there hasbeengrowinginterestin extrating informationfrom

multimediadatasuhasmeetings,letures... Spokendataisdierentfromtextualdatainvarious

ways. Thegrammatialstrutureofspontaneousspeeh isquitedierentfrom writtendisourse

and inlude various typesof disuenies. Theleture and interative meeting data provided in

QAstevaluation arepartiularlydiultdueto run-onsentenesandinterruptions. Mostofthe

QAsystemsuseaomplete andheavysyntatiandsemantianalysisof both thequestionand

the doument or snippets given by searh engine in whih the answer has to be found. Suh

analysisan'treliablybeperformedonthedataweareinterestedin. TypialtextualQAsystems

areomposedofquestionanalysis,informationretrievalandanswerextrationomponents[1,4℄.

Theanswerextrationomponentisquiteomplexandinvolvesnaturallanguageanalysis,pattern

mathing andsometimeseven logialinferene [5℄. Mostof thesenaturallanguagetools arenot

designedtohandlespokenphenomena.

(2)

evaluation. Our QA systemsare part of an interative and bilingual(English and Frenh) QA

systemalledRitel[6℄whihspeiallyaddressedspeedissues. Thefollowingsetionspresentthe

doumentsandqueriespre-proessingandthenon-ontextualanalysiswhihareommontoboth

systems. Thesetion3desribestheoldersystem(System1). Setion 4presentsthenewsystem

(System2). Setion5nallypresentstheresultsforthesetwosystemsonbothdevelopmentand

testdata.

2 Analysis of douments and queries

Usually, the syntati/semanti analysis is dierent for the doument and for the query; our

approahistoperformthesameompleteandmultilevelanalysisonbothqueriesanddouments.

Thereareseveralreasonsforthis. Firstofall,thesystemhastodealwithbothtransribedspeeh

(transriptionsofmeetingsandletures,userutteranes)andtextdouments,sothereshouldbea

ommonanalysisthattakesinto aountthespeiitiesofbothdatatypes. Moreover,inorret

analysis due to the lak of ontext or limitations of hand-oded rules are likely to happen on

bothdatatypes,sousingthesamestrategyfordoumentandutteraneanalysishelps toredue

theirnegativeimpat. Inorder touse thesameanalysismodulefor allkindsofdata, weshould

transformthequeryand thedouments,whih mayome from dierentmodality(text, manual

transripts, automati transripts) in order to have a ommon representation of the sentene,

word,et. Thisproessisthenormalization.

2.1 Normalization

Normalization,inourappliation,istheproessbywhihrawtextsareonvertedto atextform

where words and numbers are unambiguously delimited, puntuation is separatedfrom words,

and thetext is split into sentene-likesegments(or aslose to sentenesasis reasonablypossi-

ble). Dierentnormalizationstepsareapplied,dependingofthekindofinputdata;thesestepsare:

1. Separatingwordsandnumbersfrom puntuation.

2. Reonstruting orretaseforthewords.

3. Addingpuntuation.

4. Splitting intosentenesat periodmarks.

IntheQAstevaluation,fourdatatypesareof interest:

CHILletures[7℄withmanualtransriptions,wheremanualpuntuationsareseparatedfrom

words. Onlythesplittingstepisneeded.

CHILletureswithautomatitransriptions[8℄. Requiresaddingpuntuationandsplitting.

AMI meetings [9℄manualtransriptions. Thetransriptionshad beentextied, withpun-

tuation joined to thewords,rst wordssentenes upper-ased, et. Requiresallthe steps

exeptaddingpuntuation.

AMI meetings with automati transriptions [10℄. Laking ase, they required the last 3

steps.

Reonstrutingtheaseandaddingpuntuationisdoneinthesameproessbasedonusingafully-

ased,puntuatedlanguagemodel[11℄. Awordgraphwasbuiltoveringallthepossiblevariants

(allpossiblepuntuationsaddedbetweenwords,allpossiblewordases),anda4-gramlanguage

model wasused to selet the mostprobable hypothesis. Thelanguage model was estimated on

House of Commons Daily Debates, nal edition of the European Parliament Proeedings and

various newspapers arhives. The nal result, with upperase onlyon propernouns and words

(3)

The analysis is onsidered non-ontextual beause eah sentene is proessed in isolation. The

generalobjetiveof thisanalysis is to ndthebits of informationthat may be ofuse forsearh

andextration, whihweallpertinent informationhunks. Theseanbeofdierentategories:

named entities,linguisti entities (e.g. verbs,prepositions), orspei entities (e.g. sores). All

words that do not fall into suh hunks are automatially grouped into hunks via a longest-

math strategy. Some examples of pertinent information hunks are given in Figure 1. In the

followingsetions,thetypesofentitieshandledbythesystemaredesribed,alongwithhowthey

arereognized.

_prepin _orgNIST _NNmetadataevaluations _verbreported _NNspeakertraking

_soreerrorrates _auxare _prepabout _val_sore15%

Figure 1: ExamplesofpertinentinformationhunksfromtheCHILdataolletion

2.2.1 DenitionofEntities

Followingommonlyadopteddenitions,thenamedentitiesareexpressionsthatdenoteloations,

people, ompanies, times, and monetary amounts. These entities have ommonly known and

aeptednames. ForexampleiftheountryFraneisanamedentity,apitalofFrane isnota

namedentity. Howeverourexperieneisthattheinformationpresentinthenamedentitiesisnot

suientto analyzethewiderange ofuserutteranes that anbefoundin leturesormeetings

transripts. Thereforewedenedasetofspeientitiesinordertoolletallobservedinformation

expressions ontained in a orpus questions and texts from a variety of soures (proeedings,

transriptsofletures,dialogset.). Figure2summarizesthedierententitytypesthat areused.

Type of entities Examples

lassial pers: RomanoProdi ;WinstonChurhill

namedentities prod: PulpFition;Titani

time: third entury;1998;June30th

org: EuropeanCommission; NATO

lo: Cambridge; England

extended method: HMM,Gaussianmixturemodel

namedentities event: the9thonfereneonspeehommuniationandtehnology

amount: 500; twohundredandftythousand

measure: year;mile ;Hertz

olorred,springgreen

question markers Qpers: whowrote... ;whodiretedTitani

Qlo: where isIBM

Qmeasure: whatistheweightofthebluespoonheadset

linguistihunk ompound: languageproessing;informationtehnology

verb: RobertoMartineznowknows thefullsizeofthetask

adj_omp: themirophoneswouldbesimilarto ...

adj_sup: thebiggestproduerofooaoftheworld

Figure2: Examplesofthemainentitytypes

2.2.2 Automatidetetion oftyped entities

The types weneed to detetorrespondto twolevels ofanalysis: named-entityreognitionand

(4)

overage of all dened types and subtypes indued the need of a large number of ourrenes,

andthereforerelyontheavailabilityoflargeannotatedorporawhiharediulttobuild. Rule-

basedapproahestonamed-entityreognition(e.g. [15℄)relyonmorphosyntatiand/orsyntati

analysis ofthe douments. However,in thepresent work,performingthissort of analysisis not

feasible: thespeeh transriptionsare toonoisyto allowforbothaurate androbustlinguisti

analysis basedon typialrulesandtheproessingtimeof mostofexistinglinguisti analyzersis

notompatiblewiththehighspeedwerequire.

We deided to takle the problem with rulesbased on regularexpressions on words asin other

works [16℄: weallowtheuseoflists forinitial detetion,andthedenition ofloal ontextsand

simpleategorizations.Thetoolusedtoimplementtherule-basedautomatiannotationsystemis

alledWmath. Thisenginemathes(andsubstitutes)regularexpressionsusingwordsasthebase

unitinsteadofharaters. Thispropertyallowsforamorereadablesyntaxthantraditionalregular

expressionsand enablestheuse oflasses(listsof words)and maros(sub-expressionsin-linein

a larger expression). Wmath inludes also NLP-oriented features like strategies for prioritizing

rule appliation, reursive substitution modes, word tagging (for tags like noun, verb...), word

ategories(number,aronym,propername...). Ithasmultipleinputandoutputformats,inluding

anXML-basedoneforinteroperabilityandtoallowhainingofinstanesofthetoolwithdierent

rulesets. Rulesarepre-analyzedandoptimizedinseveralways,andstoredin ompatformatin

order to speedupthe proess. Analysis is multi-pass, and subsequentrule appliations operate

ontheresultsof previousruleappliations whih anbe enrihedormodied. The fullanalysis

omprisessome50stepsandtakesroughly4msonatypialuserutterane(ordoumentsentene).

Theanalysisprovides96dierenttypesofentities. Figure3showsanexampleoftheanalysison

aquery(top)andonatransription(bottom).

<

_Qorg

>

whihorganization

<

/_Qorg

> <

_ation

>

provided

<

/_ation

>

<

_det

>

a

<

/_det

> <

_NN

>

signiantamount

<

/_NN

>

<

_prep

>

of

<

/_prep

> <

_NN

>

trainingdata

<

/_NN

> <

_punt

>

?

<

/_punt

>

<

_pro

>

it

<

/_pro

> <

_verb

>

's

<

/_verb

> <

_adv

>

just

<

/_adv

>

<

_prep_omp

>

sort of

<

/_prep_omp

> <

_det

>

a

<

/_det

>

<

_NN

>

verypale

<

/_NN

> <

_olor

>

blue

<

/_olor

> <

_onj

>

and

<

/_onj

>

<

_det

>

a

<

/_det

> <

_adj

>

light-up

<

/_adj

> <

_olor

>

yellow

<

/_olor

>

<

_punt

>

.

<

/_punt

>

Figure 3: Example annotation of a query: whih organization provided a signiant amount of

trainingdata? (top)andofatransriptionit's justsortof averypale blue(bottom).

3 Question-Answering System 1

The Question-Answering system handles searh in douments of any types (news artiles, web

douments,transribedbroadastnews,et.). Forspeedreasons,thedoumentsareallavailable

loally and preproessed: they are rst normalized, and then analyzed with the NCA module.

The(type,values)pairsarethenmanagedbyaspeializedindexerforquiksearhandretrieval.

Thissomewhatbag-of-typed-wordssystem[6℄worksinthreesteps:

1. Doument query lists reation. Using theentities found in the question, we generate

a doument query, and a ordered list of handrafted bak-o queries. These queries are

obtainedbyrelaxingsomeoftheonstraintsonthepreseneoftheentities,usingarelative

importaneordering(Namedentity

>

NN

>

adj_omp

>

ation

>

subs...)

(5)

and stop as soon as we get doument snippets (sentene or small groups of onseutive

sentenes)bak.

3. Answer extration and seletion: thedetetion oftheanswertypehasbeenextrated

beforehandfrom thequestion, using QuestionMarker,Named, Non-spei andExtended

Entities o-ourrenes(_Qwho

_persor_pers_defor_org). Therefore,weseletthe entities in thesnippets with theexpeted typeof the answer. At last,a lustering of the

andidate answersis done, basedonfrequenies. Themostfrequent answerwins, and the

distributionoftheountsgivesanideaoftheondeneofthesystemintheanswer.

4 Question-Answering System 2

System1hasthreemain problems:

Thebak-oquerieslists requirealargeamountof maintenanework andwill neverover

alloftheombinationsofentities whihmaybefoundin thequestions.

Theanswerseletionusesonlyfrequeniesofourrene,oftenendingupwithlistsofrst-

rankandidateanswerswiththesamesore.

The systemansweringspeed diretlydepends onthenumberof snippets to retrievewhih

maysometimes beverylarge. Tolimitthenumberofsnippets is noteasy, asthey arenot

rankedaordingtopertinene.

Anewsystem,System2hasbeendesignedtosolvetheseproblems. Wehavekeptthethreesteps

desribed in setion 3, with somemajor hanges. In step 1, insteadof instantiating doument

queriesfromalargenumberofpreexistinghandraftedrules(about5000),wegeneratearesearh

desriptor using a very small set of rules (about 10); this desriptor ontains all the needed

informationabouttheentities andtheanswertypes,togetherwith weights. Instep2,asoreis

alulatedfromtheproximitybetweentheresearhdesriptorandthedoumentandsnippets,in

ordertohoosethemostrelevantones.Instep3,theanswerisseletedaordingtoasorewhih

takesinto aountmanydierentfeaturesandtuningparameters,whihallowanautomatiand

eientadaptation.

4.1 Researh Desriptor generation

Therststepof System2isto buildaresearh desriptor(datadesriptorreord, DDR) whih

ontains theimportant elements of thequestion, and the possible answertypeswith assoiated

weight. Someelementsaremarkedasritial,whihmakesthemmandatoryinfuturesteps,while

othersareseondary. Theelementextrationandweightingisbasedonaempiriallassiation

of the element types in importane levels. Answertypes are predited through rules based on

ombinationsofelementsofthequestion. TheFigure4showsanexampleofaDDR.

4.2 Douments and snippets seletion and soring

EahofthedoumentissoredwithgeometrimeanofthenumberofourrenesofalltheDDR

elements whih appear in it. Using a geometri mean preventsfrom resaling problems due to

someelementsbeingnaturallymorefrequent. Thedoumentsaresortedbysoreandthen-best

onesare kept. Thespeedoftheentiresystemanbeontrolledbyhoosingn,thewhole system

beingin pratieio-boundratherthanpu-bound.

The seleted douments are then loaded and all the lines in a predened window (2-10 lines

dependingonquestiontypes)fromtheritialelementsarekept,reatingsnippets. Eahsnippet

issoredusingthegeometrialmeanofthenumberofourrenesofalltheDDRelementswhih

(6)

question: in whih ompany Bart works as a projet manager ?

ddr:

{ w=1, ritial, pers, Bart},

{ w=1, ritial, NN, projet manager },

{ w=1, seondary, ation, works },

answer_type = {

{ w=1.0, type=orgof },

{ w=1.0, type=organisation },

{ w=0.3, type=lo },

{ w=0.1, type=aronym },

{ w=0.1, type=np },

}

Figure 4: Exampleof aDDR onstrutedfrom the questionin whih ompany Bart works as a

projetmanager;eahelementontainsaweightw,theirimportaneforfuturesteps,andthepair

(type,value);eahpossibleanswertypeontainsaweightwandthetypeoftheanswer.

4.3 Answer extration, soring and lustering

In eah snippet all the elements whih type is one of the predited possible answer types are

andidateanswers. WeassoiatetoeahandidateanswerA asore

S(A)

:

S(A) = [w(A) P

E max e=E w(E)

(1+d(e,A)) α ] 1 −γ × S γ snip

C d (A) β C s (A) δ

(1)

Inwhih:

• d(e, A)

isthedistanetoeahelement

e

of thesnippet,instantiatingasearhelement

E

of

theDDR

• C s

is thenumberofourrenesofA intheextrated snippets,

C d

in thewhole doument

olletion

• S snip

istheextratedsnippet sore(see4.2)

• w(A)

istheweightoftheanswertypeand

w(E)

theweightoftheelement

E

in theDDR

• α

,

β

,

γ

and

δ

aretuningparametersestimatedbysystematitrialsonthedevelopmentdata.

α, β, γ ∈ [0, 1]

and

δ ∈ [ − 1, 1]

Anintuitiveexplanationoftheformulaisthat eahelementoftheDDRaddsto thesoreofthe

andidate(

P

E

)proportionallytoitsweight(

w(E)

)andinverselyproportionallytoitsdistaneof theandidate(

d(e, A)

). Ifmultipleinstaneof theelementarefoundinthesnippet onlythebest

oneiskept(

max e=E

). Thesoreisthensmoothedwiththesnippetsore(

S snip

)andompensated

inpartwiththeandidatefrequenyin allthedouments(

C d

)andin thesnippets(

C s

).

Thesoresforidential(type,value)pairsareaddedtogetherandgivethenalsoringforallthe

possibleandidateanswers.

5 Evaluation

Inthissetion,wepresenttheresultsobtainedinthefourtasks. T1andT2taskswereomposed

ofanidentialsetof98questions;T3taskwasomposedofadierentsetof96questionsandT4

taskofasubsetof93questions. Table1showtheoverallresultswiththe3measuresusedinthis

evaluation. Wesubmittedtworuns,oneforeahsystem,for eah ofthefour tasks. As required

bytheevaluationproedure,amaximumof5answersperquestionwasprovided.

Globally,weanseethatSystem2getsbetterresultsthanSystem1. Theimprovementofthe

(7)

T1 Sys1 32.6% 0.37 43.8%

Sys2 39.7% 0.46 57.1%

T2 Sys1 20.4% 0.23 28.5%

Sys2 21.4% 0.24 28.5%

T3 Sys1 26.0% 0.28 32.2%

Sys2 26.0% 0.31 41.6%

T4 Sys1 18.3% 0.19 22.6%

Sys2 17.2% 0.19 22.6%

Table1: GeneralResults. Sys1System1;Sys2System2;A. istheauray,MRRistheMean

ReiproalRankandReallthetotalnumberoforretanswersinthe5returnedanswers

of doument/snippet queries greatly improves the overage as ompared to handrafted rules.

System 2 did notperform betterthan System 1on the T2 task. Further analysis is needed to

understandwhy.

Thedierentmodules weanevaluatearethe analysismodule, thepassageretrievaland the

answerextration. ThepassageretrievaliseasiertoevaluateforSystem2beauseitisaomplete

separate module, whih is not the ase in the System 1. The Table 2 give the results on the

passageretrievalintwoonditions: withalimitationofthenumberofpassagesat 5andwithout

limitation. The diferene between the Reall on the snippets (how often the answeris present

in the seleted snippets) and theQA Aurayshow that the extration and thesoring of the

answerhasareasonnablemarginforimprovement. ThedierenebetweenthesnippetRealland

itsAuray(from 26to 38%for thenolimitondition) illustrates thatthe snippet soringan

beimproved.

Passagelimit=5 Passagewithoutlimit

Task A. MRR Reall A. MRR Reall

T1 44.9% 0.52 67.3% 44.9% 0.53 71.4%

T2 29.6% 0.36 46.9% 29.6% 0.37 57.0%

T3 30.2% 0.37 47.9% 30.2% 0.38 68.8%

T4 18.3% 0.22 31.2% 18.3% 0.24 51.6%

Table2: ResultsforPassageRetrievalforSystem2. Passage 5themaximumofpassagenumber

is5;Passage withoutlimitthereisnolimitforthepassagenumber;A. istheauray,MRRis

theMeanReiproalRankandReallthetotalnumberoforretanswersinthereturnedanswers

Oneofthekeyusesoftheanalysisresultsisroutingthequestionwhihisdeterminingarough

lassforthetypeoftheanswer(language, loation,...). Theresultsoftheroutingomponentare

given in Table3 withdetails byanswerategory. Two questions of T1/T2and three of T3/T4

were notrouted.

Weobservedlargediereneswiththe resultsobtainedonthedevelopmentdata, in partiu-

larlywith themethod,olor and timeategories. Theanalysis module hasbeenbuilt onorpus

observations and it seemsto betoodependanton thedevelopmentdata. That anexplain the

absene ofmajor dierenesbetween System1 and System2for the T1/T2tasks. Most of the

wrongly routed questions havebeen routed to the generi answer type lass. In System 1this

lassseletsspei entities (method, models,system, language...) overtheotherentitytypesfor

thepossibleanswers. InSystem2nosuh adaptationto thetaskhasbeendoneandallpossible

entitytypeshaveequalpriority.

(8)

T1/T2

%Corret 72% 100% 89% 75% 17% 95% 89%

#Questions 98 4 9 28 18 20 9

T3/T4

%Corret 80% 100% 93% 83% - 85% 80%

#Questions 96 2 14 12 - 13 15

TIM SHA COL MAT

T1/T2

%Corret 80% - - -

#Questions 10 - - -

T3/T4

%Corret 71% 89% 73% 50%

#Questions 14 9 11 6

Table3: Routingevaluation. All: allquestions;LAN: language;LOC: loation;MEA: measure;

MET:method/system;ORG:organization;PER:person;TIM:time;SHAP:shape;COL:olour.

6 Conlusion and future work

WepresentedtheQuestionAnsweringsystemsusedforourpartiipationtotheQAstevaluation.

Two dierent systems have been used for this partiipation. The two main hanges between

System1andSystem2arethereplaementofthelargesetofhandmaderulesbytheautomati

generationofaresearhdesriptor,andtheadditionofaneientsoringoftheandidateanswers.

Theresultsshowthat theSystem2outperformstheSystem1. Themain reasonsare:

1. Better generiity through theuse of akind of expert systemto generate the researh de-

sriptors.

2. Morepertinentanswersoringusingproximitieswhihallowsasmoothingoftheresults.

3. Preseneofvarioustuningparameterswhihenabletheadaptionofthesystemtothevarious

questionanddoumenttypes.

These systems have been evaluated on dierent data orresponding to dierent tasks. On

themanually transribedletures, the best result is39%for Auray, onmanually transribed

meetings,24%for Auray. There wasno spei eortdoneon theautomatially transribed

leturesandmeetings,sotheperformanesonlygiveanideaofwhatanbedonewithouttryingto

handlespeehreognitionerrors. Thebestresultis18.3%onmeetingand21.3%onletures. From

the analysis presentedin theprevious setion, performane anbeimprovedat everystep. For

example,theanalysisandroutingomponentanbeimprovedinordertobettertakeintoaount

sometypeofquestionswhihshouldimprovetheanswertypingandextration. Thesoringofthe

snippets andtheandidate answersan alsobeimproved. Inpartiularsometuningparameters

(liketheweightofthetransformationsgeneratedintheDDR) havenotbeenoptimizedyet.

7 Aknowledgments

This work waspartially funded by theEuropeanCommission under theFP6 IntegratedProjet

IP 506909Chiland theLIMSIAI/ASPRitelgrant.

Referenes

[1℄ E.M.Voorhees,L.P.Bukland.TheFifteenthTextREtrievalConfereneProeedings(TREC

2006),InVoorheesandBuklandeds.2006.

(9)

Saaleanu,P.Roha,R.Sutlie.OverviewoftheCLEF2006MultilingualQuestionAnswering

Trak.Working NotesfortheCLEF2006Workshop.2006.

[3℄ C.Ayahe,B.Grau,A.Vilnat.Evaluationofquestion-answeringsystems: TheFrenhEQueR-

EVALDA EvaluationCampaign.Proeedings ofLREC'06,Genoa,Italy.

[4℄ S. Harabagiuand D. Moldovan. Question-Answering.In The Oxford Handbook of Computa-

tional Linguistis.R.Mitkov(Eds). OxfordUniversityPress.2003.

[5℄ S. Harabagiu, A. Hikl. Methods for using textual entailment in Open-Domain question-

answering.ProeedingsofCOLING'06.Sydney,Australia.July2006.

[6℄ B.vanShooten,S.Rosset,O.Galibert,A.Max,R.opdenAkker,G.Illouz.Handlingspeeh

inputintheRitelQAdialoguesystem.2007.ProeedingsofInterspeeh'07.Antwerp.Belgium.

August2007.

[7℄ CHILProjet.http://hil.server.de

[8℄ L. Lamel, G. Adda, E. Bilinski, and J.-L. Gauvain. TransribingLetures and Seminars. In

InterSpeeh, Lisbon,September2005.

[9℄ AMI projet.http://www.amiprojet.org

[10℄ T. Hain, L. Burget, J. Dines, G. Garau, M. Karaat, M. Linoln, J. Vepa, and V. Wan.

TheAMIMeetingTransription System: ProgressandPerformane.RihTransription 2006

Spring (RT06s)MeetingReognitionEvaluation.3May2006,Bethesda,Maryland,USA.

[11℄ D.Déhelotte,H.Shwenk,G.Adda,J.-L.Gauvain.ImprovedMahineTranslationofSpeeh-

to-Textoutputs.2007.ProeedingsofInterspeeh'07.Antwerp.Belgium.August2007.

[12℄ D.M. Bikel, S. Miller, R. Shwartz, R. Weishedel. Nymble: a high-performane learning

name-nder.ProeedingsofANLP'97,Washington,USA,1997.

[13℄ H. Isozaki, H. Kazawa, Eient Support Vetor Classiers for Named Entity Reognition.

ProeedingsofCOLING,Taipei.2002.

[14℄ M. Surdeanu, J. Turmo, E. Comelles. Named Entity Reognition from spontaneous Open-

Domain Speeh.ProeedingsofInterSpeeh'05,Lisbon,Portugal.2005.

[15℄ F.Wolinski,F.Vihot,B.Dillet.AutomatiProessingofProperNamesinTexts.Proeedings

ofEACL'95,Dublin,Ireland. 1995.

[16℄ S.Sekine.Denition,ditionariesandtaggerofExtendedNamedEntityhierarhy.Proeed-

ingsofLREC'04,Lisbon,Portugal.2004.

Références

Documents relatifs

Thus, internal nodes of the knowledge base tree are marked with syntagmatic patterns as a result of this algorithm.. The search for the answer to the question in

The use of syntagmatic patterns to organize the structure of the knowledge base of the question- answer system allows one to effectively search for answers to questions.... A

Does she want to register for a scientific stream.. No,

Charles Stuart, King of England … Traitorously waged a war against Parliament and the people.. He renewed the War against the Parliament

I used to live with my grandparents there but I came to England three years ago to live with my mother to study.. My father died four years ago in

In this stage the EAF acts as an OAuth2 provider, providing a page to which the third-party applica- tion can redirect the user to be prompted for authorization of their identity

Each T-H pair identified for each answer option corresponding to a question is now assigned a score based on the NER module, Textual Entailment module,

As we can see, there is an huge loss between the QAst08 corpus and the test and development corpus of QAst09. One reason for these results could be the new methodology used to build