Mpro-IR in CLEF 2001


Bärbel Ripplinger

IAI

Martin-Luther-Str. 14

66111 Saarbrücken, Germany

[email protected]

Abstract

The objective of this year's CLEF participation was to evaluate an improved German component, focusing on the impact decomposition information has on performance.

1 Introduction

The Mpro-IR system is a CLIR system based on query translation. It aims for better recall rather than a balanced recall and precision figure. To improve recall, the system tries to take advantage of a sophisticated linguistic processing component whose results are used in the monolingual retrieval modules. Based on the output of a morpho-syntactic analysis, which provides the full range of morphological information, the system exploits not only inflection, which would correspond to the power of a Porter-like stemmer, but also derivation and the decomposition of compound nouns. This information is used for indexing, query expansion, search, and document ranking.
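The three kinds of morphological information named above can be pictured as separate index keys for each word form. The following is only a minimal sketch under stated assumptions: the `Analysis` record, the toy lexicon entries, and the `index_keys` helper are hypothetical illustrations and do not reproduce the actual MPRO output format or the Mpro-IR index layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical result of a morpho-syntactic analysis for one word form."""
    lemma: str          # lexical base form (what inflection handling yields)
    root: str           # derivational root
    parts: tuple = ()   # compound parts, empty for simple words


# Toy lexicon invented for illustration; a real morpheme lexicon is far larger.
TOY_LEXICON = {
    "Häuser": Analysis(lemma="haus", root="haus"),
    "Verschmutzung": Analysis(lemma="verschmutzung", root="schmutz"),
    "Umweltverschmutzung": Analysis(
        lemma="umweltverschmutzung",
        root="schmutz",
        parts=("umwelt", "verschmutzung"),
    ),
}


def index_keys(word_form):
    """Return the set of (kind, value) keys under which a word form is indexed."""
    analysis = TOY_LEXICON.get(word_form)
    if analysis is None:
        # Unknown word: fall back to the lowercased surface form only.
        return {("lemma", word_form.lower())}
    keys = {("lemma", analysis.lemma), ("root", analysis.root)}
    keys.update(("part", part) for part in analysis.parts)
    return keys


if __name__ == "__main__":
    for form in ("Häuser", "Umweltverschmutzung", "Saarland"):
        print(form, sorted(index_keys(form)))
```

With keys of this kind, a query for "Verschmutzung" and a document containing "Umweltverschmutzung" share both a root key and a compound-part key, which a purely inflectional stemmer would not provide.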

The objective of this year's CLEF participation was to evaluate an improved German component; therefore only one official monolingual German run was submitted. The investigations focused on how improved linguistic processing affects performance compared to last year's result. A new morpho-syntactic analysis for German was applied, which uses a far better tagging and lemmatisation component based on a morpheme lexicon with 85,500 entries (morphemes, stems, and word forms), compared to the 42,000 entries used for last year's experiments.

2 Experiment Settings

As found last year, the number of documents retrieved by Mpro-IR was low compared to other systems. Because this was mainly due to the restriction to one sentence as the search window, we did not apply this limitation in this year's experiment. Another reason was that we were obliged to submit a run using title and description as query input (finding all meaning-bearing words of the description in one sentence would make no sense).
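The effect of the search window can be illustrated with a small sketch. It is not the Mpro-IR implementation: the function names, the regular-expression sentence splitter, and the matching on lowercased surface tokens (instead of linguistic analyses) are simplifying assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text):
    """Lowercased word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def matches_in_sentence_window(query_terms, document):
    """True if a single sentence contains all query terms."""
    wanted = {term.lower() for term in query_terms}
    return any(wanted <= tokens(sentence) for sentence in split_sentences(document))


def matches_in_document(query_terms, document):
    """True if the document as a whole contains all query terms."""
    wanted = {term.lower() for term in query_terms}
    return wanted <= tokens(document)


if __name__ == "__main__":
    doc = "Die Regierung plant neue Gesetze. Der Umweltschutz steht im Mittelpunkt."
    query = ["Regierung", "Umweltschutz"]
    print(matches_in_sentence_window(query, doc))
    print(matches_in_document(query, doc))
```

In this toy document the two query terms occur in different sentences, so the sentence window rejects it while the document-wide window accepts it. The longer the query (title plus description), the less likely a single sentence is to contain every term.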

Even though the underlying architecture of Mpro-IR requires that each query word occur in a document for it to be relevant, no query preprocessing was done, i.e. fixed phrases such as 'find documents about', 'find reports on', etc. were not deleted. However, because these phrases hardly ever occur in a document, we took this into account by weakening the requirement above, i.e. not every queried term has to occur in a document for it to be relevant. In consequence, the calculation of the rank was changed from last year: the weight is calculated not only on the basis of the linguistic information used to retrieve a particular document but additionally considers the number of queried terms found within this document. The query was morpho-syntactically analysed, and the information extracted for meaning-bearing words (those having noun, verb, or adjective as part of speech), such as lexical base form, derivational root, and decomposition, was used to search the German documents.
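The paper does not give the ranking formula, so the following is only a sketch of one way the two factors described above could be combined. The weight values in `MATCH_WEIGHTS`, the key representation, and the `score` function are assumptions made for illustration, not the actual Mpro-IR ranking.

```python
# Assumed weights per kind of linguistic evidence; the real values are not
# given in the text. A base-form match counts most, a compound-part match least.
MATCH_WEIGHTS = {"lemma": 1.0, "root": 0.5, "part": 0.25}


def score(query_terms, doc_keys):
    """Score a document against a query.

    query_terms: list with one set of (kind, value) keys per meaning-bearing
                 query word, e.g. {("lemma", "haus"), ("root", "haus")}.
    doc_keys:    set of (kind, value) keys found in the document.

    The score is the mean weight of the best match per found term, scaled by
    the fraction of query terms found, so a document that lacks some terms
    (for example fixed phrases such as 'find documents about') still ranks.
    """
    if not query_terms:
        return 0.0
    best_per_term = []
    for term_keys in query_terms:
        hits = [MATCH_WEIGHTS[kind] for kind, value in term_keys
                if (kind, value) in doc_keys]
        if hits:
            best_per_term.append(max(hits))
    if not best_per_term:
        return 0.0
    coverage = len(best_per_term) / len(query_terms)
    return coverage * sum(best_per_term) / len(best_per_term)


if __name__ == "__main__":
    query = [
        {("lemma", "umwelt")},
        {("lemma", "verschmutzung"), ("root", "schmutz")},
        {("lemma", "finden")},  # from a fixed phrase, unlikely to occur
    ]
    document = {("lemma", "umwelt"), ("root", "schmutz")}
    print(score(query, document))
```

Under these assumptions a document matching two of three terms, one by base form and one only by derivational root, still receives a non-zero score, whereas a strict all-terms requirement would exclude it.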

Working now for Eurospider Information Technology AG, Email address: [email protected].


The overall result of our run shows a lower retrieval performance of Mpro-IR compared to the other systems. In spite of a higher number of documents retrieved, the result is even worse than last year. One reason is certainly that corrupted lexicons were used for document and query analysis (unfortunately there was no time to redo the corpus analysis). The results therefore have no significance for our expectation that an improved linguistic analysis positively affects performance. Furthermore, there is reason to suppose that the worse results are due to the decision not to preprocess the queries and instead to change the search algorithm and the ranking, thus undermining Mpro-IR's philosophy.

Investigating the results per query in more detail shows more or less the same findings as last year: most hits could be retrieved by using precise lexical base forms and derivational information. Compositional information was also valuable for detecting syntactic variants of German compounds. However, because the lexicon now has 50% more entries, the number of wrong compound analyses has increased, which is mainly due to the current state of the morpheme lexicon. Not all entries have been examined with respect to allowed and forbidden compounding, information which has to be explicitly encoded.

Acknowledgements

I'm indebted to Peter Schäuble for allowing me to use Eurospider resources to carry out this experiment.

