Bärbel Ripplinger
IAI
Martin-Luther-Str. 14
66111 Saarbrücken, Germany
Abstract
The objective of this year's CLEF participation was to evaluate an improved German
component, focusing on the impact that decomposition information has on performance.
1 Introduction
The Mpro-IR system is a CLIR system based on query translation. It aims for better recall
rather than a balanced recall and precision figure. To improve recall, the system tries to
take advantage of a sophisticated linguistic processing component whose results are used in
the monolingual retrieval modules. It builds on the output of a morpho-syntactic analysis
that provides the full range of morphological information: not only inflection, which would
correspond to the power of a Porter-like stemmer, but also derivation and the decomposition
of compound nouns are exploited. This information is used for indexing, query expansion,
search, and document ranking.
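As a rough illustration of how the three kinds of morphological information could feed indexing and query expansion, the following minimal sketch maps a word to a set of index keys. The `Analysis` record, the `TOY_ANALYSES` table, and the function `index_keys` are hypothetical names introduced only for this example; the analyses are written by hand and are not the output of the actual Mpro analyser.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Toy stand-in for the morpho-syntactic analysis of one word (hypothetical)."""
    lemma: str          # lexical base form, i.e. inflection removed
    root: str           # derivational root
    parts: tuple = ()   # constituents of a compound noun, if any


# Hand-written toy analyses; a real analyser would compute these from its lexicon.
TOY_ANALYSES = {
    "Umweltschutzgesetze": Analysis(
        lemma="umweltschutzgesetz",
        root="umweltschutzgesetz",
        parts=("umwelt", "schutz", "gesetz"),
    ),
    "Schützer": Analysis(lemma="schützer", root="schützen"),
}


def index_keys(word):
    """Return the keys under which a word would be indexed or searched."""
    analysis = TOY_ANALYSES.get(word)
    if analysis is None:
        # Unknown word: fall back to the lower-cased surface form.
        return {word.lower()}
    return {analysis.lemma, analysis.root, *analysis.parts}


print(sorted(index_keys("Umweltschutzgesetze")))
# ['gesetz', 'schutz', 'umwelt', 'umweltschutzgesetz']
```

A Porter-like stemmer would at best yield the first kind of key (the base form); the derivational root and the compound parts are what such an approach adds on top.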
The objective of this year's CLEF participation was to evaluate an improved German
component; therefore only one official monolingual German run was submitted. The
investigations focused on how improved linguistic processing affects performance compared
to last year's result. A new morpho-syntactic analysis for German was applied, which uses a
far better tagging and lemmatisation component based on a morpheme lexicon with 85,500
entries (morphemes, stems, and word forms), compared to the 42,000 entries used for last
year's experiments.
2 Experiment Settings
As found last year, the number of documents retrieved by Mpro-IR had been low compared to
other systems. Because this was mainly due to the restriction to one sentence as the search
window, we did not apply this limitation in this year's experiment. Another reason was that
we were obliged to submit a run using title and description as query input (finding all
meaning-bearing words of the description in one sentence would make no sense).
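The effect of the search window can be sketched as follows, assuming (for illustration only) that a document matches when all query terms occur within one window. The function names and the naive sentence splitter are hypothetical and much simpler than the real system, which matches on linguistic analyses rather than on surface tokens.

```python
import re


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def matches(text, terms, window="document"):
    """True if all terms occur together inside one search window."""
    wanted = {t.lower() for t in terms}
    units = sentences(text) if window == "sentence" else [text]
    for unit in units:
        tokens = set(re.findall(r"\w+", unit.lower()))
        if wanted <= tokens:
            return True
    return False


doc = "Die Regierung plant ein Gesetz. Der Umweltschutz steht im Mittelpunkt."
query = ["gesetz", "umweltschutz"]
print(matches(doc, query, window="sentence"))  # False
print(matches(doc, query, window="document"))  # True
```

With a sentence window the two terms never co-occur, so the document is missed; with the whole document as window it is retrieved, which is the kind of recall gain the change was meant to bring.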
Even though the underlying architecture of Mpro-IR requires that each query word occur in a
document for it to be relevant, no query preprocessing was done, i.e. fixed phrases such as
'find documents about', 'find reports on', etc. were not deleted. However, because these
phrases hardly ever occur in a document, we took this into account by weakening the
requirement above, i.e. not every queried term has to occur in a document for it to be
relevant. In consequence, the calculation of the rank was changed from last year: the
weight is calculated not only on the basis of the linguistic information used to retrieve a
particular document but also considers the number of queried terms found within this
document. The query was morpho-syntactically analysed, and the information extracted for
meaning-bearing words (those with the part of speech noun, verb, or adjective), such as
lexical base form, derivational root, and decomposition, was used to search the German
documents.
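The modified ranking can be pictured with a minimal sketch that combines the two ingredients named above: the kind of linguistic information that produced each match, and the number of queried terms found in the document. The weight values, the multiplicative combination, and the names `MATCH_WEIGHTS` and `score` are assumptions made for this example; the actual formula used by Mpro-IR is not given here.

```python
# Hypothetical weights per kind of linguistic match; not the system's real values.
MATCH_WEIGHTS = {"base_form": 1.0, "derivation": 0.7, "decomposition": 0.5}


def score(found, n_query_terms):
    """Score one document.

    found: dict mapping each query term found in the document to the kind of
           linguistic information that produced the match.
    n_query_terms: number of meaning-bearing terms in the query.
    """
    if n_query_terms <= 0 or not found:
        return 0.0
    # Average strength of the linguistic evidence behind the matches.
    linguistic = sum(MATCH_WEIGHTS[kind] for kind in found.values()) / len(found)
    # Fraction of queried terms found; documents no longer need all of them.
    coverage = len(found) / n_query_terms
    return linguistic * coverage


doc_a = {"gesetz": "base_form", "schutz": "decomposition"}
doc_b = {"gesetz": "base_form", "umwelt": "base_form", "bericht": "base_form"}
print(score(doc_a, 4))  # 0.375
print(score(doc_b, 4))  # 0.75
```

Under these assumptions a document that misses fixed-phrase words such as 'find' or 'reports' is not excluded, but it ranks below a document that covers more of the query with equally strong matches.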
Working now for Eurospider Information Technology AG, Email address: [email protected].
The overall result of our run shows a lower retrieval performance of Mpro-IR compared to
the other systems. In spite of a higher number of documents retrieved, the result is even
worse than last year. One reason is certainly that corrupted lexicons were used for document
and query analysis (unfortunately there was no time to redo the corpus analysis). The
results therefore have no significance for our aim, the expectation that an improved
linguistic analysis positively affects performance. Furthermore, there is reason to suppose
that the worse results are due to the decision not to preprocess the queries and instead to
change the search algorithm and the ranking, thereby undermining Mpro-IR's philosophy.
A more detailed investigation of the results per query shows more or less the same findings
as last year: most hits could be retrieved by using precise lexical base forms and
derivational information. Compositional information was also valuable for detecting
syntactic variants of German compounds. However, because the lexicon now has 50% more
entries, the number of wrong compound analyses has increased, which is mainly due to the
current state of the morpheme lexicon. Not all entries have been examined with respect to
allowed and forbidden compounding, information which has to be explicitly encoded.
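Why unchecked lexicon entries lead to wrong compound analyses can be shown with a small sketch. It assumes a toy lexicon in which each entry carries an explicit flag saying whether it may take part in compounding; the entries, the flags, and the function `decompose` are hypothetical, and German linking elements are ignored for brevity.

```python
# Toy morpheme lexicon: entry -> may it be used as a compound part? (hypothetical flags)
LEXICON = {
    "umwelt": True,
    "schutz": True,
    "gesetz": True,
    "welt": True,
    "um": False,  # preposition: forbidden as a compound constituent
}


def decompose(word, lexicon=LEXICON, check_flags=True):
    """Return every segmentation of `word` into lexicon entries.

    With check_flags=False the compounding flags are ignored, which mimics
    entries that have not yet been examined for allowed/forbidden compounding.
    """
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty remainder: no parts
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon and (lexicon[head] or not check_flags):
            for rest in decompose(word[i:], lexicon, check_flags):
                results.append([head] + rest)
    return results


print(decompose("Umweltschutz"))
# [['umwelt', 'schutz']]
print(decompose("Umweltschutz", check_flags=False))
# [['um', 'welt', 'schutz'], ['umwelt', 'schutz']]
```

Every short entry added to the lexicon without such a flag can create additional spurious segmentations of this kind, which is consistent with the increase in wrong analyses observed after the lexicon grew.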
Acknowledgements
I am indebted to Peter Schäuble for allowing me to use Eurospider resources to carry out
this experiment.
References
[1] Maas, D. Multilinguale Textverarbeitung mit MPRO. In G. Lobin et al. (eds):
Europäische Kommunikationskybernetik heute und morgen, KoPäd, München, 1999.
http://www.iai.uni-sb.de/global/memos.html
[2] Ripplinger, B. Mpro-IR - A Cross-language Information Retrieval Component Enhanced by
Linguistic Knowledge. In Proceedings of RIAO 2000, Paris, 2000.