Comparing Weighting Models for Monolingual Information Retrieval
Gianni Amati, Claudio Carpineto, and Giovanni Romano
Fondazione Ugo Bordoni, via B. Castiglione 59, 00142 Rome, Italy
{gba,carpinet,romano}@fub.it
1 Introduction
Although the choice of the weighting model may crucially affect the performance of any information retrieval system, there has been little work on evaluating the relative merits and drawbacks of different weighting models in the CLEF environment. The main goal of our participation in CLEF 2003 was to help fill this gap.
We consider three weighting models with a different theoretical background that have proved their effectiveness on a number of tasks and collections. The three models are Okapi [8], statistical language modeling [10], and deviation from randomness [3].
We study the retrieval performance of the rankings produced by each weighting model, without and with retrieval feedback, on three monolingual test collections, i.e., French, Italian, and Spanish. The collections are indexed with standard techniques and the retrieval feedback stage is performed using the method described in [4].
In the following we first describe the three weighting models, the method used for retrieval feedback, and the experimental setting. Then we report the results and the main conclusions of our experiments.
2 The three weighting models
For ease of clarity and comparison, the document ranking produced by each weighting model is represented using the same general expression, namely as the product of a document-based term weight by a query-based term weight:

sim(q, d) = \sum_{t \in q \wedge d} w_{t,d} \, w_{t,q}
This formalism also allows a uniform application of the subsequent retrieval feedback stage to the first-pass ranking produced by each weighting model, as explained in Section 3. Before giving the expressions of w_{t,d} and w_{t,q} for each model, we report the complete list of variables that will be used:
f_t         the number of occurrences of term t in the collection
f_{t,d}     the number of occurrences of term t in document d
f_{t,q}     the number of occurrences of term t in query q
n_t         the number of documents in which term t occurs
D           the number of documents in the collection
T           the number of terms in the collection
\lambda_t   the ratio between f_t and T
W_d         the length of document d
W_q         the length of query q
avg W_d     the average length of documents in the collection
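To make these definitions concrete, here is a minimal sketch (our own illustration, not code from the paper) of how the statistics can be computed from an already indexed corpus; reading T as the total number of term occurrences, so that \lambda_t = f_t / T is the relative frequency of t in the collection, is our assumption.

```python
from collections import Counter

def collection_stats(docs):
    """docs: dict mapping a document id to its list of indexed terms."""
    f_t = Counter()   # occurrences of each term in the collection
    n_t = Counter()   # number of documents in which each term occurs
    W_d = {}          # document lengths
    for doc_id, terms in docs.items():
        W_d[doc_id] = len(terms)
        tf = Counter(terms)
        for t, f in tf.items():
            f_t[t] += f
            n_t[t] += 1
    D = len(docs)                              # number of documents
    T = sum(f_t.values())                      # total number of term occurrences
    avg_W_d = T / D if D else 0.0              # average document length
    lam = {t: f / T for t, f in f_t.items()}   # lambda_t = f_t / T
    return f_t, n_t, W_d, D, T, avg_W_d, lam
```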
2.1 Okapi
To describe Okapi, we use the expression given in [8]. This formula has been used by most participants in TREC and CLEF over the last few years.
w_{t,d} = \frac{(k_1 + 1) \, f_{t,d}}{k_1 \left( (1 - b) + b \, \frac{W_d}{avg W_d} \right) + f_{t,d}}

w_{t,q} = \frac{(k_3 + 1) \, f_{t,q}}{k_3 + f_{t,q}} \; \log_2 \frac{D - n_t + 0.5}{n_t + 0.5}
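As an illustration, the two Okapi weights translate directly into code. The sketch below is our own rendering, with the parameter values listed in Section 4.3 used as defaults.

```python
import math

def okapi_w_td(f_td, W_d, avg_W_d, k1=1.2, b=0.75):
    """Document-based Okapi weight: saturated, length-normalized term frequency."""
    K = k1 * ((1 - b) + b * W_d / avg_W_d)
    return (k1 + 1) * f_td / (K + f_td)

def okapi_w_tq(f_tq, n_t, D, k3=1000.0):
    """Query-based Okapi weight: saturated query term frequency times the idf factor."""
    idf = math.log2((D - n_t + 0.5) / (n_t + 0.5))
    return (k3 + 1) * f_tq / (k3 + f_tq) * idf
```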
2.2 Statistical language modeling (SLM)
The statistical language modeling approach has been presented in several papers (e.g., [5], [6]). Here we use the expression given in [10], with Dirichlet smoothing.
w_{t,d} = \log_2 \frac{f_{t,d} + \mu \lambda_t}{W_d + \mu} - \log_2 \frac{\mu \lambda_t}{W_d + \mu}, \qquad w_{t,q} = f_{t,q}

where \mu is the Dirichlet smoothing parameter; in addition, the document-dependent term W_q \log_2 \frac{\mu}{W_d + \mu} is added once to sim(q, d).
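A minimal sketch of this scoring, under our reading of the formula above (the per-term weight reduces algebraically to log2(1 + f_{t,d} / (\mu \lambda_t)), and the document-length term is added once per document):

```python
import math

def slm_w_td(f_td, lam_t, W_d, mu=1000.0):
    """Dirichlet-smoothed term weight; equals log2(1 + f_td / (mu * lambda_t))."""
    return (math.log2((f_td + mu * lam_t) / (W_d + mu))
            - math.log2(mu * lam_t / (W_d + mu)))

def slm_score(query_tf, doc_tf, lam, W_d, W_q, mu=1000.0):
    """SLM similarity: per-term weights times query frequencies, plus the
    document-length term W_q * log2(mu / (W_d + mu)) added once per document."""
    s = sum(slm_w_td(f_td, lam[t], W_d, mu) * query_tf[t]
            for t, f_td in doc_tf.items() if t in query_tf)
    return s + W_q * math.log2(mu / (W_d + mu))
```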
2.3 Deviation from randomness (DFR)
Deviation from randomness has been successfully used at the Web Track of TREC-10 [1] and in CLEF 2002, for the Italian monolingual task [2]. It is best described in [3].
w_{t,d} = \left( \log_2 (1 + \lambda_t) + f_{t,d} \, \log_2 \frac{1 + \lambda_t}{\lambda_t} \right) \cdot \frac{f_t + 1}{n_t \, (f_{t,d} + 1)}

w_{t,q} = f_{t,q}

where f_{t,d} is first normalized with respect to document length:

f_{t,d} = f_{t,d} \, \log_2 \left( 1 + \frac{c \cdot avg W_d}{W_d} \right)
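A sketch of the DFR weight under the same assumptions (our own rendering, with c = 2 as in Section 4.3); the normalized term frequency is computed first and then substituted into both factors:

```python
import math

def dfr_w_td(f_td, W_d, avg_W_d, f_t, n_t, lam_t, c=2.0):
    """Document-based DFR weight with document-length normalization of f_td."""
    tfn = f_td * math.log2(1 + c * avg_W_d / W_d)          # normalized term frequency
    info = math.log2(1 + lam_t) + tfn * math.log2((1 + lam_t) / lam_t)
    return info * (f_t + 1) / (n_t * (tfn + 1))

def dfr_w_tq(f_tq):
    """Query-based DFR weight."""
    return f_tq
```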
3 Retrieval feedback
As retrieval feedback has been incorporated in most recent systems participating in CLEF, it is also interesting to evaluate the performance of the different weighting models when they are enriched with retrieval feedback.
To perform the experiments, we used a technique called information-theoretic query expansion [4]. At the end of the first-pass ranking, each term in the top retrieved documents was assigned a score using the Kullback-Leibler distance between the distribution of the term in such documents and the distribution of the same term in the whole collection:
KLD_{t,d} = f_{t,d} \, \log_2 \frac{f_{t,d}}{f_t}
and the terms with the highest scores were selected for expansion.
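A sketch of the selection step follows; pooling the top retrieved documents into a single bag of words and comparing relative frequencies rather than raw counts is our assumption about how the compact formula above is applied.

```python
import math
from collections import Counter

def select_expansion_terms(top_doc_terms, f_t, T, n_terms=40):
    """Score candidate terms from the top retrieved documents by their
    Kullback-Leibler contribution against the collection distribution,
    and keep the n_terms highest-scoring ones."""
    pool = Counter()
    for terms in top_doc_terms:          # e.g. the 10 top-ranked documents
        pool.update(terms)
    W_R = sum(pool.values())             # length of the pooled pseudo-document
    scores = {t: (f / W_R) * math.log2((f / W_R) / (f_t[t] / T))
              for t, f in pool.items() if f_t.get(t, 0) > 0}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n_terms])
```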
At this point, the KLD scores were also used to reweight the terms in the expanded query. As the weights of the unexpanded query (i.e., those assigned by SLM, Okapi, or DFR) and the KLD scores had different scales, we normalized both the weights of the original query and the scores of the expansion terms by the corresponding maximum value; the normalized values were then linearly combined.
The new expression for computing the similarity between an expanded query q_{exp} and a document d becomes:

sim(q_{exp}, d) = \sum_{t \in q_{exp} \wedge d} w_{t,d} \left( \alpha \, \frac{w_{t,q}}{\max_q w_{t,q}} + \beta \, \frac{KLD_{t,d}}{\max_d KLD_{t,d}} \right)

where \alpha and \beta are the linear combination weights reported in Section 4.3.
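A sketch of the second-pass scoring; the dictionary-based interface, the default values \alpha = 1 and \beta = 0.5 (from Section 4.3), and the handling of expansion terms that also occur in the original query are our own simplifications.

```python
def expanded_similarity(doc_w_td, orig_w_tq, kld, alpha=1.0, beta=0.5):
    """Second-pass score of one document: for every expanded-query term that
    occurs in the document, combine the max-normalized original query weight
    and the max-normalized KLD score, then multiply by the document weight.
    doc_w_td: dict term -> w_{t,d} for the document being scored."""
    max_wq = max(orig_w_tq.values()) if orig_w_tq else 1.0
    max_kld = max(kld.values()) if kld else 1.0
    score = 0.0
    for t, w_td in doc_w_td.items():
        if t not in orig_w_tq and t not in kld:
            continue          # term not in the expanded query
        combined = (alpha * orig_w_tq.get(t, 0.0) / max_wq
                    + beta * kld.get(t, 0.0) / max_kld)
        score += w_td * combined
    return score
```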
4 Experimental setting
4.1 Test collections
The experiments were performed using three CLEF 2003 monolingual test collections, namely the French, Spanish, and Italian collections. For all collections, the title+description topic statement was considered.
4.2 Document and query indexing
We identified the individual words occurring in the documents, considering only the admissible sections and ignoring punctuation and case. The system then performed word stemming and word stopping.
For word stemming, we used the French, Italian, and Spanish adaptations of Porter's algorithm [7]; for word stopping, we used the stop lists provided by Savoy [9].
Thus, we performed a strict single-word indexing; furthermore, we did not use any ad hoc linguistic manipulation such as expanding or removing certain words from the query text or using lists of proper nouns.
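As an illustration, the indexing pipeline just described can be sketched as follows (our own code; the stopword list and stemming function are stand-ins for the language-specific resources cited above):

```python
import re

def index_text(text, stopwords, stem):
    """Single-word indexing: lowercase, keep alphabetic tokens only,
    drop stopwords, stem. 'stopwords' and 'stem' stand in for the
    language-specific resources used in the experiments."""
    tokens = re.findall(r"[^\W\d_]+", text.lower(), flags=re.UNICODE)
    return [stem(t) for t in tokens if t not in stopwords]
```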
4.3 Choice of experimental parameters
The final document ranking is affected by a number of parameters. To perform the experiments, we set the parameters to values that have been reported in the literature. Here is the complete list of parameter values:
Okapi: k_1 = 1.2, k_3 = 1000, b = 0.75
SLM: \mu = 1000
DFR: c = 2
Retrieval feedback: 10 documents, 40 terms, \alpha = 1, \beta = 0.5
5 Results
For each collection and for each query, we computed six runs: two runs for each of the three weighting models, one without and one with retrieval feedback (RF). Table 1, Table 2, and Table 3 show the retrieval performance of each method on the French, Italian, and Spanish collections, respectively. Performance was measured using average precision (AV-PREC), precision at 5 retrieved documents (PREC-AT-5), and precision at 10 retrieved documents (PREC-AT-10). For each collection we show in bold the best result without retrieval feedback and the best result with retrieval feedback.
Note that for the French and Italian collections the average precision was greater than the early precision values; this is because, for these collections, the number of relevant documents per query is small on average, and there are many queries with very few relevant documents.
The first main finding of our experiments is that the best absolute result for each collection and for each evaluation measure was always obtained by DFR with retrieval feedback, with notable improvements on several data points. The excellent performance of the DFR model is confirmed also when comparing the weighting models without query expansion, although in the latter case DFR did not always achieve the best results (i.e., for PREC-AT-5 and PREC-AT-10 on Italian, and for PREC-AT-5 on Spanish).
           AV-PREC   PREC-AT-5   PREC-AT-10
Okapi       0.5030    0.4385      0.3654
Okapi+RF    0.5054    0.4769      0.3942
SLM         0.4753    0.4538      0.3635
SLM+RF      0.4372    0.4192      0.3462
DFR         0.5116    0.4577      0.3654
DFR+RF      0.5238    0.4885      0.3981

Table 1. Retrieval performance on the French collection
           AV-PREC   PREC-AT-5   PREC-AT-10
Okapi       0.4762    0.4588      0.3510
Okapi+RF    0.5238    0.4824      0.3902
SLM         0.5027    0.4941      0.3824
SLM+RF      0.5095    0.4824      0.3863
DFR         0.5046    0.4824      0.3725
DFR+RF      0.5364    0.5255      0.4137

Table 2. Retrieval performance on the Italian collection
           AV-PREC   PREC-AT-5   PREC-AT-10
Okapi       0.4606    0.5684      0.5175
Okapi+RF    0.5093    0.6105      0.5491
SLM         0.4720    0.6140      0.5157
SLM+RF      0.5112    0.5825      0.5316
DFR         0.4907    0.6035      0.5386
DFR+RF      0.5510    0.6140      0.5825

Table 3. Retrieval performance on the Spanish collection
The second main finding concerns the relative performance of Okapi and SLM: neither model was clearly superior to the other. They achieved comparable results on Spanish, while Okapi was slightly better than SLM on French and slightly worse on Italian. However, when considering the first retrieved documents, the performance of SLM was usually very good and sometimes even better than that of DFR.
The results in Table 1, Table 2, and Table 3 also show that retrieval feedback improved the Okapi and DFR runs and mostly hurt the SLM runs. In particular, the use of retrieval feedback improved the retrieval performance of Okapi and DFR for all evaluation measures and across all collections, whereas it usually decreased the early precision of SLM and on one occasion (i.e., on French) it even hurt the average precision of SLM. The unsatisfactory performance of SLM+RF may be explained by considering that the experiments were performed using long queries.
We would also like to emphasize that the DFR runs shown here correspond to runs actually submitted to CLEF, although they were not our best runs. In fact, our best submitted runs used language-specific optimal parameters; we then submitted, for each language, a run with the same experimental parameters, obtained by averaging the best parameters.
We also performed a query-by-query analysis. For each query, we computed the difference between the best and the worst retrieval result, using average precision as the performance measure. Figure 1, Figure 2, and Figure 3 show the results for French, Italian, and Spanish, respectively.
Fig. 1. Performance variation on individual queries for French

Fig. 2. Performance variation on individual queries for Italian

Fig. 3. Performance variation on individual queries for Spanish

Thus, the length of each bar depicts the range of performance variation across the weighting models on an individual query, but it does not tell us which method performed best.
To get a more complete picture, we counted, for each collection, the number of queries for which each method achieved the best, median, or worst performance. The results, shown in Table 4, confirm the better retrieval effectiveness of DFR over the other two models. The superiority of DFR over Okapi and SLM was clear for Spanish, while DFR and Okapi obtained more comparable results on the other two test collections. For French and Italian, the number of best results obtained by DFR and Okapi was similar, but, on the whole, DFR was ranked ahead of Okapi on a much larger number of queries.
        French              Italian             Spanish
        SLM  Okapi  DFR     SLM  Okapi  DFR     SLM  Okapi  DFR
1st      11    20    21      10    21    20      16    16    25
2nd      11    17    24       9    16    26      10    22    25
3rd      30    15     7      32    14     5      31    19     7

Table 4. Ranked performance
6 Conclusions
The main conclusion of our experiments is that the DFR model was more effective than both Okapi and SLM, which achieved comparable retrieval performance. In particular, DFR with query expansion obtained the best absolute results for every evaluation measure and across all test collections.
The second conclusion is that retrieval feedback always improved the performance of Okapi and DFR, whereas it was often detrimental to the retrieval effectiveness of SLM, although the latter finding may have been influenced by the length of the queries used in the experiments.
These results seem to suggest that the retrieval performance of a weighting model is only moderately affected by the choice of the language, but this hypothesis should be taken with caution, because our results were obtained under specific experimental conditions.
Although there are reasons to believe that similar results might also hold across different experimental situations, in that we chose simple and untuned parameter values and made typical indexing assumptions, the issue needs more investigation. The next step of this research is to experiment with a wider range of factors, such as the length of queries, the values of each weighting model's parameters, and the combination of parameter values for retrieval feedback. It would also be useful to experiment with other languages, to see whether the hypothesis that the retrieval performance of a weighting model is independent of the language holds more generally.

References
1. G. Amati, C. Carpineto, G. Romano. FUB at TREC-10 web track: a probabilistic framework for topic relevance term weighting. Proceedings of TREC-10, 182-191, 2001.
2. G. Amati, C. Carpineto, G. Romano. Italian monolingual information retrieval with PROSIT. Working notes of CLEF 2002, 145-152, 2002.
3. G. Amati, C.J. van Rijsbergen. Probabilistic models of information retrieval based on measuring divergence from randomness. ACM Transactions on Information Systems, 20(4):357-389, 2002.
4. C. Carpineto, R. De Mori, G. Romano, B. Bigi. An information-theoretic approach to automatic query expansion. ACM Transactions on Information Systems, 19(1):1-27, 2001.
5. D. Hiemstra, W. Kraaij. Twenty-One at TREC-7: Ad-hoc and cross-language track. Proceedings of TREC-7, 227-238, 1998.
6. J. Ponte, W.B. Croft. A language modeling approach to information retrieval. Proceedings of SIGIR-98, 275-281, 1998.
7. M.F. Porter. Implementing a probabilistic information retrieval system. Inf. Tech. Res. Dev., 1(2):131-156, 1982.
8. S.E. Robertson, S. Walker, M. Beaulieu. Okapi at TREC-7: Automatic ad hoc, filtering, VLC, and interactive track. Proceedings of TREC-7, 253-264, 1998.
9. J. Savoy. Report on CLEF-2001 experiments. Working notes of CLEF 2001, Darmstadt, 2001.
10. C. Zhai, J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. Proceedings of SIGIR-01, 334-342, 2001.