DEU at ImageCLEFmed 2009: Evaluating a Re-ranking and Integrated Retrieval Model


Tolga BERBER, Adil ALPKOÇAK

Dokuz Eylul University, Department of Computer Engineering

{tberber,alpkocak}@cs.deu.edu.tr

Abstract

This paper presents the DEU team's participation in ImageCLEFmed 2009. The main goal of our participation is to evaluate two different approaches: first, a new re-ranking method which aims to model information system behaviour depending on several aspects of both documents and the query; second, a new retrieval approach which integrates textual and visual contents into one single model. Initially we extract textual features of all documents using the standard vector space model and take this as a baseline. This baseline is then combined with the re-ranking and the integrated retrieval model. The re-ranking approach is trained using ImageCLEFmed 2008 ground-truth data. However, the re-ranking approach did not produce satisfactory results. On the other hand, our integrated retrieval model ranked top among all submissions in the automatic mixed runs.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.7 Digital Libraries

General Terms

Information Retrieval, Re-Ranking, Content-Based Image Retrieval

Keywords

Re-Ranking, Information Retrieval, Machine Learning, Content-Based Image Retrieval, ImageCLEF Medical Retrieval Task

1 Introduction

This paper describes our work on the medical retrieval task of the ImageCLEF 2009 track. The main goal of our participation is to evaluate two new approaches: one of them is a new re-ranking approach, and the other one is a new integrated retrieval model which integrates both textual and visual contents into one single model.

The proposed re-ranking method considers several aspects of both document and query (e.g. degree of query/document generality, document/query length, etc.) in re-ranking. The system learns to re-rank the initially retrieved documents using all these features, and is trained with the ground-truth data of ImageCLEFmed 2008. The second approach we present is an integrated retrieval system which extends textual document vectors with visual terms. The method aims to close the well-known semantic gap problem in content-based image retrieval by mapping low-level image features to semantic concepts. Consequently, images can be defined by semantic concepts instead of low-level features.

The rest of this paper is organized as follows. In Section 2, our retrieval framework is described. Section 3 gives the results of both methods on the ImageCLEFmed 2009 dataset. Finally, Section 4 concludes the paper and gives an outlook at future work.

2 Retrieval Framework

Our retrieval framework contains two new methods based on the classical vector space model (VSM). In the VSM a document is represented as a vector of terms. Hence, a document repository, D, becomes a sparse matrix whose rows are document vectors and whose columns are term vectors:

D = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mn}
\end{pmatrix} \qquad (1)

where w_ij is the weight of term j in document i, n is the term count, and m is the document count. The literature proposes a large number of weighting schemes; we used the pivoted unique term weighting scheme [4]. In general, a term weighting scheme is defined as follows:

w_{ij} = L_{ij} \, G_{j} \, N_{i} \qquad (2)

where w_ij is the term weight, L_ij is the local weight of term j in document i, G_j is the global weight of term j in the whole document collection, and N_i is the normalization factor for document i. In the classical tf*idf weighting scheme, N_i is always 1. This assumption is not the best choice for all situations, so in our work we prefer the pivoted unique term weighting scheme:

w_{ij} = \frac{\log(dtf) + 1}{sumdtf} \times \log\!\left(\frac{N - nf}{nf}\right) \times \frac{1}{1 + 0.0115\,U} \qquad (3)

where dtf is the number of times the term appears in the document, sumdtf is the sum of log(dtf) + 1 over the terms of the document, N is the total number of documents, nf is the number of documents containing the term, and U is the number of unique terms in the document [1].
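As a minimal sketch, Eq. 3 can be computed as below; the function and argument names mirror the symbols in the text and are not from the paper:

```python
import math

def pivoted_unique_weight(dtf, sumdtf, N, nf, U, slope=0.0115):
    """Term weight of Eq. 3: local weight x probabilistic idf x pivoted
    unique normalization."""
    local = (math.log(dtf) + 1.0) / sumdtf   # length-scaled local weight
    glob = math.log((N - nf) / nf)           # probabilistic idf
    norm = 1.0 / (1.0 + slope * U)           # pivoted unique normalization
    return local * glob * norm
```

Here sumdtf would be precomputed per document as the sum of log(dtf) + 1 over its terms; the weight decreases as the number of unique terms U grows, which is the pivoting effect.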

2.1 Re-Ranking Approach

After generating the initial results, the re-ranking phase re-orders them to obtain high precision at top ranks. The re-ranking phase includes training, so it first requires extracting features from both documents and the query [2]. Table 1 lists the features used, in two groups.

We evaluated several machine learning algorithms for the training phase. Table 2 shows the results of 10-fold cross-validation for each method. Because the C4.5 decision tree algorithm showed the best performance among the evaluated methods, we chose C4.5 for the training phase [3]. Finally, the new rank, R_new, of each document in the initial results is recalculated using the following formula:

R_{new} = \begin{cases}
\delta + \alpha R_{initial} & \text{if the document is relevant} \\
R_{initial} & \text{if the document is not relevant}
\end{cases} \qquad (4)

where δ and α are shift and bias factors; R_initial and R_new are the initial and newly calculated ranks of the documents, respectively.
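Eq. 4 can be sketched as follows, assuming lower rank values mean better positions; the classifier's relevance predictions drive the shift, and the delta/alpha values shown are illustrative, not the paper's trained settings:

```python
def rerank(initial_ranks, predicted_relevant, delta=-100.0, alpha=0.5):
    """Apply Eq. 4: shift and scale the ranks of documents the classifier
    marks as relevant, then sort ascending so boosted documents move
    toward the top of the result list."""
    new_ranks = {}
    for doc, r in initial_ranks.items():
        new_ranks[doc] = delta + alpha * r if doc in predicted_relevant else float(r)
    return sorted(new_ranks, key=new_ranks.get)
```

With a negative delta and alpha < 1, documents predicted relevant jump ahead of non-relevant ones while preserving their relative order.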

Textual Features

Feature Name                Scope      Feature Definition
Number of Matched Terms     Document   Σ_{t ∈ d ∩ q} 1
Number of Bi-Word Matches   Document   Σ_{t_i, t_j ∈ d ∩ q} 1,  i ≠ j, j = i − 1
Term Count                  Document   Σ_{t ∈ d} 1
Term Count                  Query      Σ_{t ∈ q} 1
Generality                  Document   Σ_{t ∈ d} idf(t)
Generality                  Query      Σ_{t ∈ q} idf(t)
Depth of Indexing           Document   Σ_{t ∈ d′} 1,  d′ = unique terms of d
Relevance Score             Document   d⃗ · q⃗
Initial Rank                Document   -

Visual Features

Feature Name                Scope      Feature Definition
Grayscaleness               Document   P(G)
Avg. Grayscaleness          Query      (1/N_i) Σ_{i=1}^{N_i} P(G)
Color Amount                Document   P(C)
Avg. Color Amount           Query      (1/N_i) Σ_{i=1}^{N_i} P(C)

Table 1: Features used to re-rank documents, where d is the term set of the document, q is the term set of the query, P(G) is the probability of being a grayscale image, and P(C) is the probability of being a color image.

Method          Class          TP(%)  FP(%)  Precision  Recall  F      ROC
C4.5            Not Relevant   0.969  0.575  0.916      0.969   0.942  0.836
                Relevant       0.425  0.031  0.678      0.425   0.523  0.836
                Total          0.896  0.502  0.885      0.896   0.886  0.836
Neural Network  Not Relevant   0.982  0.807  0.888      0.982   0.933  0.794
                Relevant       0.193  0.018  0.624      0.193   0.295  0.794
                Total          0.877  0.702  0.853      0.877   0.847  0.794
Naive Bayes     Not Relevant   0.960  0.810  0.885      0.960   0.921  0.674
                Relevant       0.190  0.040  0.421      0.190   0.262  0.674
                Total          0.857  0.708  0.823      0.857   0.833  0.674

Table 2: Results of 10-fold cross-validation for the machine learning algorithms.

2.2 Integrated Retrieval Model

Our second approach focuses on closing the semantic gap problem of content-based image retrieval. It aims to integrate both textual and visual contents in the same space. The system modifies the D matrix (Eq. 1) by adding visual terms representing the visual contents. Formally, the document-term matrix becomes:

D = \begin{pmatrix}
w_{1,1} & \cdots & w_{1,n} & i_{1,n+1} & \cdots & i_{1,n+k} \\
w_{2,1} & \cdots & w_{2,n} & i_{2,n+1} & \cdots & i_{2,n+k} \\
\vdots  & \ddots & \vdots  & \vdots    & \ddots & \vdots    \\
w_{m,1} & \cdots & w_{m,n} & i_{m,n+1} & \cdots & i_{m,n+k}
\end{pmatrix} \qquad (5)

where i_ij is the weight of image term j in document i, and k is the number of visual terms. Visual and textual features are normalized independently. In sum, the Integrated Retrieval Model (IRM) adds visual features to the traditional text-based VSM. Initially, we chose to use two simple visual terms which aim to model the color information of the whole image. The extraction of the grayscaleness term is given in Algorithm 1; the second visual term determines what amount of the image is coloured.
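The row construction of Eq. 5 can be sketched as follows; the use of L2 normalization is an assumption, since the paper only states that visual and textual features are normalized independently:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors pass through)."""
    n = math.sqrt(sum(v * v for v in vec))
    return [v / n for v in vec] if n > 0 else vec

def irm_vector(text_weights, visual_terms):
    """Build one IRM document vector (a row of Eq. 5): the textual and
    visual parts are normalized independently, then concatenated."""
    return l2_normalize(text_weights) + l2_normalize(visual_terms)
```

Because both parts are normalized separately, neither modality can dominate the other purely by having larger raw values.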

Algorithm 1: Grayscaleness Extraction Algorithm

Input:  Image pixels
Output: Probability of being grayscale

begin
    count ← 0
    channelcount ← channel count of Image
    if channelcount = 1 then
        return 1.0
    end
    if channelcount = 3 then
        for i = 1 to image height do
            for j = 1 to image width do
                if (Image(i, j, 0) = Image(i, j, 1)) ∧ (Image(i, j, 1) = Image(i, j, 2)) then
                    count ← count + 1
                end
            end
        end
    end
    return count / totalpixelcount
end
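A Python transcription of Algorithm 1 might look like this; the pixel representation (a height x width grid where a pixel is either a single intensity or an (r, g, b) tuple) is an assumption for illustration:

```python
def grayscaleness(image):
    """Algorithm 1 in Python: return the probability that an image is
    grayscale, i.e. the fraction of pixels whose three channels agree."""
    first = image[0][0]
    if not isinstance(first, (tuple, list)):   # single-channel image
        return 1.0
    if len(first) == 3:                        # RGB: count r == g == b pixels
        total = sum(len(row) for row in image)
        count = sum(1 for row in image for (r, g, b) in row if r == g == b)
        return count / total
    return 0.0
```

A fully gray RGB image scores 1.0, a fully coloured one scores near 0.0, matching the P(G) feature of Table 1.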

3 Experimentation

In this section, we describe the experiments performed for ImageCLEFmed 2009. Our experiments can be classified into two groups: the first one is about re-ranking and the second one is about the IRM. Before going further, we performed preprocessing on all 74902 documents, each a combination of title and captions. First, all documents were converted to lower case to achieve uniformity. Numbers in the documents were not removed. However, some punctuation characters like the dash (-) and the apostrophe (') were removed, while others like the comma (,) and the slash (/) were replaced with a space. For example, the dash (-) character is removed because of the importance of terms like x-ray, T3-MR etc. We chose words surrounded by spaces as index terms; finally we had 33613 index terms. These index terms are normalized as described in Eq. 3. We evaluated the performance of this baseline method, and Figure 2 shows the results of this run.
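The preprocessing steps above can be sketched as follows; the exact punctuation set is an assumption beyond the dash, apostrophe, comma and slash named in the text:

```python
import re

def preprocess(text):
    """Tokenization sketch following the description above: lower-case,
    keep numbers, delete dash and apostrophe (so 'x-ray' stays one term,
    'xray'), and replace other punctuation with a space."""
    text = text.lower()
    text = text.replace("-", "").replace("'", "")
    text = re.sub(r"[,/;:()\[\]{}!?\.]", " ", text)  # illustrative punctuation set
    return text.split()
```

Deleting the dash rather than replacing it with a space keeps domain terms such as x-ray and T3-MR as single index terms.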

The experiments with the re-ranking method include a training phase in which we use the ImageCLEFmed 2008 data. The training data contains the titles and figure captions of approximately 67000 images from medical articles. The number of index terms is 30343, and the same term generation technique as in the baseline method is used. The ImageCLEFmed 2008 dataset also contains 30 queries and their relevance data. These 30 queries are categorized into 3 groups: visual queries, whose results should improve by using visual content; mixed queries, which are prepared to test the performance of mixed retrieval systems; and textual queries, which target textual retrieval systems. The results of our re-ranking approach on both the baseline and the IRM on the ImageCLEFmed 2008 data are given in Table 3 and Table 4, respectively. According to the ImageCLEFmed 2008 results, our re-ranking approach boosts the performance of both methods. However, the ImageCLEFmed 2009 results did not show the same impact: according to Table 5, the proposed re-ranking technique reduces system performance by about 6%. The reason for this loss lies in the difference between the training and evaluation datasets: since we used the ground-truth data of ImageCLEFmed 2008, the system learns relevance information specific to the 2008 dataset.

Query Type  MAP    P@5    P@10   P@20   P@100  P@500  P@1000  bpref
Baseline
  Visual    0.197  0.320  0.290  0.235  0.142  0.089  0.058   0.278
  Mixed     0.113  0.280  0.240  0.220  0.188  0.082  0.051   0.200
  Textual   0.291  0.340  0.300  0.290  0.208  0.092  0.055   0.385
  Avg.      0.200  0.313  0.277  0.248  0.179  0.088  0.055   0.288
Re-Rank
  Visual    0.248  0.420  0.420  0.370  0.211  0.107  0.061   0.368
  Mixed     0.139  0.440  0.410  0.360  0.219  0.089  0.055   0.263
  Textual   0.324  0.560  0.460  0.430  0.228  0.093  0.057   0.425
  Avg.      0.237  0.473  0.430  0.387  0.219  0.096  0.058   0.352

Table 3: Performance of our re-ranking approach on the baseline approach using ImageCLEFmed 2008 data.

The integrated retrieval model experiments require image features to be extracted first. As mentioned before, we used the simple grayscaleness feature. Table 4 presents the results of our integrated model on the ImageCLEFmed 2008 dataset. Whereas our model shows performance similar to the baseline method on visual queries, it shows better performance on the other two query types. The results of our integrated model can be improved by applying the re-ranking algorithm, and the combination of the two methods shows the best performance on all measures.

Query Type  MAP    P@5    P@10   P@20   P@100  P@500  P@1000  bpref
I-VSM
  Visual    0.190  0.280  0.290  0.240  0.145  0.076  0.051   0.268
  Mixed     0.130  0.360  0.320  0.290  0.200  0.088  0.054   0.216
  Textual   0.312  0.380  0.320  0.365  0.240  0.096  0.059   0.423
  Avg.      0.211  0.340  0.310  0.298  0.195  0.087  0.054   0.303
Re-Rank
  Visual    0.259  0.400  0.390  0.365  0.221  0.117  0.068   0.397
  Mixed     0.160  0.480  0.420  0.390  0.248  0.095  0.059   0.351
  Textual   0.384  0.620  0.550  0.540  0.265  0.098  0.060   0.528
  Avg.      0.268  0.500  0.453  0.432  0.245  0.103  0.062   0.425

Table 4: Performance of our mixed retrieval system, and of re-ranking on our mixed retrieval system, using ImageCLEFmed 2008 data.

In terms of precision, our methods outperform the baseline. Figure 1 gives the performance of all evaluated methods on a recall-precision scale for the ImageCLEFmed 2008 dataset. According to the figure, our re-ranking method increases recall levels when applied to both approaches, baseline and integrated retrieval. Since the results of the integrated retrieval model are better than those of the baseline method, re-ranking on top of mixed retrieval shows the best recall levels in all cases.

We participated in ImageCLEFmed 2009 with 5 official runs, and we evaluated our re-ranking approach on the IRM run unofficially. Table 5 gives the performance of all our methods on the ImageCLEFmed 2009 dataset. Our integrated VSM approach shows the best performance on all measures among the others; only the re-ranked results of the IRM run show the same performance at the P@5 scale. Since only a simple feature is used as a visual term, the performance of the approach can be expected to improve with new features. As mentioned before, our re-ranking approach reduces system performance according to the ImageCLEFmed 2009 results. Figure 2 gives the precision-recall graph of all our methods; according to the figure, our IRM outperforms all of our runs in terms of recall at all precision levels.

Our integrated VSM approach has the best scores in the automatic mixed retrieval area of the ImageCLEFmed 2009 results. Table 6 gives the official results of the automatic mixed retrieval area; our integrated VSM technique has the highest MAP score of the mixed automatic runs.

Run Identifier             NumRel  RelRet  MAP    P@5    P@10   P@30   P@100
deu_traditionalVSM         2362    1620    0.310  0.608  0.528  0.451  0.296
deu_traditionalVSM_rerank  2362    1615    0.286  0.592  0.508  0.457  0.294
deu_baseline               2362    1742    0.339  0.584  0.520  0.448  0.303
deu_baseline_rerank        2362    1570    0.282  0.592  0.516  0.417  0.271
deu_IRM                    2362    1754    0.368  0.632  0.544  0.483  0.324
deu_IRM_rerank             2362    1629    0.307  0.632  0.528  0.448  0.272

Table 5: Results of our methods on the ImageCLEFmed 2009 dataset.

Run Identifier                                RelRet  MAP     P@5    P@10   P@100
DEU IRM                                       1754    0.3682  0.632  0.544  0.3244
York University_79_8_1244834388965            1724    0.3586  0.584  0.584  0.3312
York University_79_8_1244834655420            1722    0.3544  0.624  0.572  0.3244
York University_79_8_1244834554642            1763    0.3516  0.600  0.592  0.3356
York University_79_8_1244834740798            1719    0.3375  0.592  0.568  0.3080
York University_79_8_1244834823312            1757    0.3272  0.592  0.568  0.3080
medGIFT_77_8_1244842980151                    1176    0.2900  0.632  0.604  0.2924
ITI_26_8_1244811028909                        1553    0.2732  0.488  0.520  0.2648
University of North Texas_55_8_1244879759190  1659    0.2447  0.456  0.404  0.2580
medGIFT_77_8_1244752959441                    848     0.2097  0.704  0.592  0.2128

Table 6: Performance of the top 10 official runs for mixed automatic retrieval.


4 Discussion and Future Work

In this paper, we summarize our participation in the ImageCLEFmed task. In this study, we propose and evaluate two new methods. The first method is a re-ranking method which considers several aspects of both document and query, such as the degree of query/document generality, document/query length, etc. The system learns to re-rank the initially retrieved documents using all such features; the ground-truth data of ImageCLEF 2008 was used to train the re-ranking system. The second method we present is an integrated retrieval system which extends textual document vectors with visual terms.

The results of ImageCLEFmed 2009 show that the proposed re-ranking approach boosts the performance of an information retrieval system only if the modeled dataset is not subject to change. The proposed technique could, however, be modified to adapt itself to dataset changes; as a result, we would obtain an adaptive re-ranking system. On the other hand, our second proposal, which integrates both textual and visual content, ranked first among all submissions for mixed automatic runs.

In this study we proposed an integrated model that ultimately aims to close the semantic gap in visual information retrieval. We used only a single visual term, yet the results are satisfactory. In sum, it is promising that the use of a visual term gives better results than most of the text-only models.

Acknowledgements

This work is supported by the Turkish National Science Foundation (TÜBİTAK) under project number 107E217.

References

[1] Erica Chisholm and Tamara G. Kolda. New term weighting formulas for the vector space method in information retrieval. Technical report, 1999.

[2] … In Proceedings of the Learning to Rank for Information Retrieval Workshop, pages 52–58. ACM, 2008.

[3] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.

[4] Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 21–29, New York, NY, USA, 1996. ACM.
