Bilingual and Multilingual CLEF Tasks
FredricC.Gey 1
,HailingJiang 2
andNataliaPerelman 2
1
UCDataArchive&TechnicalAssistance,
2
SchoolofInformationManagementandSystems
UniversityofCalifornia
Berkeley,CA94720USA
Abstract. For ouractivitieswithin theCLEF 2001evaluation,
Berkeley group one participated in the bilingual,multilingual
and GIRT tasks focussingon the use ofRussian queries. Per-
formance on the Russian queries !English documents bilin-
gualtaskwasexcellent,comparabletoperformanceusingGer-
man queries. For the multilingual task we utilized English as
a pivot language between Russian and German and the En-
glish/French/German/Italian/Spanishdocumentcollections.Per-
formanceherewas merelyaverage. The GIRTtaskperformed
Russian !GermanCross-LanguageIRbycomparingweb-available
machine translation withlookup techniquesonthe GIRT the-
saurus.
1 Introduction
Successfulcross-languageinformationretrieval(CLIR)combineslinguistictech-
niques(phrasediscovery,machinetranslation,bilingualdictionarylookup)with
robustmonolingual information retrieval.For monolingualretrievalthe Berke-
ley group has used the technique of logistic regression from the beginning of
theTRECseriesofconferences.InTREC-2[1]wederiveda statisticalformula
for predicting probability of relevance based upon statistical clues contained
within documents, queries and collections as a whole. This formula was used
fordocumentretrievalinChinese[3]andSpanishinTREC-4throughTREC-6.
We utilized theidentical formula for English and German queries against the
English/French/German/Italiandocument collectionsin the CLEF 2000 eval-
uation[10]. During the past two years, the formula has proven well-suited for
Japanese and Japanese-English cross-language information retrieval as well as
English-Chinese CLIR[6], even when onlytrained onEnglish documentcollec-
tions.ParticipationintheNTCIR WorkshopsinTokoyo
(http://research.nii.ac.jp/~ntcadm/workshop/work-en.html)
led to dierent techniques for cross-language retrieval, ones which utilisedthe
powerofhumanindexingofdocumentstoimproveretrieval.Alignmentsofpar-
alleltextswereusedtocreatelarge-scalebilinguallexiconsbetweenEnglishand
JapaneseandbetweenEnglishandChinese.Suchlexiconswerewell-suitedtothe
The document ranking formula used by Berkeley inall of our CLEF retrieval
runswastheTREC-2formula[1].TheadhocretrievalresultsontheTRECtest
collectionshaveshownthattheformulaisrobustforlongqueriesandmanually
reformulatedqueries.Applyingthesameformula(trainedonEnglishTRECcol-
lections)tootherlanguageshasperformedwell,asontheTREC-4Spanishcol-
lections,theTREC-5ChinesecollectionandtheTREC-6andTREC-7European
languages(French,German,Italian)[4,5].Thusthealgorithmhasdemonstrated
its robustness independentof language as long as appropriateword boundary
detection(segmentation)canbeachieved.Thelogoddsofrelevanceofdocument
D toqueryQisgivenby
logO(R jD;Q)=log
P(R jD;Q)
P(RjD;Q)
(1)
= 3:51+ 1
p
N+1
+:0929N (2)
=37:4 N
X
i=1 qtf
i
ql+35
+0:330 N
X
i=1 log
dtf
i
dl+80
0:1937 N
X
i=1 log
ctf
i
cl
(3)
whereP(R jD;Q)istheprobabilityof relevanceofdocumentDwithrespectto
queryQ,P(R jD;Q)istheprobabilityofirrelevanceofdocumentDwithrespect
toqueryQ.Detailsaboutthederivationoftheseformulaemaybefoundinour
TRECpaper[1].Itistobeemphasizedthattraininghastakenplaceexclusively
on English documents but the matching has proven robust over seven other
languagesinmonolingualretrieval,includingJapaneseandChinesewhereword
boundariesform anadditionalstepinthediscovery process.
3 Submissions for the CLEF main tasks
CLEFhasthreemaintasks:monolingual(non-English)retrieval,bilingual(where
non-EnglishqueriesarerunagainsttheCLEFsub-collectionofEnglishlanguage
documents),and multilingual,wherequeries inany languageare runagainsta
multilingualcollectionofdocumentscomprisedoftheunionofsubcollectionsin
English,French,German,ItalianandSpanish.Wechosethisyeartoparticipate
inthe bilingualand multilingualmain tasks. Inaddition, where our focus last
yearwasonEnglishandGermansourcequeries,thisyearwewishedtoexplore
theinterestingquestionofwhetheraless-usedquerylanguage(inthiscaseRus-
sian) could achieveperformance comparable to the more mainstream Western
Europeanlanguages.
ForCLEFmaintaskswesubmitted7runs,3forthebilingual(German/Russian-
English) task and 4 for the Multilingual task. Table 1 summarizes these runs
RunName LanguageRuntype Priority
BKBIGEM1 German Manual 1
BKBIREM1 Russian Manual 2
BKBIREA1 Russian Automatic 3
FortheMultilingualtaskwesubmitted:
BKMUGAM1 German Manual 1
BKMUEAA1 English Automatic 2
BKMUEAA2 English Automatic 3
BKMUREA1 Russian Automatic 4
Table1.SummaryofsevenoÆcialCLEFruns.
3.1 Bilingual Retrieval ofthe CLEF collections
Bilingualretrievalisperformedbyrunningqueriesinanotherlanguageagainst
theEnglishcollectionofCLEF.WechosetofocusonRussianbuttodoGerman
forabaselinecomparison.TherunBKBIGEM1wasobtainedbytranslatingthe
GermanqueriestoEnglishusingtheL&HPowerTranslatorandthenmanually
adjusting the resultingqueries by searching for the untranslated terms inour
ownspecialassociationdictionarycreatedfromalibrarycatalog.Anexampleof
words notfoundcomesfrom Topic88about'mad cowdisease'. IntheGerman
version,thewordsSpongiformerandEnzephalopathiewerenottranslatedbythe
commercialsystem, butourassociation dictionaryobtainedthewords 'hepatic
encephalopathy' associated with the inquery Enzephalopathie. Further details
aboutourmethodologycanbefoundin[8].TherunBKBIREA1wasobtainedby
using thePROMPT web-based translator (http://www.translate.ru/).As with
theGermantranslation,certainwordswerenottranslated.Ourmethodologyto
dealwiththis was twofold{ rstwetransliterated theRussianqueries totheir
romanized alphabeticequivalent,andthenweadded untranslatedtermstothe
English query intheir transliterated form. For example Topic 50 on 'uprising
of Indians in Chiapas', the Russian word
qiapas
was not translated, It can, however,betransliteratedas'chiapas'.3.2 Bilingual Performance
Ourbilingualperformancecanbe foundinTable 2.Thenalline ofthetable,
labeled"CLEFPrec"iscomputedasanaverageofeachCLEFmedianprecision
amongallsubmittedruns.Theaverageisperformedoverthe47queriesforwhich
the English collection had relevant documents. While an average of medians
cannot be considered a statistic from which rigorous inference can be made,
we have foundit usefulto average themedians of allqueries as sent by CLEF
organizers. Comparing our overall precision to this average of medians yields
some fuzzy gauge of whether our performance is better, poorer, or about the
same asthemedianperformance.
Usingthismeasure wecan ndthat allourbilingualrunsperformedsigni-
Retrieved 47000 47000 47000
Relevant 856 856 856
Rel.Ret 812 737 733
Precision
at0.00 0.7797 0.6545 0.6420
at0.10 0.7390 0.6451 0.6303
at0.20 0.6912 0.5877 0.5691
at0.30 0.6306 0.5354 0.5187
at0.40 0.5944 0.4987 0.4806
at0.50 0.5529 0.4397 0.4167
at0.60 0.4693 0.3695 0.3605
at0.70 0.3952 0.3331 0.3218
at0.80 0.3494 0.2881 0.2747
at0.90 0.2869 0.2398 0.2339
at1.00 0.2375 0.1762 0.1743
Brk.Prec. 0.5088 0.4204 0.4077
CLEFPrec. 0.2423 0.2423 0.2423
Table 2.ResultsofthreeoÆcialBerkeleyCLEFbilingualruns.
3.3 Multilingual Retrieval ofthe CLEF collections
Our non-English multilingualretrievalruns were based upon ourbilingualex-
periments, extended to French/Italian/Spanish using English as a pivot lan-
guage and (again) the L&H Power Tranlator as the MT system to tranlate
queries from one language to another. Run BKMURAA1 takes the English
translatedqueriesofthebilingualrunBKBIREA1andagaintranslatesthemto
French/German/Italian/Spanish.RunBKMUGAM1takestheGerman queries
of bilingual run BKBIGEM1 as well as the translation of their Englishequiv-
alents into French/Italian/Spanish. For comparison we did direct translation
fromtheEnglishqueriesinrunsBKMUEAA1andBKMUEAA2.Thedierence
between these two runs is that BKMUEAA1 used Titleand Description elds
only.
The results show that with Russian queries we are about one third lower
in average precision than with either Englishor German queries. We are cur-
rently studying why this is so. Inaddition ouroverall performance seemsonly
slightlyabovetheCLEF-2001median performance.Thisseemedpuzzlingwhen
comparedto ourexcellentbilingualperformanceandouraboveaverageperfor-
mance at CLEF-2000. For comparison wealso inserted the average of median
precisons for last year (Row 2000 Prec. at the bottom of Table 3). As can
be seen,themedianperformanceintermsofqueryprecisionforCLEF-2001of
0.2749 isabout50percentbetterthanthemedianmultilingualperformanceof
0.1843ofCLEF-2000.Thisarguesthatsignicantprogresshasbeenmadebythe
Retrieved 50000 50000 50000 50000
Relevant 8138 8138 8138 8138
Rel.Ret 5520 5190 5223 4202
Precision
at0.00 0.8890 0.8198 0.8522 0.7698
at0.10 0.6315 0.5708 0.6058 0.4525
at0.20 0.5141 0.4703 0.5143 0.3381
at0.30 0.4441 0.3892 0.4137 0.2643
at0.40 0.3653 0.3061 0.3324 0.1796
at0.50 0.2950 0.2476 0.2697 0.1443
at0.60 0.2244 0.1736 0.2033 0.0933
at0.70 0.1502 0.1110 0.1281 0.0556
at0.80 0.0894 0.0620 0.0806 0.0319
at0.90 0.0457 0.0440 0.0315 0.0058
at1.00 0.0022 0.0029 0.0026 0.0005
Brk.Prec. 0.3101 0.2674 0.2902 0.1838
CLEFPrec. 0.2749 0.2749 0.2749 0.2749
2000Prec. 0.1843 0.1843 0.1843 0.1843
Table 3.ResultsoffouroÆcialCLEF-2001multilingualruns.
4 GIRT retrieval
Thespecial emphasisofourcurrentfunding hasfocusseduponretrievalofspe-
cialized domain documents which have been assigned individual classication
identiers by human indexers. These classication identiers can come from
thesauri.Sincemanymillionsofdollarsareexpendedondevelopingsuchclassi-
cationschemesandusingthemtoindexdocuments,itisnaturaltoattemptto
exploittheresourcesto thefullestextentpossibletoimproveretrieval.Insome
cases such thesauri are developed with identiers translated (or provided) in
multiplelanguages,andcan thusbe usedtotransfer words acrossthelanguage
barrier.
TheGIRTcollection consistsof reports and papers (grey literature)inthe
social science domain. The collection is managed and indexed by the GESIS
organization(http://www.social-science-gesis.de).GIRTisanexcellentexample
of acollectionindexed bya multilingualthesaurus,originally German-English,
recently translated into Russian. The GIRT multilingual thesaurus (German-
English),whichisbasedontheThesaurusfortheSocialSciences[2],providesthe
vocabularysourcefortheindexing termswithintheGIRT collectionof CLEF.
FurtherinformationaboutGIRTcan befoundin[7]There are76,128German
documentsinGIRTsubtaskcollection.Almostallthedocumentscontainman-
uallyassignedthesaurusterms.Onaverage,thereareabout10thesaurusterms
assigned to each document. Figure1 is anexample ofa thesaurus entry. Since
transliterationoftheCyrillicalphabetisakeypartofourretrievalstrategy,we
Inourexperiments,weindexedtheTITLEandTEXTsectionsineachdoc-
ument(nottheE-TITLEorE-TEXT).TheCLEFrulesspeciedthatindexing
anyothereldwouldneedtobedeclaredamanualrun.ForourCLEFrunsthis
yearweagainusedtheMuscatstemmer,whichissimilartothePorterstemmer
but fortheGerman language.
4.1 Querytranslation from Russian to German
Inordertoprepareforquerytranslation,werstextractedallthesinglewords
and bigrams from the Russian topic elds. Since we do not have a Russian
POS tagger,we took any two adjacent words (overlappingwordbigram)to be
considered a potential phrase. The single words and bigrams in each Russian
querywerethencompared againsttheRussian-Germanthesaurus.Ifa wordor
bigramwasfoundinthethesaurus,itsGermantranslationwasaddedtothenew
Germanquerybeingcreated.TheresultingGermanquerywasthenrunagainst
theGermancollectiontoretrieverelevantdocuments.
Forcomparison, wealsoused anonlineMT system
(Promt-Reverso:http://translation2.paralink.com/)
totranslatetheRussianqueriestoGerman.
Fuzzy Matching for the Thesaurus Therst approachto thesaurus-based
translationwas exactmatchingforthesaurus lookup.Fromthe25GIRTtopics
we obtain about 1300 Russian query terms (words and bigrams). Only 50 of
themweredirectlyfoundinthethesaurus,andthesewereallsinglewords.Two
problemscontributetothelowmatchingrate:
First, a Russian word may have several forms or variations. Usually only
thebaseform orgeneralformappearsinthethesaurus.For example,"evropa"
(EuropeinEnglish)isinthethesaurus,but "evrope"and"evropu"arenot.In
this case, a Russianmorphological analyzerwould be helpful.Since we donot
haveaRussianmorphologicalanalyzer,weusedfuzzymatchingtoaddress this
problem.
There are dierent kinds of algorithms for fuzzy matching, such as Lev-
enshtein distance, commonn-grams, longestcommon subsequence,etc [9]. We
foundthatthesimplecommonbigramalgorithmtobeveryeÆcientandeective
for matching dierent word forms. The two strings are divided into theircon-
stituentbigramsandDice'scoeÆcientisusedtocomputethesimilaritybetween
migratsiiu migratsiia wanderung
migratsii migratsiia wanderung
bezrabotitsei bezrabotitsa arbeitslosigkeit
televideniia televidenie fernsehen
kul'turu kul'tura kultur
kul'turoi kul'tura kultur
tekhnologiei tekhnologiia technologie
tekhnologii tekhnologiia technologie
AbovearesomeexamplesofRussianwordsthatdonotoccurinthethesaurus
but whose dierent forms were found inthethesaurus by fuzzymatching(the
Russiancharactersintheexamplesaretransliteratedforeasyreading).
Thesecondproblemliesinndingquerybigramswhichdonotmatchexactly
to thesaurusentries.Fuzzymatching was alsousefulforndingdierentforms
ofbigrams,evenincases wherewordorderischanged,examplesare:
OriginalRussianbigram BigramfoundinthethesaurusGermantranslation
tekhnologicheskogorazvitiia tekhnologicheskoerazvitie technologischeentwicklung
razvitieiorganizatsiia organizatsionnoerazvitie organisationsentwicklung
upravlenieorganizatsiia organizatsiiaupravleniia verwaltungsorganisation
rabochemmeste rabocheemesto arbeitsplatz
rukovodiashchikhrabotnikovrukovodiashchierabotniki fuhrungskraft
Thewaybigram'phrases'werecreatedhadtwoproblems:rst,manyofthe
bigramsweresimplynotmeaningful;second, eventhoughmostgenuinephrases
containtwo words(bigrams),approximately25percentofRussiantermsinthe
thesauruscontain3ormorewords.ARussianPOStaggerwouldbeveryhelpful
forndingmeaningfulorlongphrases.
4.2 GIRTresults and analysis
OurGIRTresultsaresummarizedinTable 4.Therunscanbedescribedasfol-
lows:BKGRGGA isa monolingualrun usingtheGerman versionof thetopics
run againsttheGermanGIRTdocumentcollection.BKGRRGA1isa Russian-
German bilingualrun using MT systemfor query translation.BKGRRGA2 is
a Russian-German bilingual run using thesaurus lookup and fuzzy matching.
BKGRRGA3isaRussian-Germanbilingualrun whichisidenticalinmethodol-
ogyto BKGRRGA2exceptthat onlytitleanddescriptionsectionsof thetopic
wereusedformatching.Aswecansee,ourRussianrunsachieveonlyabout1/3
oftheprecisionoftheGerman-Germanmonolingualrun,withasignicantedge
Retrieved 25000 25000 25000 25000
Relevant 1111 1111 1111 1111
Rel.Ret 1054 774 781 813
Precision
at0.00 0.9390 0.4408 0.4032 0.3793
at0.10 0.8225 0.3461 0.3018 0.2640
at0.20 0.7501 0.2993 0.2441 0.2381
at0.30 0.6282 0.2671 0.2257 0.1984
at0.40 0.5676 0.2381 0.1790 0.1782
at0.50 0.5166 0.1999 0.1341 0.1609
at0.60 0.4604 0.1400 0.0993 0.1323
at0.70 0.4002 0.1045 0.0712 0.1125
at0.80 0.3038 0.0693 0.0502 0.0765
at0.90 0.2097 0.0454 0.0232 0.0273
at1.00 0.0620 0.0051 0.0013 0.0000
Brk.Prec. 0.5002 0.1845 0.1448 0.1461
Table4.ResultsofoÆcialGIRTRussian-Germanruns.
retrieved more relevant documents, this did not translate into higher overall
precision.
5 Summary and Acknowlegments
The participation of Berkeley's Group One in CLEF-2001 has enabled us to
explorethediÆcultiesinextendingcross-languageinformationretrievaltoanon-
Roman alphabet language, Russian, for which limited resources are available.
Specicallywehaveexploredacomparisonofbilingualandmultilingualretrieval
where original queries were in Russian when compared against German as a
query language or English as a query language for multilingual retrieval. For
theGIRTtaskwecomparedvarious formsofRussian !German retrieval.We
havedeterminedthatthereissignicantworktobedonebeforecross-language
informationretrievalfromtheRussianlanguagewillbecomecompetitivetoother
Europeanlanguages.
Thisresearch was supportedbyDARPA(DepartmentofDefenseAdvanced
ResearchProjectsAgency)under contractN66001-97-8541; AO#F477:Search
SupportforUnfamiliarMetadataVocabularieswithintheDARPA Information
TechnologyOÆce.WethankAitaoChenforindexingthemainCLEFcollections.
References
1. WCooperAChenandFGey.Fulltextretrievalbasedonprobabilisticequations
withcoeÆcientsttedbylogisticregression. InD.K.Harman,editor,TheSecond
English. [Vol. 2:] English-German. [Edition]1999. InformationsZentrum Sozial-
wissenschaftenBonn,2000.
3. A Chen J He LXuF Gey andJ Meggs. Chinese text retrieval withoutusinga
dictionary. InA.DesaiNarasimhaluNicholasJ.BelkinandPeterWillett,editors,
Proceedingsofthe20thAnnualInternationalACMSIGIRConferenceonResearch
andDevelopmentinInformationRetrieval,Philadelphia,pages42{49,1997.
4. F.C.GeyandA.Chen. Phrasediscoveryforenglishandcross-languageretrieval
attrec-6. InD.K.HarmanandEllenVoorhees,editors,TheSixthTextREtrieval
Conference (TREC-6),NISTSpecial Publication500-240,pages637{647,August
1998.
5. F. C. Gey and H. Jiang. English-german cross-language retrieval for the girt
collection { exploiting a multilingual thesaurus. In Ellen Voorhees, editor, The
Eighth Text REtrieval Conference (TREC-8), draft notebook proceedings, pages
219{234,November1999.
6. AChen HJiangandFGey. Berkeleyatntcir-2:Chinese,japaneseandenglish ir
experiments.InN.Kando,editor,ProceedingsoftheSecondNTCIRWorkshopon
Evaluation of Chinese andJapanese Text Retrieval and Summarization, Tokoyo
Japan,pages32{39,March2001.
7. Michael KluckandFredricGey.Thedomain-specictaskofclef-specicevalua-
tionstrategiesincross-languageinformationretrieval.InCrossLanguageRetrieval
Evaluation, ProceedingsoftheCLEF2000Workshop.Springer,2001.
8. F Gey MBuckland R Larsonand A Chen. Entryvocabulary {a technology to
enhance digital search. In Proceedings of the First International Conference on
HumanLanguageTechnology,March2001.
9. Michael P.Oakes. Statisticsfor CorpusLinguistics. EdinburghUniversityPress,
1998.
10. F Gey H Jiang V Petras and A Chen. Cross-languageretrievalfor theclef col-
lections{comparingmultiplemethodsofretrieval. InCarolPeters,editor,Cross
LanguageRetrievalEvaluation,ProceedingsoftheCLEF2000Workshop.Springer,
2001.