Robust Ad-hoc Retrieval Experiments with French and English at the University of Hildesheim


Thomas Mandl, René Hackl, Christa Womser-Hacker
University of Hildesheim, Information Science

Marienburger Platz 22 D-31141 Hildesheim, Germany

mandl@uni-hildesheim.de

Abstract

This paper reports on experiments submitted for the robust task at CLEF 2006, intended to provide a baseline for other runs in that task. We applied a system previously tested for ad-hoc retrieval and submitted monolingual runs for English and French. Results on both training and test topics are reported. Only the French runs achieved positive results above 0.2 MAP.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

General Terms

Measurement, Performance, Experimentation

Keywords

Multilingual Retrieval, Robust Retrieval, Evaluation Measures

1 Introduction

We intended to provide a baseline for the robust task at CLEF 2006. Our system, previously applied to the ad-hoc CLEF 2005 data (Hackl et al. 2005), is an adaptive fusion system based on the MIMOR model (Mandl & Womser-Hacker 2004). For the baseline experiments, we solely optimized the blind relevance feedback (BRF) parameters, following a strategy developed by Carpineto et al. (2001). The basic retrieval engine is Lucene.
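The core of blind relevance feedback can be sketched as follows: the top-ranked documents of an initial retrieval run are assumed to be relevant, and their most frequent terms are added to the query. The sketch below is a minimal Rocchio-style illustration in Python; the function name, the frequency-based term weighting, and the toy documents are our own assumptions, not the actual Lucene-based implementation or the information-theoretic weighting of Carpineto et al.

```python
from collections import Counter

def brf_expand(query_terms, ranked_docs, top_docs=5, top_terms=30):
    """Blind relevance feedback: treat the top-ranked documents as
    relevant and add their most frequent terms to the query.
    `ranked_docs` is a list of token lists, best-scoring first."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(doc)
    # Candidate expansion terms: frequent terms not already in the query.
    candidates = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:top_terms]

# Toy example: expand a two-term query from two pseudo-relevant documents.
docs = [["city", "rebuild", "war", "city"],
        ["city", "plan", "rebuild", "war"]]
expanded = brf_expand(["rebuild", "german"], docs, top_docs=2, top_terms=2)
```

The parameter pairs reported in the tables below (e.g. 5-30, 15-30) correspond to `top_docs` and `top_terms` in this sketch: the number of pseudo-relevant documents and the number of expansion terms taken from them.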

2 System Setup

Two runs each were submitted for the English and French monolingual data. The results for the test and training topics are shown in Tables 1 and 2, respectively.

Table 1. Results for Submitted Monolingual Runs (Test Topics)

Run       Language  Stemming  BRF (docs - terms)  GeoAve  MAP
uhienmo1  English   Lucene    5-30                0.01%   7.98%
uhienmo2  English   Lucene    15-30               0.01%   7.12%
uhifrmo1  French    Lucene    5-30                5.76%   28.50%
uhifrmo2  French    Lucene    15-30               6.25%   29.85%


Table 2. Results for Training Topics for Submitted Monolingual Runs

Run       Language  Stemming  BRF (docs - terms)  GeoAve  MAP
uhienmo1  English   Lucene    5-30                0.01%   7.16%
uhienmo2  English   Lucene    15-30               0.01%   6.33%
uhifrmo1  French    Lucene    5-30                8.58%   25.26%
uhifrmo2  French    Lucene    15-30               9.88%   28.47%

Only the runs for French reached a competitive level above 0.2 MAP. The geometric average for the English runs is far below the MAP, because low performance on even a few topics causes a sharp drop in this measure.
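This property of the geometric average can be illustrated with a small numerical example (the per-topic average precision values below are hypothetical, not the actual run scores):

```python
import math

def geometric_average(scores, eps=1e-4):
    # Zero or near-zero scores are clamped to a small epsilon so the
    # product does not collapse to exactly zero.
    return math.exp(sum(math.log(max(s, eps)) for s in scores) / len(scores))

# Two runs with the same arithmetic mean (MAP) of 0.25:
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.60, 0.38, 0.01, 0.01]  # two very hard topics

map_even = sum(even) / len(even)
map_skewed = sum(skewed) / len(skewed)
geo_even = geometric_average(even)      # equals the MAP
geo_skewed = geometric_average(skewed)  # far below the MAP
```

Both runs have identical MAP, but the two near-zero topics pull the geometric average of the skewed run down to a fraction of its MAP, which is exactly the robustness penalty the measure is designed to impose.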

3 Future Work

For future experiments, we intend to exploit the knowledge on the impact of named entities on the retrieval process (Mandl & Womser-Hacker 2005) as well as selective relevance feedback strategies in order to improve robustness (Kwok 2005).

References

Carpineto, C.; de Mori, R.; Romano, G.; Bigi, B. (2001): An Information-Theoretic Approach to Automatic Query Expansion. In: ACM Transactions on Information Systems 19 (1), pp. 1-27.

Hackl, René; Mandl, Thomas; Womser-Hacker, Christa (2005): Mono- and Cross-lingual Retrieval Experiments at the University of Hildesheim. In: Peters, Carol; Clough, Paul; Gonzalo, Julio; Kluck, Michael; Jones, Gareth; Magnini, Bernard (eds): Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign. Berlin et al.: Springer [LNCS 3491], pp. 165-169.

Kwok, K.L. (2005): An Attempt to Identify Weakest and Strongest Queries. In: ACM SIGIR 2005 Workshop: Predicting Query Difficulty - Methods and Applications. Salvador, Bahia, Brazil, August 19, 2005. http://www.haifa.ibm.com/sigir05-qp/papers/kwok.pdf

Mandl, Thomas; Womser-Hacker, Christa (2004): A Framework for long-term Learning of Topical User Preferences in Information Retrieval. In: New Library World vol. 105 (5/6) pp. 184-195.

Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Applied Computing 2005: Proc. ACM SAC Symposium on Applied Computing (SAC), Information Access and Retrieval (IAR) Track. Santa Fe, New Mexico, USA, March 13-17, 2005, pp. 1059-1064.
