• Aucun résultat trouvé

SNUMedinfo at CLEFeHealth2013 Task 3

N/A
N/A
Protected

Academic year: 2022

Partager "SNUMedinfo at CLEFeHealth2013 Task 3"

Copied!
3
0
0

Texte intégral

(1)

SNUMedinfo at CLEFeHealth2013 Task 3

Sungbin Choi, Jinwook Choi

Medical Informatics Laboratory, Seoul National University, Seoul, Republic of Korea [email protected], [email protected]

Abstract. This paper describes the participation of the SNUMedinfo team at the CLEFeHealth2013 task 3. We submitted 7 runs in total: 1 baseline run using query likelihood model in Indri search engine; 3 runs using passage based lan- guage model; 3 runs using passage based language model with lexical query ex- pansion. We tried to incorporate passage-based score into ranking model to re- flect the degree of query term cohesion per each document.

Keywords: Passage based language model, Query expansion, Web document, Medical information retrieval, Indri

1. Introduction

In this paper, we describe the methods in participation of the CLEFeHealth2013 Task 3 – Information retrieval to address patients’ questions. For detailed task descrip- tion, please see [1].

2. Methods

2.1 Baseline run

We submitted 1 baseline run (MEDINFO.1.3.noadd) using unigram language model with Dirichlet prior smoothing [2, 3]. Only title field is used as query. The queries are stopped at the query time using the standard 418 INQUERY stopword list, case-folded, and stemmed using Krovetz stemmer. Experimental results are described in Table 1.

Table 1. Baseline run result

Runid MAP bpref P10

MEDINFO.1.3.noadd 0.3131 0.3779 0.4800

(2)

2.2 Passage based language model

We submitted 3 runs using passage based language model [4]. We combined max- scoring passage-based relevance score with unigram language model score. Many web pages contain hierarchical category menu or tables, which does not necessarily repre- sent core topic information. We tried to incorporate passage-based score into ranking model to reflect the degree of query term cohesion. Different weighting parameter is applied on each run. In all 3 runs, only title field is used as query. Experimental results are described in Table 2.

Table 2. Passage based language model run result

Runid MAP bpref P10

MEDINFO.5.3.noadd 0.2426 0.3368 0.4040 MEDINFO.6.3.noadd 0.2343 0.3332 0.3600 MEDINFO.7.3.noadd 0.2174 0.3250 0.3480

2.3 Passage based language model with lexical query expansion

In addition to section 2.2, we applied lexical query expansion method. UMLS concepts in queries are recognized using MetaMap [5], and then original query is expanded with UMLS preferred terms. Only terms occurring in the discharge summary is chosen for expansion. For MEDINFO.2.3.noadd, only title field is used as query. For MEDINFO.3.3.noadd, title and desc field is used as query. For MEDINFO.4.3.noadd, title, desc and narr field is used as query. Experimental results are described in Table 3.

Table 3. Passage based language model with lexical query expansion run result

Runid MAP bpref P10

MEDINFO.2.3.noadd 0.2454 0.3389 0.3980 MEDINFO.3.3.noadd 0.2584 0.3434 0.4040 MEDINFO.4.3.noadd 0.2601 0.3457 0.4060

3. Conclusion

We submitted 6 runs all based on passage based retrieval model. Baseline retrieval model is shown to be quite effective. However, contrary to our intention, passage based retrieval score did more harm than good compared to our baseline. We hope to explore more effective method in the future study.

4. Acknowledgements

(3)

This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIP)(2010-0028631). The Shared Annotated Resources (ShARe) project is funded by the United States National Institutes of Health with grant number R01GM090187.

5. References

1. Suominen, H., et al., Overview of the ShARe/CLEF eHealth Evaluation Lab, in CLEF 2013, Springer Valencia, Spain.

2. Strohman, T., et al. Indri: A language model-based search engine for complex queries. in Proceedings of the International Conference on Intelligent Analysis. 2005.

3. Zhai, C. and J. Lafferty, A study of smoothing methods for language models applied to Ad Hoc information retrieval, in Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval2001, ACM: New Orleans, Louisiana, USA. p. 334-342.

4. Callan, J.P. Passage-level evidence in document retrieval. in Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval. 1994. Springer-Verlag New York, Inc.

5. Aronson, A.R. and F.-M. Lang, An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association, 2010. 17(3): p. 229-236.

Références

Documents relatifs

Abstract. We investigate an optimal cost reachability problem for weighted timed automata, and we use a translation to SMT to solve the problem. In par- ticular, we show how to find

Conclusions: The present study proposes GROH as a novel method to estimate genomic relationship matrices and predict GEBV based on runs of homozygosity and shows that it can result

For our participation in the Hyperlinking sub-task of Search and Hyperlinking of Television Content in MediaEval 2014, we developed a new strategy of target segment determina-

Given the fact that the baseline system can be beaten on the training set, results over the test set suggest that models optimised on MRR, mGAP, and MASP for a known-item

We submitted 7 runs to Task3a (monolingual infor- mation retrieval): 1 baseline run using query likelihood model in Indri search engine; 3 runs applying UMLS based

For the ad-hoc image-based retrieval task, we submitted 1 baseline textual run using query likelihood model in Indri search engine, and 4 visual runs utilizing various

We describe Dublin City University (DCU)’s participation in the Hyperlinking sub-task of the Search and Hyperlinking of Television Content task at MediaEval 2013. Two meth- ods of

We segment all three available types of transcripts (automatic ones provided by LIMSI/Vocapia and LIUM, and manual subtitles provided by BBC) into fixed-length time units, and