
Thesis

Reference

Use case oriented medical visual information retrieval & system evaluation

GARCIA SECO DE HERRERA, Alba

Abstract

Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition, many images are made available continuously via publications in the scientific literature and can also be valuable for clinical routine, research and education. Information retrieval systems are useful tools to provide access to the biomedical literature and fulfil the information needs of medical professionals. The tools developed in this thesis can potentially help clinicians make decisions about difficult diagnoses via a case-based retrieval system based on a use case associated with a specific evaluation task. This system retrieves articles from the biomedical literature when querying with a case description and attached images. This thesis proposes a multimodal approach for medical case-based retrieval with focus on the integration of visual information connected to text. Furthermore, the ImageCLEFmed evaluation campaign was organised during this thesis promoting medical retrieval system evaluation.

GARCIA SECO DE HERRERA, Alba. Use case oriented medical visual information retrieval & system evaluation. Thèse de doctorat : Univ. Genève, 2015, no. Sc. 4781

URN : urn:nbn:ch:unige-731845

DOI : 10.13097/archive-ouverte/unige:73184

Available at:

http://archive-ouverte.unige.ch/unige:73184

Disclaimer: layout of this document may differ from the published version.


UNIVERSITÉ DE GENÈVE

Department of Computer Science, FACULTY OF SCIENCE
Professor Dr. Stéphane Marchand–Maillet

Department of Radiology and Medical Informatics, FACULTY OF MEDICINE
Professor Dr. Henning Müller

Use Case Oriented Medical Visual Information Retrieval & System Evaluation

THESIS

presented to the Faculty of Science of the Université de Genève to obtain the degree of Doctor of Science (Docteur ès sciences), with a specialisation in computer science

by

Alba García Seco de Herrera

from

Madrid (Spain)

Thesis No. 4781

GENEVA 2015

The Faculty of Science, on the recommendation of Mr H. MÜLLER, titular professor and thesis director (Faculty of Medicine, Department of Radiology and Medical Informatics), Mr S. MARCHAND-MAILLET, associate professor and thesis director (Department of Computer Science) and Mr D. L. RUBIN, professor (Department of Radiology and Medicine, Stanford University, Stanford, California, U.S.A.), authorises the printing of this thesis, without expressing an opinion on the propositions stated therein.

Geneva, 22 May 2015

Thesis - 4781 -

The Dean

Contents

Abstract
Résumé
Resumen
Acknowledgements

1 Introduction
  1.1 Motivations
  1.2 Thesis overview
  1.3 Scientific contributions of this thesis

2 Medical Visual Information Retrieval
  2.1 Components of a retrieval system
    2.1.1 Information sources and retrieval
    2.1.2 Information fusion
    2.1.3 Query–adaptive multi–modal fusion
    2.1.4 Modality classification
  2.2 Example systems
  2.3 Retrieval evaluation activities
    2.3.1 History
    2.3.2 Evaluation process
  2.4 Summary

3 Use Case Description
  3.1 What is a use case?
    3.1.1 Use case for information retrieval
  3.2 Use case description
    3.2.1 Usage narrative
    3.2.2 System features
    3.2.3 User features
    3.2.4 Session features
    3.2.5 One example of a successful flow of interaction
    3.2.6 Evaluation task: image based and case–based retrieval
    3.2.7 UML use case diagram
  3.3 Validation
    3.3.1 Use case description
    3.3.2 System feature description
    3.3.3 User feature description
    3.3.4 Session feature description
    3.3.5 Evaluation task
    3.3.6 Conclusions
  3.4 Summary

4 Evaluation Framework: ImageCLEFmed
  4.1 ImageCLEF impact analysis (2003–2010)
    4.1.1 ImageCLEF tasks
    4.1.2 Bibliometric analysis method
    4.1.3 The dataset of the ImageCLEF publications
    4.1.4 Analysis of the ImageCLEF publications
  4.2 The ImageCLEFmed evaluation campaign before this thesis
  4.3 ImageCLEFmed during this thesis (2011–2013)
    4.3.1 Database
    4.3.2 Tasks
    4.3.3 Outcome of the evaluation activities during this thesis
  4.4 Lessons learned
  4.5 Summary

5 Case–based Retrieval Techniques
  5.1 Basic performance
  5.2 Data fusion
    5.2.1 Visual feature fusion
    5.2.2 Multi–modal fusion
  5.3 Query–adaptive multi–modal fusion
    5.3.1 MeSH term extraction
    5.3.2 Visual and text synonymy
    5.3.3 Query–adaptive fusion criterion
  5.4 Modality classification
    5.4.1 Multi–modal classification
    5.4.2 Training set expansion
    5.4.3 Modality filter
  5.5 Summary

6 Experimental Results
  6.1 Basic performance
  6.2 Query topic analysis
  6.3 Comparing fusion techniques
  6.4 Query–adaptive multi–modal fusion
  6.5 Modality classification
    6.5.1 Feature selection
    6.5.2 Semi–supervised learning and crowdsourcing
    6.5.3 Modality classification for retrieval
  6.6 Final approach
  6.7 Result discussion
  6.8 Summary

7 Shangri–La: a Web–based Interface for Medical Case–based Retrieval
  7.1 System architecture
  7.2 Interface functionality
    7.2.1 Build Case page
    7.2.2 Results page
    7.2.3 My Articles page
  7.3 Summary

8 Conclusions and outlook
  8.1 Objectives revisited
  8.2 Contributions
  8.3 Limitations
  8.4 Perspectives
    8.4.1 Evaluation
    8.4.2 Medical case–based retrieval techniques
    8.4.3 System integration

A Medical Use Case Validation 2012
  A.1 Survey Questions
  A.2 Survey Responses

B ImageCLEFmed Questionnaires

Nomenclature
Glossary
List of Figures
List of Tables
Bibliography
Index


Abstract

My original contribution to knowledge lies in two fields: medical visual information retrieval and the evaluation of retrieval systems.

Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition, many images are continuously made available via publications in the scientific literature. Scientific publications can be very valuable for clinical routine, research and education, where up–to–date medical knowledge is needed. However, it is not always easy to find the desired information in this large amount of data, and in clinical routine the time available to fulfil an information need is often very limited. As a consequence, these documents and images need to be managed and retrieved as efficiently and effectively as possible. Retrieval systems are useful tools for providing access to the biomedical literature related to the information needs of medical professionals. Clinicians regularly use information retrieval systems, which benefits decision making and patient care.

To better design retrieval systems based on clinicians’ real needs, this thesis explicitly defines and validates a use case associated with a specific evaluation task. The use case deals with retrieval mechanisms able to jointly exploit connected textual and visual information in the medical domain. By developing a medical case–based retrieval system based on the defined use case, this thesis can potentially help clinicians make decisions about difficult diagnoses. This system retrieves articles from the biomedical literature when queried with a case description and attached images.

Another main contribution of this thesis is a multi–modal approach for medical case–based retrieval with a special focus on the integration of visual information connected to text. Different fusion strategies are analysed to evaluate whether multi–modal retrieval systems can achieve good performance. However, this is a challenging task and visual features do not always provide enough information for retrieval. Therefore, this thesis defines a query–adaptive multi–modal fusion criterion, which indicates when visual features are suitable to be fused with text features. This criterion is based on synonym relations between text and visual information. Furthermore, an image modality classification approach is implemented to integrate image modality information into the retrieval step. A semi–supervised learning technique is developed to deal with imbalanced classes in the training data. A crowdsourcing platform is then employed to obtain a more accurate image collection.

The final contribution of this thesis is an evaluation framework for medical retrieval systems. After an in–depth analysis of the ImageCLEFmed benchmark in the years before this thesis, the ImageCLEFmed evaluation campaign was organised during this thesis and oriented towards the studied use case. This includes the generation of a freely available database and ground truth following meticulous preparation. Lessons learned are also drawn from a careful evaluation and comparison of participants’ systems.


Résumé

My original contribution to knowledge concerns two fields of study: medical visual information retrieval and the evaluation of information retrieval systems.

A large quantity of images is produced daily in hospitals. Many of them are used in the scientific literature and are extremely valuable for routine clinical practice, research and education. However, it is not easy for health professionals to find the desired information given the massive quantity of data available and the limited time at their disposal. Consequently, documents and images need to be managed and retrieved effectively and efficiently.

Information retrieval systems are useful tools for providing access to the biomedical literature related to the needs of health professionals. These systems can provide valuable help for decision making and patient care.

To improve the design of these systems in line with the real needs of health personnel, this thesis explicitly defines and validates a use case associated with an evaluation task. This use case deals with retrieval mechanisms able to jointly exploit linked textual and visual medical information.

In addition, this thesis can potentially help clinicians make decisions about difficult diagnoses through the development of a medical case–based retrieval system for the defined use case.

Another essential contribution of this thesis is the multi–modal approach to medical case–based information retrieval, which focuses on the integration of visual information linked to text. Different information fusion strategies are analysed to evaluate whether such systems can obtain good results. However, this is a particularly difficult task, because visual features do not always contain enough information to improve the quality of the results. This thesis defines a query–adaptive multi–modal fusion criterion that indicates under which circumstances visual features are eligible to be fused with text. This criterion is based on synonymy between textual information and visual features. Moreover, the thesis implements an imaging modality classification strategy to be integrated into the retrieval step. A semi–supervised learning technique is developed together with a crowdsourcing strategy to cope with class imbalance in the training set and to obtain a more balanced image collection.

The last contribution of this thesis is a framework for the evaluation of medical information retrieval systems. The ImageCLEFmed evaluation campaign was organised during this thesis following an in–depth analysis of earlier evaluation standards. A public database and validation data are generated for the evaluation and comparison of participants’ systems.


Resumen

My original contribution to knowledge lies in two fields of study: medical visual information retrieval and the evaluation of retrieval systems.

An immense quantity of images is produced daily in hospitals as a result of diagnosis through imaging techniques. Many of these images are distributed through the scientific literature and are extremely valuable for routine clinical practice, research and education. However, it is not easy for health personnel to find the desired information among the enormous amount of available data in the limited time they have. It is therefore necessary to manage and retrieve documents and images effectively and efficiently. Information retrieval systems are very useful tools for providing access to the biomedical literature related to the needs of health professionals, who regularly use these systems, which benefit decision making and patient care.

To improve the design of these systems by basing them on the real needs of health personnel, this thesis explicitly defines and validates a use case associated with a specific evaluation task. This use case deals with retrieval mechanisms able to jointly exploit related textual and visual medical information.

Likewise, this thesis can potentially help health personnel make decisions about difficult diagnoses through the development of a case–based retrieval system founded on the defined use case.

Another essential contribution of this thesis is a multi–modal strategy for medical case–based information retrieval, which focuses on the integration of visual information related to textual information. Different information fusion strategies are analysed to evaluate whether these systems can obtain good results. However, this is a difficult task, since visual features do not always contain enough information to help retrieval. This thesis defines a criterion for query–adaptive multi–modal fusion that shows when visual features are appropriate to be fused with text. This criterion is based on synonymy between textual information and visual features. Additionally, an image modality classification strategy is implemented to be integrated into the retrieval step. A semi–supervised learning technique, together with a crowdsourcing strategy, is developed to deal with class imbalance in the training set and thus obtain a more accurate image collection.

The last contribution of this thesis is a framework for the evaluation of medical information retrieval systems. The ImageCLEFmed evaluation campaign was organised during this thesis, after an analysis of its previous evaluation standards. Through a meticulous process, a public database has been generated, together with the key data for the evaluation and comparison of participants’ systems.


Acknowledgements

“The brain was invented to leave home, and memory to return home.”

Jorge Wagensberg

I would like to thank all the people who contributed in some way to this thesis. First and foremost, I would like to thank my advisors Professor Henning Müller and Professor Stéphane Marchand–Maillet for providing me with the opportunity to complete my Ph.D. thesis at the University of Geneva. I especially want to thank my on–site advisor, Professor Henning Müller, who has been a tremendous mentor for me. I would like to thank him for encouraging my research and for allowing me to grow as a research scientist. His advice on both research and my career has been priceless. I would also like to thank Professor Daniel L. Rubin very much for agreeing to be on the jury for my thesis defence and for making the journey from Stanford to Sierre.

The members of the medGIFT group have contributed immensely to my personal and professional time during these four years at Sierre. The group has been a source of friendships as well as good advice and collaboration. I am especially grateful to Roger Schaer and Antonio Foncubierta–Rodríguez for their friendship and support, which made my thesis work possible. I want to thank Yashin Dicente Cid, who has been my colleague, friend and neighbour in Amsterdam and Sierre. I want to thank present and past members of the group: Oscar Jiménez del Toro, Ranveer Joyseeree, Manfredo Atzori, Dimitrios Markonis, Antoine Widmer, Adrien Depeursinge and the numerous summer and rotation students who have come through the group.

I spent two exciting months at Fudan University in Shanghai, and I would like to thank Professor Yuanyuan Wang for hosting me. I would also like to thank Yu Ma and all the members of the group for their continuing hospitality.

I thank Mete for doing part of the proofreading, which was very helpful for the quality of the English in this thesis. I also want to thank Stéphane and Adrien for correcting the French in the “résumé” of this thesis.

My research work was financed by the University of Applied Sciences Western Switzerland. Part of this work was supported by the European Seventh Framework Programme in the context of the PROMISE (FP7–258191) and Khresmoi (FP7–257528) projects. The WIDTH (PIRSES-GA-2010-269124) project funded the exchange visit to Shanghai. I would like to express my gratitude to these institutions for their support.

My time at Sierre was made enjoyable in large part by the many friends who became a second family. In addition to the friends from the medGIFT group already mentioned, I am grateful for the time spent with my friends: Sergio, Evelyne, Tania, Andrés, Alejandra, Visara, Stefano and little Emilia (who has grown up with this thesis).


The “chiceria” (Claudia B., Céline, Caroline, Sandra, Claudia P. and Lysiane) has also enriched my stay in Sierre; special thanks go to Lisa.

I particularly want to thank Stéphane for his faithful support, encouragement and patience during the final stages of this Ph.D.

I cannot leave the people of my beloved Spain out of my acknowledgements. I would like to express my deepest and most sincere gratitude to Professor Emanuele Schiavi, who was my source of motivation and curiosity during my first steps in research. I would like to extend my gratitude to my colleagues in the Neuroimaging area of the Fundación CIEN, with whom I shared an office, countless hours of work and good times. Special thanks go to Pablo and Gonzalo for their friendship.

I want to express my gratitude to all those friends with whom I have shared so much. In particular, I want to thank Diego, Pablo, María, Cañas, Luis, Eva, Diana, Elvira and Andrés for welcoming me with open arms on each of my visits to Madrid. Likewise, I want to thank my friends from the “pandillita”, because they are always there (Jose, Lurdes, Sofía, Almudena B., David, Maxi, Zana, Grande, Rebeca, Almudena R., Cristina, Nany and Lorena). Special thanks go to my dear Bea for so many good times. I cannot forget my friends from Cabeza del Buey, whom I have seen very little because of this thesis but with whom it is always a joy to be reunited.

It is also an honour to have met great friends in Amsterdam, especially Ana María, Josep and Joselito.

My deepest and most heartfelt thanks go to my family. Thank you to my cousins Natalia, Bárbara, Germán, Fernando, María, Inamar and María (last but no less loved) for making me enjoy myself whenever I am with them. Thank you to my uncles and aunts German, Rosa, Antonio, Felisa, Manolo and Pilar for always having been there for me. My dearest grandparents Antonio, Patro and Angelita deserve very special thanks for the understanding, encouragement and love they have always given me, even despite the distance. I also want to remember my dear grandfather Antonio, who I am sure would be very proud of me.

None of this would ever have been possible without the unconditional support and affection that I always receive from my wonderful family: my parents, Jose Luis and Pilar, and my brother, Daniel. They taught me to love science and have always understood my absence and my bad moments. Despite the distance, they have always been by my side to find out how I was doing and to encourage me to keep going. Words will never be enough to show my love and gratitude.

A todos los que habéis hecho posible esta tesis... ¡Gracias!

To all of you who have made this thesis possible...Thanks!


Chapter 1

Introduction

“Everything you want is on the other side of the fear.”

Jack Canfield

This chapter gives a brief introduction to this Ph.D. thesis. It begins by describing the motivations for this research. Next, an outline of the thesis is given. Finally, the achievements of this thesis in the fields of medical visual Information Retrieval (IR) and system evaluation are stated.

1.1 Motivations

Clinicians generally base their decisions for diagnosis and treatment planning on a mixture of textbook knowledge and experience acquired through real–life clinical cases [195]. Therefore, in the medical field, two knowledge types are generally available [170]:

• Explicit knowledge: already well established and formalised domain knowledge, e.g., textbooks or clinical guidelines;

• Implicit knowledge: individual expertise, organizational practices and past cases.

When working on a new case that includes images, clinicians analyse a series of images together with contextual information, such as the patient's age, gender and medical history, as these data can have an impact on the visual appearance of the images. Since related problems may have similar solutions, clinicians use past situations similar to the current one to determine the diagnosis and potential treatment options. This information is also transmitted in teaching, where typical or interesting cases are discussed, and is used for research [170, 249]. Thus, the goal of a clinician is often to solve a new problem by making use of previous similar situations and by reusing information and knowledge [4], an approach known as case–based reasoning. This process can be described in four steps, known as the four “R’s” [93, 170]:

1. retrieve the most similar case(s) from the collection;

2. reuse them, and more precisely their solutions, to solve the problem;

3. revise the proposed solution;


4. retain the current case in the collection for further use.

This thesis focuses on the retrieval step because the retrieval of similar cases from a database can help clinicians to find the necessary information [195, 213]. In the retrieval step a search over the documents in the database is performed using the formulation of the information need that can include text and images or image regions. Relevant documents are ranked depending on the degree of similarity to a given query case or the similarity to the information need. The most relevant cases are then proposed on the top of the list and can be used to solve the current problem [18].
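The ranking idea in the retrieval step can be illustrated with a minimal text–only sketch. This is a toy example under stated assumptions, not the retrieval engine used in this thesis: the function names `cosine` and `rank_cases`, the bag–of–words representation and the cosine similarity measure are all chosen here for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cases(query, cases):
    """Return (case_id, score) pairs sorted by decreasing similarity to the query.

    `cases` maps a hypothetical case identifier to its free-text description.
    """
    query_vec = Counter(query.lower().split())
    scored = [
        (case_id, cosine(query_vec, Counter(text.lower().split())))
        for case_id, text in cases.items()
    ]
    # Most similar cases first; ties are broken by case identifier.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    collection = {
        "case_a": "chest x-ray showing pneumonia",
        "case_b": "knee mri ligament tear",
    }
    for case_id, score in rank_cases("pneumonia chest", collection):
        print(case_id, round(score, 3))
```

A real system would replace the whitespace tokenizer and raw term counts with proper text indexing and would add visual descriptors for the attached images; the sketch only shows how a similarity score induces the ranked list proposed to the clinician.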

Medical IR systems are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness [181]. Large–scale worldwide experimental evaluations provide fundamental contributions to the advancement of state–of–the–art techniques through common evaluation procedures, regular and systematic evaluation cycles, comparison and benchmarking of the adopted approaches, and the spreading of knowledge. In the process, vast amounts of experimental data are generated that call for analysis tools to enable interpretation and thereby facilitate scientific and technological progress [236, 7].

Medical visual IR and its system evaluation are the main motivation of this thesis, considering that the medical literature currently constitutes an enormous knowledge base that includes visual as well as textual information.

This thesis was carried out in the context of the Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation (PROMISE)¹ and Khresmoi² projects. Both projects received funding support from the European Commission in the context of its European Seventh Framework Programme (FP7) and had a common interest and close cooperation on medical visual IR.

PROMISE is a Network of Excellence (NoE) funded by the FP7. PROMISE aimed at advancing the experimental evaluation of complex multimedia and multilingual information systems in order to support the decision making process of individuals, commercial entities and communities who develop, employ and improve such complex systems [63].

To move from abstract benchmarking to more user–sensitive evaluation schemes, PROMISE formulated a set of use cases based on scenarios of use for multimedia and multilingual information access. This makes it possible to leverage previous knowledge and to avoid re–treading paths that previously proved erroneous. One of the use cases is the “visual clinical decision support” use case, which constitutes the focus of this thesis. The use case deals with visual information connected with text in the clinical domain in order to provide retrieval and access mechanisms able to jointly exploit textual and visual features.

PROMISE also facilitated the management of the evaluation activities and offered access, curation, preservation, reuse, analysis, visualization and mining of the collected experimental data.

Khresmoi is an integrated project funded by the FP7. Khresmoi’s goal was to develop tools for a multilingual multi–modal search and access system for biomedical information and documents [10]. It addressed the challenges of searching through huge amounts of medical data, including general medical information available on the Internet as well as radiology data in hospital archives. It allows text querying in combination with image queries. It has three main end user groups: members of the general public, physicians and radiologists (a group of physicians for whom image search is of immense importance). An overview of the Khresmoi concept is shown in Figure 1.1.

Figure 1.1: Overview of the Khresmoi project. Khresmoi combines multiple data sources and knowledge derived from various heterogeneous knowledge sources. The system allows users to access biomedical data.

PROMISE and Khresmoi cooperated on the “visual clinical decision support” use case in order to achieve their respective objectives. They carried out joint evaluation activities by exploiting the PROMISE evaluation infrastructure to experiment with Khresmoi outcomes.

¹ The Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation (PROMISE) Network of Excellence is an FP7–funded research network focused on researching the evaluation of multimedia and multilingual information systems (see http://www.promise-noe.eu/).

² Khresmoi is a European Union project funded by the FP7 focused on researching tools for multi–modal multilingual search and access systems (see http://khresmoi.eu/).

1.2 Thesis overview

This thesis deals with various aspects of medical visual IR, which are studied with a focus on system evaluation.

This first chapter gives a short introduction explaining the main motivation for the research described in this thesis. The principal scientific contributions of this thesis are also briefly listed at the end of this chapter.

Chapter 2 gives an overview of the biomedical visual IR background with a focus on medical case–based retrieval. It provides references for a number of biomedical IR systems. This chapter introduces various components and algorithms that are important for the multi–modal aspects of this thesis. Most importantly, it covers information fusion techniques, an overview of query–adaptive multi–modal fusion and the integration of modality classification into retrieval. The history of retrieval evaluation activities and the retrieval evaluation methodology are also reported in this chapter.

Chapter 3 defines and validates the “visual clinical decision support” use case, confirming that the use case reflects a real–life problem for clinicians.


Chapter 4 analyses the scholarly impact of the Cross–Language Retrieval in Image Collections (ImageCLEF)³ evaluation campaign. The medical visual IR evaluation campaign, ImageCLEFmed, organised in the context of this thesis, is described in detail.

Chapter 5 contains a detailed description of the techniques applied to develop a medical case–based retrieval system. It uses the Parallel Distributed Image Search Engine (ParaDISE) system as a baseline, to which further components are added. This chapter focuses mainly on three features of the system: information fusion, query–adaptive multi–modal fusion and modality classification.

Chapter 6 describes the experiments carried out and the results achieved with the features described in Chapter 5. The ImageCLEFmed framework is used to evaluate the performance of the system. This chapter concludes by discussing the results of these experiments.

Chapter 7 presents a web–based retrieval interface called Shangri–La. This interface integrates the multi–modal retrieval approach presented in Chapter 5. Features provided by Shangri–La are described and illustrated with screenshots of the application.

Chapter 8 concludes by revisiting the objectives and summarizing the contributions made in this thesis. It points out further research directions based on the findings of this thesis.

Appendix A contains the survey carried out for the use case validation described in Chapter 3, together with its answers.

Appendix B presents most of the answers from the questionnaire filled in by ImageCLEFmed organisers between 2011 and 2013. These data are analysed in Chapter 4.

In addition to the main content, further sections are included to make the manuscript easier to read: a table of contents; abstracts of this thesis in English, French and Spanish; acknowledgements to everyone who assisted me throughout my doctoral studies; the mathematical notation used in the text; a glossary containing the abbreviations used in the manuscript; lists of the figures and tables referred to in the document; a bibliography of the literature used to write this thesis; and an index to help find keywords in the text.

1.3 Scientific contributions of this thesis

The main scientific contributions lie in two fields: medical visual IR and the evaluation of retrieval systems. The contributions are classified below according to these two fields.

The main contributions of this thesis in the field of retrieval system evaluation and benchmarking are the following:

• definition and validation of the “visual clinical decision support” use case [116, 115];

• ImageCLEFmed benchmarking organization [122, 180, 79]; this includes the creation of freely available databases and ground truth, the evaluation of participant systems and comparison of techniques;

³ The Cross–Language Retrieval in Image Collections (ImageCLEF) is part of the Conference and Labs of the Evaluation Forum (CLEF) and aims to provide an evaluation forum for the cross–language annotation and retrieval of images (see http://imageclef.org/).


• detailed study of the outcomes of the ImageCLEFmed evaluation activities, especially between 2011 and 2013 [194, 85, 119] as well as an assessment of the scholarly impact of ImageCLEF in previous years [236, 194];

• creation of an image database for evaluating image modality classification [76, 81].

Contributions to a medical case–based retrieval system include the following:

• a medical case–based retrieval approach implementation as well as a biomedical image modality classification approach [161, 80, 83, 164];

• an analysis of different fusion strategies to compare their performance [84, 86];

• a query–adaptive multi–modal fusion criterion implementation to decide when to use multi–modal (text and visual) or only text approaches in the retrieval step [77];

• modality classification approach implementation integrated into the medical case–based retrieval; a semi–supervised learning technique is also proposed to exploit unlabelled data and to expand the training set [76, 81];

• a web–based retrieval interface, called Shangri–La.

Other articles written on this project include [162, 10, 11, 82, 72, 78, 73, 164, 9].


Chapter 2

Medical Visual Information Retrieval

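“En la sociedad de la información radica la solución a la generación de la inteligencia colectiva que necesitamos para seguir adelante.” (“In the information society lies the solution to generating the collective intelligence we need in order to move forward.”)

Gaspar Ariño Ortiz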

Medicine has been represented in images since prehistoric times, with early illustrations leaning toward symbolic representations. Illustrations have developed from symbolism to greater realism (see Figure 2.1). Advances in medical technologies have changed physicians' vision and understanding of the human body. Different modalities of medical images, such as radiology or microscopy, show objective evidence of disease and decrease the dependence on patients' subjective descriptions. Figure 2.2 shows some examples of findings in medical images which help physicians in their work on patient cases.

Today, images are produced in hospitals in ever–increasing numbers [5] and provide crucial information for diagnosis, treatment planning and other tasks. A recent European report estimates that 30% of the global digital storage is occupied by medical image data [3]. Besides clinical settings, images are also made available via biomedical publications. The number of biomedical articles published grew at a double–exponential pace between 1986 and 2006 according to [106]. For example, the biomedical open access literature of PubMed Central (PMC)⁴ alone contained almost 2 million images in 2014.

(a) Rock painting, 6000 B.C. Aboriginal “X–ray style” figure. Kakadu National Park, Northern Territory, Australia.

(b) The Ebers Papyrus, 1200 B.C. Egyptian papyrus which describes a therapy for migraine.

(c) Copperplate engraving of a woman who died near the end of term by William Hunter, 1774. National Library of Medicine.

(d) Drawing of Purkinje cells and granule cells from pigeon cerebellum by Santiago Ramón y Cajal, 1899. Instituto Santiago Ramón y Cajal.

Figure 2.1: Examples of historical medical illustrations.

(a) Findings on colour Doppler after endovascular treatment (stenting) in a 52-year-old woman suffering from recurrent transient ischemic attacks.

(b) A complete healing at the polypectomy site on an endoscopy after a 12–week course of proton pump inhibitor therapy.

(c) Hematoxylin and eosin stain on the appendix tissue reveals villous adenoma with moderate to severe dysplasia located suppurative appendicitis.

Figure 2.2: Examples of medical images that help in the diagnosis and treatment planning of cases.

Many physicians have regular information needs during clinical work, teaching preparation and research activities [99, 179]. Therefore there is a need for searching through the immense collection of images in institutions and on the World Wide Web, making the data accessible for reuse. Studies showed that the time for answering a clinical information need using IR systems is around 30 minutes [101], while clinicians state that they have approximately five minutes available [103]. Finding relevant information more quickly is thus an important task to bring search into clinical routine [167].

Retrieval and classification of medical images have been explored to get additional information for reading and interpretation of medical cases [241] when open questions remain and thus help clinicians in their daily work.

Although text queries are commonly used, the visual information of the images can enrich the search. Images represent an important part of the content in many publications and searching for medical images has become common in retrieval applications, particularly for radiologists. Image retrieval has been shown to be complementary to text retrieval approaches and images can help to represent the content of scientific articles, particularly in applications using small interfaces such as mobile phones [60]. Furthermore, medical case–based retrieval taking into account several images and potentially other data of the case has also been proposed by other authors over the past seven years [186, 249].

2.1 Components of a retrieval system

IR systems search for relevant documents and information within the contents of a specific database. In this section, the components needed to develop a rudimentary IR system that retrieves documents are first described. Figure 2.3 puts together all the basic components to outline a complete IR system.

Figure 2.3: Outline of the basic elements of a complete retrieval system.

⁴ PubMed Central (PMC) is a free full–text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM) (see http://www.ncbi.nlm.nih.gov/pmc/).

The architecture of the IR system consists of the following three components:

1. Feature extraction – the system describes the query as a set of features to handle the index;

2. Indexing – the system builds an index of the document descriptors to record and maintain the database information;

3. Similarity calculator – the system retrieves documents that are relevant to the query from the index and displays the retrieved data to the user.
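The three components listed above can be illustrated with a minimal sketch. It assumes plain-text documents, a bag-of-words descriptor and cosine similarity; these choices, and all function names, are illustrative assumptions and are not the configuration of the system described in this thesis.

```python
import math
from collections import Counter


def extract_features(text):
    """Feature extraction: describe a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def build_index(documents):
    """Indexing: store one descriptor per document identifier."""
    return {doc_id: extract_features(text) for doc_id, text in documents.items()}


def cosine_similarity(a, b):
    """Similarity calculator: cosine of the angle between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, index, top_k=10):
    """Rank indexed documents by similarity to the query, dropping zero scores."""
    query_features = extract_features(query)
    scored = [(doc_id, cosine_similarity(query_features, descriptor))
              for doc_id, descriptor in index.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical toy collection, for illustration only.
documents = {
    "d1": "chest x-ray showing pneumonia",
    "d2": "ultrasound of the liver",
    "d3": "x-ray of a fractured wrist",
}
index = build_index(documents)
results = search("chest x-ray", index)
```

In this toy collection the query shares two terms with `d1` and one with `d3`, so `d1` is ranked first and `d2` is not returned.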

This thesis focuses on the integration of visual information into medical retrieval systems. Therefore, this section presents an overview of text and visual information extraction and describes several methods to improve the retrieval precision using multi–modal approaches.

2.1.1 Information sources and retrieval

Text retrieval has been successfully used in various medical fields from lung disease through cardiology, eating disorders and diabetes to hepatitis [235] and Alzheimer’s disease [147].

Text in the anamnesis is often the first data available and, based on the initial analysis, other exams are ordered. Most biomedical search engines, including systems that search for images, have been based on text retrieval only. Sources of biomedical information can be scientific articles and also reports from the patient record [217]. The various parts of the text such as title, abstract and figure captions can then be indexed separately.

Examples of general search tools that have also been used in the biomedical domain are the Lucene, Essie and Terrier IR tools. Lucene⁵ is an open source full–text search engine. The advantage of Lucene is its simplicity and high performance [166]. Essie [109] is a phrase–based search engine with term and concept query expansion and probabilistic relevancy ranking. It was also designed to use terms from the Unified Medical Language System (UMLS). Terrier⁶ is also an open source platform for research and experimentation in text retrieval, developed at the University of Glasgow. It supports most state of the art retrieval models such as Dirichlet prior language models, Divergence from Randomness (DFR) models or Okapi BM25.

⁵ The Apache Lucene is a project that develops open–source search software including indexing and search technology (see http://lucene.apache.org/).

In addition to the text in the anamnesis, images are another initial data source for diagnosis [249]. Users of biomedical sources are often interested in images for biomedical research or medical practice [193], as the images carry an important part of the information in articles. Rather than using text queries, Content–Based Image Retrieval (CBIR) systems extract visual features from the images and retrieve images based on these features. This allows the use of visual information to find images in a database that are similar to given examples or that contain similar regions of interest.

Visual retrieval for medical applications has also become an important research area over the past 15 years [213]. The most commonly used features for visual retrieval can be grouped into the following types [12, 107]:

• Colour – several colour image descriptors have been proposed [34] such as simple colour histograms, a colour extension to the Scale Invariant Feature Transform (SIFT) [242] or the Bag of Colours (BoC) [82];

• Texture – texture features have been used to study the spatial organization of pixel values of an image like first order statistics, second order statistics, higher order statistics and multiresolution techniques such as wavelet transform [212];

• Shape – various features have been used to describe shape information, including moments, curvature or spectral features [257].
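As an illustration of the simplest colour descriptor in the list above, the following sketch computes a quantised RGB colour histogram and compares two images with histogram intersection. The pixel representation (a list of `(r, g, b)` tuples), the number of bins and the choice of histogram intersection as similarity measure are assumptions made for this example only.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised RGB histogram for an iterable of (r, g, b) pixels in 0..255."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantise each channel, clamping to the last bin for safety.
        r_bin = min(int(r / step), bins_per_channel - 1)
        g_bin = min(int(g / step), bins_per_channel - 1)
        b_bin = min(int(b / step), bins_per_channel - 1)
        hist[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1.0
        count += 1
    if count:
        hist = [value / count for value in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalised histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical four-pixel "images", for illustration only.
red = colour_histogram([(255, 0, 0)] * 4)
mostly_red = colour_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)])
blue = colour_histogram([(0, 0, 255)] * 4)
```

Here `histogram_intersection(red, mostly_red)` is higher than `histogram_intersection(red, blue)`, because three of the four pixels of the second image fall into the same colour bin as the first image.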

(a) On the right, regions detected by a key–region detector in the image on the left.

(b) The arrows in the image on the left represent the centre, scale and orientation of the key points detected by the SIFT algorithm in the image on the right.

Figure 2.4: Information can be extracted from the visual content of the images.

Figure 2.4 shows examples of the visual information that can be extracted from the images.

The extraction of multiple visual features often enhances the retrieval performance. Multiple features have been explored, most frequently SIFT variants [37, 80, 52, 252, 219], Local Binary Patterns (LBP) [37, 219], edge and colour histograms [57, 37, 252, 219, 223, 39] and grey value histograms [252]. Several texture features have also been explored such as Tamura [37, 252, 219, 223], Gabor filters [252, 219, 223], Curvelets [37], a granulometric distribution function [39] and spatial size distribution [39]. In recent years, visual words [234] have become the main way of describing images with a variety of basic features such as SIFT [159] and also texture or colour measures.

⁶ Terrier is an open source search engine, readily deployable on large–scale collections of documents (see http://terrier.org/).

2.1.2 Information fusion

The combination of various single search modalities (such as text and visual image features) makes it possible to use cross–modal relationships and thus improve the performance beyond the performance of single components [254]. However, the improvement of the performance of these multi–modal systems has long been considered difficult due to the richness of multimedia [95, 141] and the complexity of extracting meaningful information from visual documents in a large domain automatically [197]. Fusing the retrieval results of visual and textual resources into a final ranking is a popular approach for multi–modal retrieval. Fusion can either be performed early or late, creating a unified data representation or fusing after each data type is analysed independently [61, 65].

Several fusion models are described in the literature to combine multi–modal sources.

Already in 1998, La Cascia et al. [148] presented a CBIR system which combined visual and textual information directly in the feature vector space representation. Textual information is extracted using latent semantic indexing. In addition, visual information is captured in colour and orientation histograms. More recently, Pham et al. [192] combine text and visual features by normalising and concatenating them to generate the feature vectors. Traditionally, the most common method followed for data fusion is to search the modalities separately and fuse their results (ranked lists) with methods such as linear combination [8]. Methods to obtain suitable weights for linear combinations are reviewed by Wu [253]. Furthermore, Kludas et al. [140], Atrey et al. [12] and Depeursinge [61] provide an overview of the different fusion methods that have been used for multimedia analysis and IR.
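A linear combination of two ranked lists can be sketched as follows. The sketch assumes that each modality returns a dictionary of raw scores, that scores are min–max normalised before fusion and that documents missing from one list receive a score of zero; the weight value is a hypothetical example and not a tuned parameter from this thesis.

```python
def min_max_normalise(scores):
    """Rescale a {doc_id: score} dictionary to the range [0, 1]."""
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lowest) / (highest - lowest) for doc_id, s in scores.items()}


def linear_fusion(text_scores, visual_scores, text_weight=0.7):
    """Late fusion: weighted sum of the normalised text and visual scores."""
    text = min_max_normalise(text_scores)
    visual = min_max_normalise(visual_scores)
    fused = {
        doc_id: text_weight * text.get(doc_id, 0.0)
        + (1.0 - text_weight) * visual.get(doc_id, 0.0)
        for doc_id in set(text) | set(visual)
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores from a text run and a visual run.
text_run = {"a": 12.0, "b": 8.0, "c": 4.0}
visual_run = {"b": 0.9, "c": 0.5, "d": 0.1}
ranking = linear_fusion(text_run, visual_run, text_weight=0.7)
```

With a text weight of 0.7 the document ranked first by the text run stays on top, whereas an equal weighting promotes document `b`, which is supported by both modalities.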

In terms of medical cases, images are always associated with either text or structured data and this can then be used in addition to the visual content analysis for retrieval. Text retrieval, which describes the context in which the images were taken, most often performs much better than visual retrieval. The poorest performance of visual techniques is obtained when they are applied to databases with a wide spectrum of image modalities, anatomies and pathologies [196]. However, there is evidence that the combination or fusion of information from textual and visual sources can improve the overall retrieval quality [157, 87, 79].

The most common approach to obtain the final result is to combine the results of visual and text retrieval. Cao et al. [38] represent the features from different modalities as a multi–dimensional matrix and incorporate these feature vectors using an extended Latent Semantic Analysis (LSA) model. Gkoufas et al. [88] increase the retrieval performance by applying linear methods to combine visual and textual sources of images.

Classical approaches such as the maximum combination (combMAX), the sum combination (combSUM) and the multiplication of the sum and the number of non-zero scores (combMNZ) are studied by Zhou et al. [259], showing that fusing visual and text runs outperforms single modality runs. Mourão et al. [173] introduce a new fusion technique, Inverted Squared Rank (ISR), a variant of the Reciprocal Rank Fusion (RRF).
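The classical combination rules named above, together with reciprocal rank fusion, can be written compactly. This is a minimal sketch that assumes each run is a dictionary of already normalised scores (for the comb rules) or an ordered list of document identifiers (for RRF). The constant `k = 60` is the value commonly used with RRF and is an assumption here; ISR is not shown.

```python
def comb_fusion(runs, method="combSUM"):
    """Fuse a list of {doc_id: normalised_score} runs with a classical comb rule."""
    fused = {}
    for doc_id in set().union(*runs):
        scores = [run[doc_id] for run in runs if doc_id in run]
        if method == "combMAX":
            fused[doc_id] = max(scores)
        elif method == "combSUM":
            fused[doc_id] = sum(scores)
        elif method == "combMNZ":
            # Sum of the scores multiplied by the number of non-zero scores.
            non_zero = sum(1 for s in scores if s > 0)
            fused[doc_id] = sum(scores) * non_zero
        else:
            raise ValueError("unknown fusion method: " + method)
    return fused


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ordered lists of doc_ids using only rank positions (1-based)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


# Hypothetical normalised runs, for illustration only.
text_run = {"a": 1.0, "b": 0.6}
visual_run = {"b": 0.8, "c": 0.5}
fused_mnz = comb_fusion([text_run, visual_run], method="combMNZ")
fused_rrf = reciprocal_rank_fusion([["a", "b"], ["b", "c"]])
```

Both combMNZ and RRF reward documents that appear in several runs, so document `b`, which is retrieved by the text run and the visual run, obtains the highest fused score in this example.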

Furthermore, some reranking methods have also been explored [89, 105] for fusing visual and text information. However, strategies that reorder top–ranked documents limit the margin of improvement due to their use on a limited number of documents [187].

Martínez Fernández et al. [165] reorder the results from the CBIR using text–based retrieval. Viswa [243] uses visual information to rerank text–based image retrieval. The relevance of the images is linked to their initial rank position to relax the assumption that the top–ranked images in the text–based results are equally relevant.

2.1.3 Query–adaptive multi–modal fusion

Section 2.1.2 investigates techniques to fuse visual and text information to improve the precision of the retrieval. However, fusion does not always lead to better results and can even decrease the performance of the retrieval [83, 220, 174]. Therefore, to combine multi–modal retrieval, two fundamental aspects should be studied: when and how multiple retrieval models can be combined to obtain better performance than individual models [157]. How to fuse multi–modal systems has been explored by studying multiple fusion techniques. These methods are each suitable under different settings and are studied in detail in this thesis. When to fuse multiple retrieval models, such as text or visual retrieval models, is a complicated topic. Different models used in a fusion process can provide complementary or contradictory information [12]. Hence, applying a single standard retrieval method for all possible queries is inadequate [155]. Recently, adaptive query retrieval has emerged as a solution to this problem [62]. Adaptive query techniques aim to associate individual queries with specific retrieval strategies [135].

Kennedy [135] reviews the methods proposed for adapting retrieval strategies according to the intentions of the user. Several strategies have been proposed, such as the prediction of the quality of each available tool based on statistical measures of the returned results or the adaptation strategies based on the user context. However, most of the techniques are based on query classification using Natural Language (NL) analysis of the query.

NL analysis is used in IR to translate potentially ambiguous NL queries and documents into unambiguous internal representations for retrieval [158]. Text retrieval techniques commonly use terminologies for query expansion [55, 215]. The queries can be expanded automatically with synonyms from such a terminology, for example. Díaz Galiano et al. [64] consider terms associated with Medical Subject Headings (MeSH) descriptors as synonyms and use these to expand queries. More recently Dramé et al. [66] explore the use of term synonyms to expand queries. However, visual retrieval techniques cannot apply these methods directly for synonym extraction because visual information cannot be directly represented as words. Nevertheless, language modelling techniques can be extended easily to visual techniques [71].
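Synonym–based query expansion can be sketched in a few lines. The synonym dictionary below is a hypothetical toy example and is not taken from MeSH or any other real terminology; the function name is likewise illustrative.

```python
def expand_query(query, synonyms):
    """Append the synonyms of each query term, keeping the original term order."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


# Hypothetical terminology, for illustration only.
toy_terminology = {
    "heart": ["cardiac"],
    "cancer": ["neoplasm", "tumour"],
}
expanded_terms = expand_query("heart cancer", toy_terminology)
```

The expanded term list can then be submitted to the text retrieval engine in place of the original query, optionally with lower weights for the added terms.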

In order to efficiently use multi–modal retrieval systems some efforts have been made to find a relation between images and text. Recently, Simpson et al. [218] review the techniques applied to deal with image content and its semantic meaning in terms of NL. A method based on global feature mapping is also presented. Kurtz et al. [145, 146] propose annotating the images with semantic terms extracted from a given ontology to build a vector of terms representing the image. Lacoste et al. [149] represent the images and the text in the same way, as vectors of concepts, building a conceptual index. However, most of the approaches use joint probabilistic models to find relationships between multi–modal features [153, 16, 68, 171, 202, 22]. Additionally, some approaches are based on image region categorization [58, 150].


(a) Ultrasound. (b) Electron microscopy.

(c) Positron Emission Tomography (PET). (d) Light microscopy.

Figure 2.5: Examples of images of various modalities that can be found in the biomedical literature.

2.1.4 Modality classification

Finally, it is also possible to use image analysis and classification to extract relevant information from the images (such as modality types, anatomic regions or the recognition of specific objects in the images such as arrows) to filter result lists or rerank them.

In the biomedical literature, images can be of several types, some of which correspond to medical imaging modalities such as ultrasound, Magnetic Resonance Imaging (MRI), X–ray and Computed Tomography (CT) (see examples of images from various modalities in Figure 2.5). In user studies [163], clinicians have indicated that modality is one of the most important filters by which they would like to limit their search [56].

Previous studies [123, 56] have shown that imaging modality is an important piece of information relating to the image for medical retrieval. Image categories can be integrated into any retrieval system to enhance or filter its results [233], benefiting both the speed and the precision of the search [120] by reducing the search space to a set of relevant categories [199, 83]. Furthermore, classification methods can be used to offer adaptive search methods [247, 20]. Automatic modality classification is thus an important part of the performance and usability of modern medical retrieval systems. However, image modality is typically extracted from the caption. Caption information can help if captions are well controlled, as in the radiology domain, but in the more general biomedical literature it is hard to find the modality information in the caption. Studies have shown that the modality can be extracted from the image itself using visual features [191, 151, 114]. Visual image classification techniques have other shortcomings, as some modalities, such as CT and MRI, can easily be mixed up when categorising automatically. In these cases the text of the captions can be used as an additional cue to disambiguate the two.

A wide variety of visual classification techniques have been explored. Csurka et al. [57] use a Fisher Vector representation of the images built on low level features. Kitanovski et al. [139] use a spatial pyramid in combination with dense sampling using an opponentSIFT descriptor for each image patch. A Support Vector Machine (SVM) with a χ² kernel is then used as a classifier. Classifiers employed range from simple k–Nearest Neighbours (k–NN) [161, 70, 80, 83, 260] or logistic regression models [39] to Genetic Programming (GP) [70] or SVM [216, 37, 252, 219, 223, 139, 220, 260, 226, 20].

An overall system which uses the predicted modality within a retrieval system consists of the following steps: the modality is extracted from the query; the usual retrieval step is performed; the predicted modalities of the document are integrated into the search.

Information about image types can be used in various ways in the retrieval. The following approaches have been explored to integrate the classification into the results [233]:

• Filtering – discarding the images whose predicted type differs from that of the query. Thus, when filtering using the image type, only potentially relevant results are considered;

• Reranking – reranking the initial results with the image type information. The goal is to improve the retrieval ranking by moving relevant documents towards the top of the list based on the categorization;

• Score fusion – fusing a preliminary retrieval score $S_R$ with an image classification score $S_M$ using a weighted sum: $\alpha \cdot S_R + (1 - \alpha) \cdot S_M$, where $S_R$ and $S_M$ are normalised. This approach allows for adjusting the parameter $\alpha$ to emphasise the retrieval score or the categorization results.
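Two of the integration strategies listed above, filtering and score fusion, can be sketched as follows. The sketch assumes that every retrieved image comes with a predicted modality label and with a classifier confidence in [0, 1] that the image has the modality of the query; the data structures, function names and the value of the weight are hypothetical.

```python
def filter_by_modality(results, predicted_modality, query_modality):
    """Filtering: keep only results whose predicted modality matches the query."""
    return [(doc_id, score) for doc_id, score in results
            if predicted_modality.get(doc_id) == query_modality]


def fuse_with_modality(retrieval_scores, modality_scores, alpha=0.5):
    """Score fusion: alpha * retrieval score + (1 - alpha) * classification score.

    Both inputs are assumed to be normalised to [0, 1]; images without a
    classification score are treated as having a score of zero.
    """
    fused = {
        doc_id: alpha * score + (1.0 - alpha) * modality_scores.get(doc_id, 0.0)
        for doc_id, score in retrieval_scores.items()
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval and classification outputs for a CT query.
retrieval = {"img1": 0.9, "img2": 0.8, "img3": 0.4}
ct_confidence = {"img1": 0.1, "img2": 0.9, "img3": 0.95}
labels = {"img1": "MRI", "img2": "CT", "img3": "CT"}

filtered = filter_by_modality(sorted(retrieval.items()), labels, "CT")
fused = fuse_with_modality(retrieval, ct_confidence, alpha=0.5)
```

Setting the weight to 1 reproduces the original retrieval ranking, while lower values move images of the requested modality towards the top of the list. Filtering is the hard-decision counterpart: an image with a wrong predicted label is removed entirely, which makes it more sensitive to classification errors.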

Sometimes labelled training data are scarce and some classes are under–represented. This scenario is often met in medical image analysis, where accurate labelling of big datasets is difficult and expensive to obtain. Therefore training data can be augmented with additional examples to improve the classification, which has also been explored in [37, 80, 223]. Semi–supervised learning [41] uses a small number of labelled instances and a large amount of unlabelled data for training the classifier. Methods of semi–supervised learning have been applied to handwritten text recognition [36] and biological networks [255]. Related to this work, in [57] semi–supervised classification is applied to medical image classification to expand the training set. The confidence scores for the unlabelled data are given by SVM classifiers using multi–modal (visual and textual) information. Moreover, the expansion of the training set by visual retrieval is explored.
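The idea of expanding a training set with confidently classified unlabelled examples can be illustrated with a self–training loop. To keep the example self-contained, a nearest–centroid classifier stands in for the SVM classifiers mentioned above, and the confidence measure and threshold are assumptions chosen for this sketch; none of this reproduces the method of [57] or of this thesis.

```python
import math


def centroids(labelled):
    """Compute the mean feature vector of each class from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in labelled:
        accumulator = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            accumulator[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_with_confidence(features, class_centroids):
    """Return the nearest class and a confidence from 0.5 (ambiguous) to 1.0."""
    distances = sorted((math.dist(features, c), label)
                       for label, c in class_centroids.items())
    best_distance, best_label = distances[0]
    if len(distances) == 1:
        return best_label, 1.0
    total = best_distance + distances[1][0]
    confidence = 0.5 if total == 0 else distances[1][0] / total
    return best_label, confidence


def self_training(labelled, unlabelled, threshold=0.8, max_rounds=5):
    """Iteratively add confidently predicted unlabelled examples to the training set."""
    labelled, remaining = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        class_centroids = centroids(labelled)
        confident, uncertain = [], []
        for features in remaining:
            label, confidence = predict_with_confidence(features, class_centroids)
            target = confident if confidence >= threshold else uncertain
            target.append((features, label))
        if not confident:
            break
        labelled.extend(confident)
        remaining = [features for features, _ in uncertain]
    return labelled, remaining


# Hypothetical two-dimensional descriptors, for illustration only.
seed = [((0.0, 0.0), "CT"), ((10.0, 10.0), "MRI")]
pool = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
expanded, still_unlabelled = self_training(seed, pool)
```

In this example the two points close to a class centre are added to the training set with the corresponding label, while the point halfway between the classes stays unlabelled because its confidence remains below the threshold.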

2.2 Example systems

Due to the many challenges in biomedical retrieval, research has been attracting increasing attention, and many approaches have been proposed [157]. This section presents a few retrieval systems that use multi–modal information for the search. A more detailed overview of platforms specialised in biomedical search can be found in Gottlieb et al. [92].

Well–known free retrieval systems such as ARRS Goldminer⁷ or Yottalook⁸ retrieve images and articles from peer–reviewed biomedical journals but only based on text queries.

On the other hand, systems such as Image Retrieval in Medical Applications (IRMA)⁹ or img(Anaktisi)¹⁰ provided only CBIR. Regarding multi–modal retrieval systems, the Center of Informatics and Information Technology (CITI) group presented NovaMedSearch¹¹ as a medical multi–modal search engine that can retrieve either similar images or related medical cases [172]. The National Library of Medicine (NLM)¹² provides Open–i¹³ [59], a service to search and retrieve abstracts and images from the open source literature and biomedical collections.

⁷ ARRS GoldMiner provides rapid access to published, peer–reviewed medical images (see http://goldminer.arrs.org/).

⁸ Yottalook is a free medical imaging search engine that provides decision support at the point of care (see http://www.yottalook.com/).

Furthermore, as described in Section 2.1.4, to improve retrieval quality a successful classification of images into types (e.g. X–ray, ultrasound or CT) can be applied to filter out irrelevant images [199]. Many web–accessible search systems such as Goldminer or Yottalook already allow users to limit the search results to a particular modality [180], as this is a feature often requested by end users [163]. However, they extract the modality information only from the text and not from the visual features of the images.

2.3 Retrieval evaluation activities

Systematic and quantitative evaluation activities using shared tasks on shared resources have been instrumental in contributing to the success of IR as a research field and as an application area in the past few decades. Evaluation campaigns have enabled the reproducible and comparative evaluation of new approaches, algorithms, theories, and models, through the use of standardised resources and common evaluation methodologies within regular and systematic evaluation cycles.

2.3.1 History

In 1955, a criterion of relevance and measures for the evaluation of text IR systems were proposed for the first time by Kent et al. [136].

In the 1960s, the Cranfield tests [45] pioneered the evaluation of text retrieval technology by comparing the effectiveness of different indexing techniques. Many other research groups reused the Cranfield test collection for evaluating their systems [245]. The Cranfield studies established the importance of creating test collections and using these for the comparative evaluation of IR systems. After these first benchmarks, several large–scale evaluation campaigns have been established at the international level, with major initiatives in the field of text IR [210].

Starting also in the 1960s and continuing through the 1990s, the SMART IR project at Cornell University investigated the effectiveness and efficiency of automatic text retrieval methods [31, 29]. This project emphasised completely automatic approaches to retrieve large quantities of text. It offers a basic framework for research on the vector space and related models of IR [32].

⁹ Image Retrieval in Medical Applications (IRMA) is a project at the Aachen University of Technology (RWTH Aachen) that aims to develop and implement high–level methods for CBIR with prototypical application for medical tasks on a radiologic image archive (see http://ganymed.imib.rwth-aachen.de/irma/).

¹⁰ img(Anaktisi) is a web CBIR application that provides retrieval services for various image databases (see http://orpheus.ee.duth.gr/anaktisi/).

¹¹ NovaMedSearch is a multi–modal (text and image) medical search engine designed to find relevant medical images or cases on the Open Access Subset of PMC (see http://medical.novasearch.org/).

¹² The National Library of Medicine (NLM) maintains and makes available a vast print collection and produces electronic information resources on a wide range of topics (see http://nlm.nih.gov/).

¹³ Open–i is an open access biomedical search engine (see http://openi.nlm.nih.gov/).

In the 1990s, the Text REtrieval Conference (TREC)¹⁴ began a consolidation that allows results to be compared on the same data using the same evaluation methods [245]. TREC has provided large collections and uniform scoring procedures over the years [96]. TREC developed a research tool for evaluating retrieval methods: trec_eval [33]. This tool has become the primary method used in research for retrieval evaluation, as it calculates the same measures using the same implementation.

Since 1999, the NII Testbeds and Community for Information access Research (NTCIR)¹⁵ has placed emphasis on IR with Japanese or other Asian languages and on cross–lingual IR [131, 124, 125, 126, 127, 128, 129, 130, 207, 132]. NTCIR aims to advance information access technologies, including IR, shifting from document retrieval to IR that uses the information in the documents. NTCIR has also investigated evaluation methods for information access, developing the tool NTCIREVAL [206].

Since 2000, the Conference and Labs of the Evaluation Forum (CLEF)¹⁶ has organised a series of evaluation labs designed to address different aspects of mono– and cross–language IR systems, following the TREC style [26]. CLEF has supported the development of an evaluation framework for IR systems operating in both monolingual and cross–language contexts, including the creation of reusable data for benchmarking purposes.

In 2002, the INitiative for the Evaluation of XML retrieval (INEX)¹⁷ organised its first workshop. The main goal of INEX has been to promote the evaluation of structural information (XML elements) to yield focused retrieval and identify relevant parts of relevant documents [75]. In 2013 and 2014, INEX ran as a lab of CLEF.

Following the success of these evaluation campaigns, in 2008, the Forum for Information Retrieval Evaluation (FIRE)¹⁸ proposed a retrieval benchmark to deal with South Asian languages.

Similar evaluation exercises have also been carried out in the field of visual IR. In the 2000s, the Benchathlon¹⁹ initiative tried to set up a common framework for the evaluation of CBIR systems. Unfortunately this initiative did not organise an evaluation campaign.

In 2001, TREC Video Retrieval Evaluation (TRECVid)²⁰ organised a track as part of TREC. TRECVid has encouraged video IR [221] and it became an independent benchmarking initiative.

14The Text REtrieval Conference (TREC) aims to support research within the IR community by providing the infrastructure necessary for large–scale evaluation of text retrieval methodologies (see http://trec.nist.gov/).

15The NII Testbeds and Community for Information access Research (NTCIR) is an evaluation forum which aims at promoting research in information access technologies (seehttp://research.nii.ac.jp/

ntcir/index-en.html).

16The Conference and Labs of the Evaluation Forum (CLEF) is a self–organised body whose main mission is to promote research, innovation and development of information access systems with an emphasis on multi–lingual and multi–modal information (seehttp://www.clef-initiative.eu/).

17The INitiative for the Evaluation of XML retrieval (INEX) provides an IR test collection in order to measure the performance of a search engine (seehttps://inex.mmci.uni-saarland.de/).

18The Forum for Information Retrieval Evaluation (FIRE) aims to encourage research in South Asian language information access technologies (seehttp://www.isical.ac.in/~fire/).

19Benchathlon aimed to set up a favourable environment for sharing CBIR resources (see http://www.benchathlon.net/).

20The TREC Video Retrieval Evaluation (TRECVid) evaluation meetings are an on–going series of workshops focusing on a list of different IR research areas in content–based retrieval and exploitation of digital video (see http://trecvid.nist.gov/).


Similarly, MediaEval21 started in 2008 as a lab of the CLEF campaign, VideoCLEF [152].

MediaEval became an independent benchmarking initiative in 2010. It has focused on the social and human aspects of multimedia access and retrieval.

ImageCLEF was offered for the first time in 2003 as one of the CLEF labs including media data such as images. This lab has aimed to compare CBIR systems and to determine how associated cross–language text can be used in combination with CBIR, which is language independent, to improve retrieval performance [119].

In the biomedical field, retrieving large amounts of data is an important issue in the clinical routine. In the 1990s, OHSUMED22 provided a clinically–oriented MEDLINE subset covering all references from 270 medical journals over a five–year period (1987–1991). The references include the title, abstract, MeSH indexing terms, author, source, and publication type. Moreover, novice physicians generated 106 queries [98]. However, OHSUMED did not provide standardised evaluation measures.

More recently, in 2011 and 2012, TREC organised the Medical Records track. This track examined the problem of retrieving relevant clinical reports from free–text fields [246].

Moreover, in 2014, TREC proposed the Clinical Decision Support track to retrieve biomedical articles relevant for answering generic clinical questions about medical records.

Although medical information usually contains masses of free text [143], it also contains images. In 2004, the ImageCLEF lab introduced a medical task: ImageCLEFmed [50].

The tasks organised over the years by ImageCLEFmed have provided an evaluation forum and framework for evaluating the state of the art in biomedical image retrieval. This thesis focuses on the campaigns from 2011 to 2013, when the provided repositories had evolved to be close to real–world collections in their size and scope [119]. Chapter 4 gives a detailed description of the evolution of ImageCLEFmed over the years.

Following the interest created by ImageCLEFmed, the Visual Concept Extraction Challenge in Radiology (Visceral)23 is organising a retrieval benchmark in 2015 to find cases with similar anomalies based on large–scale sets of 3D radiology images [117].

These evaluation campaigns have been widely credited with advancing IR by providing access to infrastructure and evaluation resources that support researchers in the development of new approaches, and by encouraging collaboration and interaction between researchers from both academia and industry [119].

2.3.2 Evaluation process

A typical evaluation cycle is depicted in Figure 2.6. Each evaluation activity can have a different cycle time, e.g., the CLEF cycle operates over one year although some other evaluation campaigns operate over a longer period [48], such as NTCIR which operates over 18 months.

This section gives an overview of each step of the cycle depicted in Figure 2.6.

21MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval (see http://www.multimediaeval.org/).

22OHSUMED is a test collection proposed for research (see http://ir.ohsu.edu/ohsumed/ohsumed.html).

23The Visual Concept Extraction Challenge in Radiology (Visceral) is a project supported by the European Commission under the Information and Communication Technologies (ICT) theme of the FP7 for research and technological development (see http://www.visceral.eu/).
