Use case oriented medical visual information retrieval & system evaluation


Thesis

Reference


GARCIA SECO DE HERRERA, Alba

Abstract

Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition, many images are made available continuously via publications in the scientific literature and can also be valuable for clinical routine, research and education. Information retrieval systems are useful tools to provide access to the biomedical literature and fulfil the information needs of medical professionals. The tools developed in this thesis can potentially help clinicians make decisions about difficult diagnoses via a case-based retrieval system based on a use case associated with a specific evaluation task. This system retrieves articles from the biomedical literature when querying with a case description and attached images. This thesis proposes a multimodal approach for medical case-based retrieval with focus on the integration of visual information connected to text. Furthermore, the ImageCLEFmed evaluation campaign was organised during this thesis promoting medical retrieval system evaluation.

GARCIA SECO DE HERRERA, Alba. Use case oriented medical visual information retrieval & system evaluation. Thèse de doctorat : Univ. Genève, 2015, no. Sc. 4781

URN : urn:nbn:ch:unige-731845

DOI : 10.13097/archive-ouverte/unige:73184

Available at:

http://archive-ouverte.unige.ch/unige:73184

Disclaimer: layout of this document may differ from the published version.


UNIVERSITY OF GENEVA

FACULTY OF SCIENCE, Department of Computer Science: Professor Dr. Stéphane Marchand–Maillet

FACULTY OF MEDICINE, Department of Radiology and Medical Informatics: Professor Dr. Henning Müller

Use Case Oriented Medical Visual Information Retrieval & System Evaluation

THESIS

presented to the Faculty of Science of the University of Geneva to obtain the degree of Doctor of Science, specialisation in computer science

by

Alba García Seco de Herrera

from

Madrid (Spain)

Thesis No. 4781

GENEVA 2015


The Faculty of Science, on the recommendation of Mr H. MÜLLER, titular professor and thesis director (Faculty of Medicine, Department of Radiology and Medical Informatics), Mr S. MARCHAND-MAILLET, associate professor and thesis director (Department of Computer Science) and Mr D. L. RUBIN, professor (Department of Radiology and Medicine, Stanford University, Stanford, California, U.S.A.), authorises the printing of this thesis, without expressing any opinion on the propositions stated therein.

Geneva, 22 May 2015

Thesis - 4781 -

The Dean


Abstract

My original contribution to knowledge lies in the following two fields: medical visual information retrieval and evaluation of retrieval systems.

Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition to this, many images are made available continuously via publications in the scientific literature. Scientific publications can be very valuable for clinical routine, research and education where up–to–date medical knowledge is needed. However, it is not always easy to find the desired information in this large amount of data and in clinical routine the time to fulfil an information need is often very limited. As a consequence, there is a requirement to manage and retrieve these documents/images in the most efficient and effective way. Retrieval systems are a useful tool to provide access to the biomedical literature related to information needs of medical professionals. Clinicians regularly use information retrieval systems, which benefits decision making and patient care.

To better design retrieval systems based on clinicians’ real needs, this thesis explicitly defines and validates a use case associated with a specific evaluation task. The use case deals with retrieval mechanisms able to jointly exploit connected textual and visual information in the medical domain. By developing a medical case–based retrieval system based on the defined use case, this thesis can potentially help clinicians make decisions about difficult diagnoses. This system retrieves articles from the biomedical literature when queried with a case description and attached images.

Another main contribution of this thesis is a multi–modal approach for medical case–based retrieval with a special focus on the integration of visual information connected to text. Different fusion strategies are analysed to evaluate whether multi–modal retrieval systems can achieve good performance. However, this is a challenging task and visual features do not always provide enough information for the retrieval. Therefore, this thesis defines a query–adaptive multi–modal fusion criterion, which indicates when visual features are suitable to be fused with text features. This criterion is based on synonym relations between text and visual information. Furthermore, an image modality classification approach is implemented to integrate image modality information in the retrieval step. A semi–supervised learning technique is developed to deal with uneven classes in training data. A crowdsourcing platform is then employed to obtain a more accurate image collection.

The final contribution of this thesis is an evaluation framework for medical retrieval systems. After an in–depth analysis of the ImageCLEFmed benchmark in the years preceding this thesis, the ImageCLEFmed evaluation campaign was organised during this thesis and oriented towards the studied use case. It includes the generation of a freely available database and ground truth following a meticulous preparation process. Lessons learned are also drawn from a careful evaluation and comparison of participants’ systems.


Résumé

My original contribution to knowledge concerns two fields of study: medical visual information retrieval and the evaluation of information retrieval systems.

A large quantity of images is produced daily in hospitals. Many of them are used in the scientific literature and are extremely valuable for routine clinical practice, research and education. However, it is not easy for health professionals to find the desired information given the massive amount of data available and the limited time at their disposal. Consequently, it is necessary to manage and retrieve documents/images effectively and efficiently.

Information retrieval systems are useful tools to provide access to the biomedical literature related to the needs of health professionals. These systems can provide valuable support for decision making and patient care.

In order to improve the design of these systems in accordance with the real needs of healthcare staff, this thesis explicitly defines and validates a use case associated with an evaluation task. This use case deals with retrieval mechanisms capable of jointly exploiting interconnected textual and visual medical information.

In addition, this thesis can potentially help clinicians make decisions about difficult diagnoses, through the development of a medical case–based retrieval system for the defined use case.

Another essential contribution of this thesis is the multi–modal approach for information retrieval based on medical cases, which focuses on the integration of visual information connected to text. Different information fusion strategies are analysed to evaluate whether these systems can obtain good results. However, this is a specific and difficult task, because visual features do not always contain enough information to improve the quality of the results. This thesis defines a query–adaptive multi–modal fusion criterion and indicates under which circumstances visual features are eligible to be fused with text. This criterion is based on synonymy between textual information and visual features. Furthermore, it implements an imaging modality classification strategy to be integrated into the retrieval step. A semi–supervised learning technique is developed together with a “crowdsourcing” strategy to deal with class imbalance in the training set and to obtain a more balanced image collection.

The last contribution of this thesis is a framework for the evaluation of medical information retrieval systems. The ImageCLEFmed evaluation campaign was organised during this thesis following an in–depth analysis of earlier evaluation standards. A public database with validation data is generated for the evaluation and comparison of participants’ systems.


Resumen

My original contribution to knowledge lies in two fields of study: medical visual information retrieval and the evaluation of retrieval systems.

A huge quantity of images derived from diagnosis through imaging techniques is produced daily in hospitals. Many of these images are distributed through the scientific literature and are extremely valuable for routine clinical practice, research and education. However, it is not easy for healthcare staff to find the desired information given the enormous amount of data available and the limited time at their disposal. It is therefore necessary to manage and retrieve documents/images effectively and efficiently. Information retrieval systems are very useful tools to provide access to the biomedical literature related to the needs of health professionals, who regularly use these systems, which benefit decision making and patient care.

To improve the design of these systems based on the real needs of healthcare staff, this thesis explicitly defines and validates a use case associated with a specific evaluation task. This use case deals with retrieval mechanisms capable of jointly exploiting related textual and visual medical information.

Likewise, this thesis can potentially help healthcare staff make decisions about difficult diagnoses, through the development of a case–based retrieval system grounded in the defined use case.

Another essential contribution of this thesis is a multimodal strategy for information retrieval based on medical cases, which focuses on the integration of visual information related to textual information. Different information fusion strategies are analysed to evaluate whether these systems can obtain good results. However, this is a difficult task, since visual features do not always contain enough information to help in information retrieval. This thesis defines a query–adaptive criterion for multimodal fusion, which shows when visual features are appropriate to be fused with text. This criterion is based on synonymy between textual information and visual features. Additionally, an image modality classification strategy is implemented to be integrated into the retrieval stage. A semi–supervised learning technique together with a “crowdsourcing” strategy is developed to deal with class imbalance in the training set and thus obtain a more accurate image collection.

The last contribution of this thesis is a framework for the evaluation of medical information retrieval systems. The ImageCLEFmed evaluation campaign was organised during this thesis, after an analysis of its previous evaluation standards. Through a meticulous process, a public database has been generated, as well as the key data for the evaluation and comparison of participants’ systems.


Acknowledgements

“The brain was invented to leave home, and memory to return home.”

Jorge Wagensberg

I would like to thank all the people who contributed in some way to this thesis. First and foremost, I would like to thank my advisors Professor Henning Müller and Professor Stéphane Marchand–Maillet for providing me with the opportunity to complete my Ph.D. thesis at the University of Geneva. I especially want to thank my on-site advisor, Professor Henning Müller, who has been a tremendous mentor for me. I would like to thank him for encouraging my research and for allowing me to grow as a research scientist. His advice on both my research and my career has been priceless. I would also like to thank Professor Daniel L. Rubin very much for agreeing to serve on the jury for my thesis defence and for travelling from Stanford to Sierre.

The members of the medGIFT group have contributed immensely to my personal and professional time during these four years at Sierre. The group has been a source of friendships as well as good advice and collaboration. I am especially grateful to Roger Schaer and Antonio Foncubierta–Rodríguez for their friendship and support, which made my thesis work possible. I want to thank Yashin Dicente Cid who has been my colleague, friend and neighbour in Amsterdam and Sierre. I want to thank present and past members of the group: Oscar Jiménez del Toro, Ranveer Joyseeree, Manfredo Atzori, Dimitrios Markonis, Antoine Widmer, Adrien Depeursinge and the numerous summer and rotation students who have come through the group.

I spent two exciting months at Fudan University in Shanghai, and I would like to thank Professor Yuanyuan Wang for hosting me. I would also like to thank Yu Ma and all the members of the group for their continuing hospitality.

I thank Mete for doing part of the proofreading, which was very helpful for the quality of the English in this thesis. I also want to thank Stéphane and Adrien for correcting the French in the “résumé” of this thesis.

My research work was financed by the University of Applied Sciences Western Switzerland. Part of this work was supported by the European Seventh Framework Programme in the context of the PROMISE (FP7–258191) and Khresmoi (FP7–257528) projects. The WIDTH (PIRSES-GA-2010-269124) project funded the exchange visit in Shanghai. I would like to express my gratitude to these institutions for their support.

My time at Sierre was made enjoyable in large part due to the many friends who became a second family. In addition to the friends from the medGIFT group already mentioned, I am grateful for the time spent with my friends: Sergio, Evelyne, Tania, Andrés, Alejandra, Visara, Stefano and little Emilia (who has grown up with this thesis).


The “chiceria” (Claudia B., Céline, Caroline, Sandra, Claudia P. and Lysiane) has also enriched my stay in Sierre, with special thanks to Lisa.

I particularly want to thank Stéphane for his faithful support, encouragement and patience during the final stages of this Ph.D.

I cannot forget in my acknowledgements the people of my beloved Spain. I would like to express my deepest and most sincere gratitude to Professor Emanuele Schiavi, who was my source of motivation and curiosity in my first steps in research. I would like to extend my gratitude to my colleagues from the Neuroimaging area of the Fundación CIEN, with whom I shared an office and countless hours of work and good times. Special thanks to Pablo and Gonzalo for their friendship.

I want to express my gratitude to all those friends with whom I have shared so much. In particular, I want to thank Diego, Pablo, María, Cañas, Luis, Eva, Diana, Elvira and Andrés for welcoming me with open arms on each of my visits to Madrid. Likewise, I want to thank my friends from the “pandillita” because they are always there (Jose, Lurdes, Sofía, Almudena B., David, Maxi, Zana, Grande, Rebeca, Almudena R., Cristina, Nany and Lorena). Special thanks to my dear Bea for so many good times. I cannot forget my friends from Cabeza del Buey, whom I have seen very little because of this thesis but with whom it is always a joy to reunite.

It is also an honour to have met great friends in Amsterdam, especially Ana María, Josep and Joselito.

The deepest and most heartfelt thanks go to my family. Thanks to my cousins, Natalia, Bárbara, Germán, Fernando, María, Inamar and María (last but not least loved), for always making me enjoy the time I spend with them. Thanks to my uncles and aunts, German, Rosa, Antonio, Felisa, Manolo and Pilar, for always being there for me. My dearest grandparents, Antonio, Patro and Angelita, deserve very special thanks for the understanding, encouragement and love they have always given me, even despite the distance. I also want to remember my dear grandfather Antonio, who I am sure would be very proud of me.

None of this would ever have been possible without the unconditional support and affection that I always receive from my wonderful family: my parents, Jose Luis and Pilar, and my brother, Daniel. They taught me to love science and have always understood my absence and my bad moments. Despite the distance, they have always been by my side to know how I was doing and to encourage me to keep going. Words will never be enough to show my love and gratitude.

A todos los que habéis hecho posible esta tesis... ¡Gracias!

To all of you who have made this thesis possible...Thanks!


Chapter 1

Introduction

“Everything you want is on the other side of the fear.”

Jack Canfield

This chapter gives a brief introduction to this Ph.D. thesis. It begins by describing the motivations for this research. Next, an overview of the outline of this thesis is given. Finally, the achievements of this thesis in the fields of medical visual Information Retrieval (IR) and system evaluation are stated.

1.1 Motivations

Clinicians generally base their decisions for diagnosis and treatment planning on a mixture of acquired textbook knowledge and experience acquired through real–life clinical cases [195]. Therefore, in the medical field, two knowledge types are generally available [170]:

• Explicit knowledge – already well established and formalised domain knowledge, e.g., textbooks or clinical guidelines;

• Implicit knowledge – individual expertise, organizational practices and past cases.

When working on a new case that includes images, clinicians analyse a series of images together with contextual information, such as the patient’s age, gender and medical history, as this data can have an impact on the visual appearance of the images. Since related problems may have similar solutions, clinicians use past situations similar to the current one to determine the diagnosis and potential treatment options. This information is also transmitted in teaching, where typical or interesting cases are discussed, and used for research [170, 249]. Thus, the goal of a clinician is often to solve a new problem by making use of previous similar situations and by reusing information and knowledge [4], an approach also called case–based reasoning. The process can be described in four steps, known as the four “R’s” [93, 170]:

1. retrieve the most similar case(s) from the collection;

2. reuse them, and more precisely their solutions, to solve the problem;

3. revise the proposed solution;


4. retain the current case in the collection for further use.

This thesis focuses on the retrieval step because the retrieval of similar cases from a database can help clinicians to find the necessary information [195, 213]. In the retrieval step, a search over the documents in the database is performed using a formulation of the information need that can include text and images or image regions. Relevant documents are ranked according to their degree of similarity to a given query case or to the information need. The most relevant cases are then presented at the top of the list and can be used to solve the current problem [18].
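As an illustration of the ranking performed in the retrieval step, the following minimal Python sketch orders a toy collection of case descriptions by cosine similarity between term–count vectors. It is not the retrieval method used in this thesis: the function names (term_vector, cosine_similarity, rank_cases) and the three example cases are hypothetical, and only the text part of a query is considered.

```python
import math
from collections import Counter


def term_vector(text):
    """Lower-case bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cases(query, collection):
    """Return (case_id, score) pairs sorted by decreasing similarity to the query."""
    query_vec = term_vector(query)
    scored = [
        (case_id, cosine_similarity(query_vec, term_vector(text)))
        for case_id, text in collection.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, invented for illustration only.
collection = {
    "case_a": "chest x-ray shows nodule in left lung",
    "case_b": "brain mri with lesion in frontal lobe",
    "case_c": "abdominal ultrasound normal liver",
}
ranking = rank_cases("lung nodule on chest x-ray", collection)
for case_id, score in ranking:
    print(case_id, round(score, 3))
```

In this toy example only case_a shares terms with the query, so it is ranked first. A multi–modal system would additionally combine such text scores with scores computed from visual features.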

Medical IR systems are increasingly complex: they need to satisfy diverse user needs and support challenging tasks. Their development calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness [181]. Large–scale worldwide experimental evaluations provide fundamental contributions to the advancement of state-of-the-art techniques through common evaluation procedures, regular and systematic evaluation cycles, comparison and benchmarking of the adopted approaches, and spreading of knowledge. In the process, vast amounts of experimental data are generated that beg for analysis tools to enable interpretation and thereby facilitate scientific and technological progress [236, 7].

Medical visual IR and the evaluation of such systems form the main motivation of this thesis, taking into consideration that the medical literature currently constitutes an enormous knowledge base that includes visual as well as textual information.

This thesis was carried out in the context of the Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation (PROMISE)¹ and Khresmoi² projects. Both projects received funding support from the European Commission in the context of its European Seventh Framework Programme (FP7) and had a common interest and close cooperation on medical visual IR.

PROMISE is a Network of Excellence (NoE) funded by the FP7. PROMISE aimed at advancing the experimental evaluation of complex multimedia and multilingual information systems in order to support the decision making process of individuals, commercial entities and communities who develop, employ and improve such complex systems [63].

To move from abstract benchmarking to more user–sensitive evaluation schemes, PROMISE formulated a set of use cases based on scenarios of use for multimedia and multilingual information access. This makes it possible to leverage previous knowledge and to avoid retreading earlier erroneous paths. One of the use cases is the “visual clinical decision support” use case, which constitutes the focus of this thesis. The use case deals with visual information connected with text in the clinical domain in order to provide retrieval and access mechanisms able to jointly exploit textual and visual features.

PROMISE also facilitated management of the evaluation activities and offered access, curation, preservation, reuse, analysis, visualization and mining of the collected experimental data.

Khresmoi is an integrated project funded by the FP7. Khresmoi’s goal was to develop a multilingual multi–modal search and access system for biomedical information and documents [10]. It addressed the challenges of searching through huge amounts of medical data, including general medical information available on the Internet, as well as radiology data in hospital archives. It allows text querying in combination with image queries. It has three main end user groups: members of the general public, physicians and radiologists (a group of physicians for which image search is of immense importance). An overview of the Khresmoi concept is shown in Figure 1.1.

Figure 1.1: Overview of the Khresmoi project. Khresmoi combines multiple data sources and knowledge derived from various heterogeneous knowledge sources. The system allows users to access biomedical data.

¹ Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation (PROMISE) Network of Excellence is an FP7–funded research network focused on researching the evaluation of multimedia and multilingual information systems (see http://www.promise-noe.eu/).

² Khresmoi is a European Union project funded by the FP7 focused on researching tools for multi–modal multilingual search and access systems (see http://khresmoi.eu/).

PROMISE and Khresmoi cooperated on the “visual clinical decision support” use case in order to achieve their respective objectives. They carried out joint evaluation activities by exploiting the PROMISE evaluation infrastructure to experiment with Khresmoi outcomes.

1.2 Thesis overview

This thesis deals with various aspects of medical visual IR, which are studied with a focus on system evaluation.

This first chapter gives a short introduction explaining the main motivation for the research described in this thesis. The principal scientific contributions of this thesis are also briefly listed at the end of this chapter.

Chapter 2 gives an overview of the biomedical visual IR background with a focus on medical case–based retrieval. It provides references for a number of biomedical IR systems. This chapter introduces various components and algorithms that are important for the multi–modal aspects of this thesis. Most importantly, it covers information fusion techniques, an overview of query–adaptive multi–modal fusion and the integration of modality classification into the retrieval. The history of retrieval evaluation activities and the retrieval evaluation methodology are also reported in this chapter.

Chapter 3 defines and validates the “visual clinical decision support” use case, confirming that the use case reflects a real–life problem for clinicians.


Chapter 4 analyses the scholarly impact of the Cross–Language Retrieval in Image Collections (ImageCLEF)³ evaluation campaign. The medical visual IR evaluation, ImageCLEFmed, organised in the context of this thesis is described in detail.

Chapter 5 contains a detailed description of the techniques applied to develop a medical case–based retrieval system. It uses the Parallel Distributed Image Search Engine (ParaDISE) system as a baseline and further components are included. This chapter focuses mainly on three features of the system: information fusion, query–adaptive multi–modal fusion and modality classification.

Chapter 6 contains the description of the experiment carried out and the results achieved thanks to the features described in Chapter 5. The ImageCLEFmed framework is used to evaluate the performance of the system. This chapter concludes by discussing the results of the experiment carried out.

Chapter 7 presents a web–based retrieval interface called Shangri–La. This interface integrates the multi–modal retrieval approach presented in Chapter 5. Features provided by Shangri–La are described and illustrated with screenshots of the application.

Chapter 8 concludes by revisiting the objectives and summarizing the contributions made in this thesis. It points out further research directions based on the findings of this thesis.

Appendix A contains the surveys carried out for the use case validation described in Chapter 3, together with their answers.

Appendix B presents most of the answers from the questionnaire filled in by ImageCLEFmed organisers between 2011 and 2013. This data is analysed in Chapter 4.

In addition to the main content, further sections are included to make the manuscript easier to read: a table of contents; an abstract of the contents of this thesis in English, French and Spanish; acknowledgements to everyone who has assisted me throughout my doctoral studies over the years; the mathematical notation used in the text; a glossary containing the abbreviations used in the manuscript; lists of the figures and tables referred to in the document; a bibliography of the literature used to write this thesis; and an index to help find keywords in the text.

1.3 Scientific contributions of this thesis

The main scientific contributions are in the two fields of medical visual IR and evaluation of retrieval systems. The contributions can then be classified according to these two fields.

The main contributions of this thesis in the field of retrieval system evaluation and benchmarking are the following:

• definition and validation of the “visual clinical decision support” use case [116, 115];

• ImageCLEFmed benchmarking organization [122, 180, 79]; this includes the creation of freely available databases and ground truth, the evaluation of participant systems and comparison of techniques;

³ The Cross–Language Retrieval in Image Collections (ImageCLEF) is part of the Conference and Labs of the Evaluation Forum (CLEF) and aims to provide an evaluation forum for the cross–language annotation and retrieval of images (see http://imageclef.org/).


• detailed study of the outcomes of the ImageCLEFmed evaluation activities, especially between 2011 and 2013 [194, 85, 119] as well as an assessment of the scholarly impact of ImageCLEF in previous years [236, 194];

• creation of an image database for evaluating image modality classification [76, 81].

Contributions to a medical case–based retrieval system include the following:

• a medical case–based retrieval approach implementation as well as a biomedical image modality classification approach [161, 80, 83, 164];

• an analysis of different fusion strategies to compare their performance [84, 86];

• a query–adaptive multi–modal fusion criterion implementation to decide when to use multi–modal (text and visual) or only text approaches in the retrieval step [77];

• modality classification approach implementation integrated into the medical case–based retrieval; a semi–supervised learning technique is also proposed to exploit unlabelled data and to expand the training set [76, 81];

• a web–based retrieval interface, called Shangri–La.

Other articles written on this project include [162, 10, 11, 82, 72, 78, 73, 164, 9].


Chapter 2

Medical Visual Information Retrieval

“In the information society lies the solution to generating the collective intelligence we need to move forward.”

Gaspar Ariño Ortiz

Medicine has been represented in images since prehistoric times, with early illustrations leaning toward symbolic representations. Illustrations have developed from symbolism to greater realism (see Figure 2.1). Advances in medical technologies have changed physicians’ vision and understanding of the human body. Different modalities of medical images, such as radiology or microscopy, show objective evidence of disease and decrease the dependence on patients’ subjective descriptions. Figure 2.2 shows some examples of findings in medical images which help physicians in their work on patient cases.

(a) Rock painting, 6000 B.C. Aboriginal “X–ray style” figure. Kakadu National Park, Northern Territory, Australia.

(b) The Ebers Papyrus, 1200 B.C. Egyptian papyrus which describes a therapy for migraine.

(c) Copperplate engraving of a woman who died near the end of term by William Hunter, 1774. National Library of Medicine.

(d) Drawing of Purkinje cells and granule cells from pigeon cerebellum by Santiago Ramón y Cajal, 1899. Instituto Santiago Ramón y Cajal.

Figure 2.1: Examples of historical medical illustrations.

(a) Findings on colour Doppler after endovascular treatment (stenting) in a 52-year-old woman suffering from recurrent transient ischemic attacks.

(b) A complete healing at the polypectomy site on an endoscopy after a 12–week course of proton pump inhibitor therapy.

(c) Hematoxylin and eosin stain on the appendix tissue reveals villous adenoma with moderate to severe dysplasia located suppurative appendicitis.

Figure 2.2: Examples of medical images that help in the diagnosis and treatment planning of cases.

Today, images are produced in hospitals in ever–increasing numbers [5] and provide crucial information for diagnosis, treatment planning and other tasks. A recent European report estimates that 30% of the global digital storage is occupied by medical image data [3]. Besides clinical settings, images are also made available via biomedical publications. The number of biomedical articles published grew at a double–exponential pace between 1986 and 2006 according to [106]. For example, the biomedical open access literature of PubMed Central (PMC)⁴ alone contained almost 2 million images in 2014.

Many physicians have regular information needs during clinical work, teaching preparation and research activities [99, 179]. Therefore, there is a need to search through the immense collections of images in institutions and on the World Wide Web, making the data accessible for reuse. Studies have shown that answering a clinical information need using IR systems takes around 30 minutes [101], while clinicians state that they have approximately five minutes available [103]. Finding relevant information more quickly is thus an important task to bring search into clinical routine [167].

Retrieval and classification of medical images have been explored to get additional information for reading and interpretation of medical cases [241] when open questions remain and thus help clinicians in their daily work.

Although text queries are commonly used, the visual information of the images can enrich the search. Images represent an important part of the content in many publications and searching for medical images has become common in retrieval applications, particularly for radiologists. Image retrieval has been shown to be complementary to text retrieval approaches and images can help to represent the content of scientific articles, particularly in applications using small interfaces such as mobile phones [60]. Furthermore, medical case–based retrieval taking into account several images and potentially other data of the case has also been proposed by other authors over the past 7 years [186, 249].

2.1 Components of a retrieval system

IR systems search for relevant documents and information within the contents of a specific database. In this section, the components needed to develop a rudimentary IR system that retrieves documents are first described. Figure 2.3 puts together all the basic components to outline a complete IR system.

⁴ PubMed Central (PMC) is a free full–text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM) (see http://www.ncbi.nlm.nih.gov/pmc/).

Figure 2.3: Outline of the basic elements of a complete retrieval system.

The architecture of the IR system consists of the following three components:

1. Feature extraction – the system describes the query as a set of features to handle the index;

2. Indexing – the system builds an index of the document descriptors to record and maintain the database information;

3. Similarity calculator – the system retrieves documents that are relevant to the query from the index and displays the retrieved data to the user.
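The three components can be illustrated with a minimal text-only sketch. It assumes a bag-of-words descriptor and cosine similarity, which are choices made here for illustration and not the configuration used in this thesis. The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def extract_features(text):
    """Feature extraction: describe a text as a bag of lower-cased terms."""
    return Counter(text.lower().split())


def build_index(documents):
    """Indexing: store one descriptor per document identifier."""
    return {doc_id: extract_features(text) for doc_id, text in documents.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, index, top_k=3):
    """Similarity calculator: rank indexed documents against the query."""
    q = extract_features(query)
    scored = [(doc_id, cosine(q, desc)) for doc_id, desc in index.items()]
    scored = [(doc_id, s) for doc_id, s in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "chest x-ray showing pneumonia",
    "d2": "ultrasound of the liver",
    "d3": "x-ray of a fractured wrist",
}
index = build_index(docs)
results = search("chest x-ray", index)
```

In this toy example only the two documents sharing a term with the query are returned, with the chest x-ray document ranked first.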

This thesis focuses on the integration of visual information into medical retrieval systems. Therefore, this section presents an overview of text and visual information extraction and describes several methods to improve the retrieval precision using multi–modal approaches.

2.1.1 Information sources and retrieval

Text retrieval has been successfully used in various medical fields from lung disease through cardiology, eating disorders and diabetes to hepatitis [235] and Alzheimer’s disease [147].

Text in the anamnesis is often the first data available and, based on the initial analysis, other exams are ordered. Most biomedical search engines, including systems that search for images, have been based on text retrieval only. Sources of biomedical information can be scientific articles and also reports from the patient record [217]. The various parts of the text such as title, abstract and figure captions can then be indexed separately.

Some examples of general search tools that have also been used in the biomedical domain are the Lucene, Essie and Terrier IR tools. Lucene⁵ is an open source full–text search engine. The advantage of Lucene is its simplicity and high performance [166]. Essie [109] is a phrase–based search engine with term and concept query expansion and probabilistic relevancy ranking. It was also designed to use terms from the Unified Medical Language System (UMLS). Terrier⁶ is also an open source platform for research and experimentation in text retrieval developed at the University of Glasgow. It supports most state of the art retrieval models such as Dirichlet prior language models, Divergence from Randomness (DFR) models or Okapi BM25.

⁵ Apache Lucene is a project that develops open–source search software including indexing and search technology (see http://lucene.apache.org/).

In addition to the text in the anamnesis, images are another initial data source for diagnosis [249]. Users of biomedical sources are often interested in images for biomedical research or medical practice [193], as the images carry an important part of the information in articles. Rather than using text queries, in Content–Based Image Retrieval (CBIR) systems, visual features are extracted from the images and, based on them, images are retrieved. This allows the use of visual information to find images in a database similar to examples given or with similar regions of interest.

Visual retrieval for medical applications has also become an important research area over the past 15 years [213]. The most commonly used features for visual retrieval can be grouped into the following types [12, 107]:

• Colour – several colour image descriptors have been proposed [34] such as simple colour histograms, a colour extension to the Scale Invariant Feature Transform (SIFT) [242] or the Bag of Colours (BoC) [82];

• Texture – texture features have been used to study the spatial organization of pixel values of an image like first order statistics, second order statistics, higher order statistics and multiresolution techniques such as wavelet transform [212];

• Shape – various features have been used to describe shape information, including moments, curvature or spectral features [257].
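As an illustration of the simplest of these descriptors, the following sketch computes a quantised RGB colour histogram and compares two images by histogram intersection. The number of bins, the choice of histogram intersection and the toy pixel lists are assumptions made for this example.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Quantise 8-bit RGB pixels into a normalised joint colour histogram."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value in 0..255 to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical toy "images" given as flat lists of RGB pixels.
greyish_a = [(120, 120, 120)] * 8 + [(10, 10, 10)] * 2
greyish_b = [(125, 118, 122)] * 7 + [(5, 12, 8)] * 3
reddish = [(250, 10, 10)] * 10

sim_similar = histogram_intersection(colour_histogram(greyish_a),
                                     colour_histogram(greyish_b))
sim_different = histogram_intersection(colour_histogram(greyish_a),
                                       colour_histogram(reddish))
```

The two greyish images obtain a high similarity, while the greyish and reddish images share no histogram bin.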

(a) On the right, regions detected by a key–region detector from the image on the left.

(b) The arrows in the image on the left represent the centre, scale and orientation of the key points detected in the image on the right by the SIFT algorithm.

Figure 2.4: Information can be extracted from the visual content of the images.

Figure 2.4 shows examples of the visual information that can be extracted from the images.

⁶ Terrier is an open source search engine, readily deployable on large–scale collections of documents (see http://terrier.org/).

The extraction of multiple visual features often enhances the retrieval performance. Multiple features have been explored, most frequently SIFT variants [37, 80, 52, 252, 219], Local Binary Patterns (LBP) [37, 219], edge and colour histograms [57, 37, 252, 219, 223, 39] and grey value histograms [252]. Several texture features have also been explored such as Tamura [37, 252, 219, 223], Gabor filters [252, 219, 223], Curvelets [37], a granulometric distribution function [39] and spatial size distribution [39]. In recent years, visual words [234] have become the main way of describing images with a variety of basic features such as SIFT [159] and also texture or colour measures.
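The visual words description can be sketched as follows: each local descriptor is assigned to its nearest entry in a visual vocabulary and the image is represented by the normalised histogram of these assignments. In practice the vocabulary is usually learned by clustering a large set of descriptors. Here it is a hand-picked, hypothetical list of two-dimensional centroids so that the example stays self-contained.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bovw_histogram(descriptors, vocabulary):
    """Normalised histogram of visual word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical vocabulary (assumed to come from a prior clustering step)
# and hypothetical local descriptors extracted from one image.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.8, 0.9], [0.1, 0.9]]
hist = bovw_histogram(descriptors, vocabulary)
```

The resulting fixed-length histogram can then be compared between images with any histogram similarity measure.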

2.1.2 Information fusion

The combination of various single search modalities (such as text and visual image features) makes it possible to use cross–modal relationships and thus improve the performance beyond the performance of single components [254]. However, the improvement of the performance of these multi–modal systems has long been considered difficult due to the richness of multimedia [95, 141] and the complexity of extracting meaningful information from visual documents in a large domain automatically [197]. Fusing the retrieval results of visual and textual resources into a final ranking is a popular approach for multi–modal retrieval. Fusion can either be performed early or late, creating a unified data representation or fusing after each data type is analysed independently [61, 65].

Several fusion models are described in the literature to combine multi–modal sources. As early as 1998, La Cascia et al. [148] presented a CBIR system which combined visual and textual information directly in the feature vector space representation. Textual information is extracted using latent semantic indexing. In addition, visual information is captured in colour and orientation histograms. More recently, Pham et al. [192] combine text and visual features by normalizing and concatenating them to generate the feature vectors. Traditionally, the most common method followed for data fusion is to search the modalities separately and fuse their results (ranked lists) with methods such as linear combination [8]. Methods to obtain suitable weights for linear combinations are reviewed by Wu [253]. Furthermore, Kludas et al. [140], Atrey et al. [12] and Depeursinge [61] provide an overview of the different fusion methods that have been used for multimedia analysis and IR.
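The difference between early and late fusion can be made concrete with a small sketch. Early fusion is shown as normalisation and concatenation of the two feature vectors, late fusion as a linear combination of two score lists. The L2 and min–max normalisations, the weight of 0.7 and all values are assumptions chosen for illustration and are not taken from the cited works.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def early_fusion(text_vec, visual_vec):
    """Early fusion: one joint descriptor from both modalities."""
    return l2_normalise(text_vec) + l2_normalise(visual_vec)


def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def late_fusion(text_scores, visual_scores, weight_text=0.7):
    """Late fusion: weighted linear combination of normalised scores."""
    t, v = min_max(text_scores), min_max(visual_scores)
    fused = {d: weight_text * t.get(d, 0.0) + (1 - weight_text) * v.get(d, 0.0)
             for d in set(t) | set(v)}
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors and retrieval scores.
fused_vec = early_fusion([3.0, 4.0], [0.0, 2.0])
ranking = late_fusion({"a": 12.0, "b": 8.0, "c": 2.0},
                      {"b": 0.9, "c": 0.8, "d": 0.1})
```

In the late fusion example, document "b" overtakes the best text-only result "a" because it is also ranked highly by the visual run.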

In terms of medical cases, images are always associated with either text or structured data and this can then be used in addition to the visual content analysis for retrieval. Most often, text retrieval performs much better than visual retrieval, as the text describes the context in which the images were taken. The poorest performance of visual techniques is obtained when they are applied to databases with a wide spectrum of image modalities, anatomies and pathologies [196]. However, there is evidence that the combination or fusion of information from textual and visual sources can improve the overall retrieval quality [157, 87, 79].

The most common approach to obtain the final result is to combine the results of visual and text retrieval. Cao et al. [38] represent the features from different modalities as a multi–dimensional matrix and incorporate these feature vectors using an extended Latent Semantic Analysis (LSA) model. Gkoufas et al. [88] increase the retrieval performance by applying linear methods to combine visual and textual sources of images.

Classical approaches such as the maximum combinations (combMAX), the sum combinations (combSUM) and the multiplication of the sum and the number of non-zero scores (combMNZ) are studied by Zhou et al. [259] showing that fusing visual and text runs outperforms single modality runs. Mourão et al. [173] introduce a new fusion technique, Inverted Squared Rank (ISR), a variant of the Reciprocal Rank Fusion (RRF).
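The classical combination rules named above can be sketched in a few lines. The sketch assumes that each run is given as a mapping from document identifier to an already normalised score. RRF is shown in its usual rank-based form with the commonly used constant k = 60, which is an assumption here. ISR is not included because its exact definition is not given in this section. The two runs are hypothetical.

```python
from collections import defaultdict


def comb_fusion(runs, method="combSUM"):
    """Fuse score runs ({doc_id: score}) with combMAX, combSUM or combMNZ."""
    collected = defaultdict(list)
    for run in runs:
        for doc, score in run.items():
            if score > 0:  # only non-zero scores are counted
                collected[doc].append(score)
    fused = {}
    for doc, scores in collected.items():
        if method == "combMAX":
            fused[doc] = max(scores)
        elif method == "combSUM":
            fused[doc] = sum(scores)
        elif method == "combMNZ":
            fused[doc] = sum(scores) * len(scores)
        else:
            raise ValueError("unknown method: " + method)
    return fused


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; k = 60 is an assumed default."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)


def ranked(fused):
    """Document ids sorted by decreasing fused score (ties by id)."""
    return [d for d, _ in sorted(fused.items(), key=lambda p: (-p[1], p[0]))]


# Hypothetical normalised scores of a text run and a visual run.
text_run = {"a": 0.9, "b": 0.5, "c": 0.3}
visual_run = {"b": 0.6, "c": 0.7, "d": 0.8}

by_max = ranked(comb_fusion([text_run, visual_run], "combMAX"))
by_sum = ranked(comb_fusion([text_run, visual_run], "combSUM"))
by_mnz = ranked(comb_fusion([text_run, visual_run], "combMNZ"))
rrf = reciprocal_rank_fusion([["a", "b", "c"], ["d", "c", "b"]])
```

combMAX favours documents with one very high score, whereas combSUM and combMNZ reward documents retrieved by both runs.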

Furthermore, some reranking methods have also been explored [89, 105] for fusing visual and text information. However, strategies that reorder top–ranked documents limit the margin of improvement due to their use on a limited number of documents [187]. Martínez Fernández et al. [165] reorder the results from the CBIR using text–based retrieval. Viswa [243] uses visual information to rerank text–based image retrieval. The relevance of the images is linked to their initial rank position to relax the assumption that the top–ranked images in the text–based results are equally relevant.

2.1.3 Query–adaptive multi–modal fusion

Section 2.1.2 investigates techniques to fuse visual and text information to improve the precision of the retrieval. However, fusion does not always lead to better results and can even decrease the performance of the retrieval [83, 220, 174]. Therefore, to combine multi–modal retrieval two fundamental aspects should be studied: when and how multiple retrieval models can be combined to obtain better performance than individual models [157]. How to fuse multi–modal systems has been explored by studying multiple fusion techniques. Each of these methods is suited to different settings and they are studied in detail in this thesis. When to fuse multiple retrieval models, such as text or visual retrieval models, is a complicated topic. Different models used in a fusion process can provide complementary or contradictory information [12]. Hence, applying a single standard retrieval method for all possible queries is inadequate [155]. Recently, adaptive query retrieval has been an emerging trend as a solution to this problem [62]. Adaptive query techniques aim to associate individual queries with specific retrieval strategies [135].

Kennedy [135] reviews the methods proposed for adapting retrieval strategies according to the intentions of the user. Several strategies have been proposed, such as the prediction of the quality of each available tool based on statistical measures of the returned results or the adaptation strategies based on the user context. However, most of the techniques are based on query classification using Natural Language (NL) analysis of the query.

NL analysis is used in IR to translate potentially ambiguous NL queries and documents into unambiguous internal representations for retrieval [158]. Text retrieval techniques commonly use terminologies for query expansion [55, 215]. The queries can be expanded automatically with synonyms from such a terminology, for example. Díaz Galiano et al. [64] consider terms associated with Medical Subject Headings (MeSH) descriptors as synonyms and use these to expand queries. More recently Dramé et al. [66] explore the use of term synonyms to expand queries. However, visual retrieval techniques cannot apply these methods directly for synonym extraction because visual information cannot be directly represented as words. Nevertheless, language modelling techniques can be extended easily to visual techniques [71].
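Synonym-based query expansion can be sketched as a simple dictionary lookup. The synonym table below is hypothetical and is not taken from MeSH or any other terminology. The naive substring matching is also a simplification, since a real system would match terms against the terminology after tokenisation and normalisation.

```python
# Hypothetical synonym table; the entries are illustrative only.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "x-ray": ["radiograph"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the lower-cased query followed by synonyms of matched terms."""
    query_lc = query.lower()
    expanded = [query_lc]
    for term, alternatives in synonyms.items():
        if term in query_lc:  # naive substring match, see lead-in
            expanded.extend(alternatives)
    return expanded


expanded = expand_query("Chest X-ray after heart attack")
```

The expanded list can then be submitted to the text retrieval engine as additional query terms, possibly with a lower weight than the original query.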

In order to efficiently use multi–modal retrieval systems some efforts have been made to find a relation between images and text. Recently, Simpson et al. [218] review the techniques applied to deal with image content and its semantic meaning in terms of NL. A method based on global feature mapping is also presented. Kurtz et al. [145, 146] propose annotating the images with semantic terms extracted from a given ontology to build a vector of terms representing the image. Lacoste et al. [149] represent the images and the text in the same way, as vectors of concepts, building a conceptual index. However, most of the approaches use joint probabilistic models to find relationships between multi–modal features [153, 16, 68, 171, 202, 22]. Additionally, some approaches are based on image region categorization [58, 150].


(a) Ultrasound. (b) Electron microscopy.

(c) Positron Emission Tomography (PET). (d) Light microscopy.

Figure 2.5: Examples of images of various modalities that can be found in the biomedical literature.

2.1.4 Modality classification

Finally, it is also possible to use image analysis and classification to extract relevant information from the images (such as modality types, anatomic regions or the recognition of specific objects in the images such as arrows) to filter results lists or rerank them.

In the biomedical literature images can be of several types, some of which correspond to medical imaging modalities such as ultrasound, Magnetic Resonance Imaging (MRI), X–ray and Computed Tomography (CT) (see examples of images from various modalities in Figure 2.5). In user studies [163], clinicians have indicated that modality is one of the most important filters that they would like to be able to limit their search by [56].

Previous studies [123, 56] have shown that imaging modality is an important piece of information relating to the image for medical retrieval. Image categories can be integrated into any retrieval system to enhance or filter its results [233], benefiting both the speed and the precision of the search [120] by reducing the search space to a set of relevant categories [199, 83]. Furthermore, classification methods can be used to offer adaptive search methods [247, 20]. Automatic modality classification is thus an important part of the performance and usability of modern medical retrieval systems. However, image modality is typically extracted from the caption. Caption information can help if captions are well controlled, as in the radiology domain, but in the more general biomedical literature it is hard to find the modality information in the caption. Studies have shown that the modality can be extracted from the image itself using visual features [191, 151, 114]. Visual image classification techniques have other shortcomings, as some modalities, such as CT and MRI, can easily be confused when categorised automatically. In these cases, the text of the captions can be used as an additional cue to disambiguate the two.

A wide variety of visual classification techniques have been explored. Csurka et al. [57] use a Fisher Vector representation of the images built on low level features. Kitanovski et al. [139] use a spatial pyramid in combination with dense sampling using an opponentSIFT descriptor for each image patch. A Support Vector Machine (SVM) with a χ² kernel is then used as a classifier. Classifiers employed range from simple k–Nearest Neighbours (k–NN) [161, 70, 80, 83, 260] or logistic regression models [39] to Genetic Programming (GP) [70] or SVM [216, 37, 252, 219, 223, 139, 220, 260, 226, 20].
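As a minimal illustration of the simplest of these classifiers, the sketch below classifies a histogram descriptor with k–NN using the χ² distance. It also shows one common exponential form of a χ² kernel, as could be passed to an SVM. The specific distance formula, the value of gamma, and the toy histograms and labels are assumptions for this example and do not reproduce the configurations of the cited works.

```python
import math
from collections import Counter


def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel (one common form, assumed here)."""
    return math.exp(-gamma * chi2_distance(h1, h2))


def knn_predict(query_hist, training, k=3):
    """Majority label among the k training histograms closest to the query."""
    neighbours = sorted(training,
                        key=lambda item: chi2_distance(query_hist, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training histograms with modality labels.
training = [
    ([0.7, 0.2, 0.1], "x-ray"),
    ([0.6, 0.3, 0.1], "x-ray"),
    ([0.1, 0.2, 0.7], "ultrasound"),
    ([0.2, 0.1, 0.7], "ultrasound"),
    ([0.1, 0.3, 0.6], "ultrasound"),
]
predicted = knn_predict([0.65, 0.25, 0.10], training, k=3)
```

With k = 3, the two closest neighbours carry the x-ray label and outvote the third, so the query is assigned to that class.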

An overall system which uses the predicted modality within a retrieval system consists of the following steps: the modality is extracted from the query; the usual retrieval step is performed; the predicted modalities of the document are integrated into the search.

Information about image types can be used in various ways in the retrieval. The following approaches have been explored to integrate the classification into the results [233]:

• Filtering – discarding the images whose predicted type differs from that of the query. Thus, when filtering using the image type, only potentially relevant results are considered;

• Reranking – reranking the initial results with the image type information. The goal is to improve the retrieval ranking by moving relevant documents towards the top of the list based on the categorization;

• Score fusion – fusing a preliminary retrieval score S_R with an image classification score S_M using a weighted sum: α·S_R + (1−α)·S_M, where S_R and S_M are normalised. This approach allows for adjusting the parameter α to emphasise the retrieval score or the categorization results.
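The three integration strategies listed above can be sketched as follows. Results are assumed to be tuples of document identifier, retrieval score and predicted modality. The reranking shown (moving documents with the query modality to the top while keeping the original order otherwise) is only one possible reading of the strategy. All names and values are hypothetical.

```python
def filter_by_modality(results, query_modality):
    """Filtering: keep only results whose predicted type matches the query."""
    return [r for r in results if r[2] == query_modality]


def rerank_by_modality(results, query_modality):
    """Reranking: matching results first, original order otherwise kept."""
    # sorted() is stable, so ties preserve the initial retrieval ranking.
    return sorted(results, key=lambda r: r[2] != query_modality)


def score_fusion(retrieval_scores, modality_scores, alpha=0.5):
    """Score fusion: alpha * S_R + (1 - alpha) * S_M on normalised scores."""
    return {doc: alpha * s_r + (1 - alpha) * modality_scores.get(doc, 0.0)
            for doc, s_r in retrieval_scores.items()}


# Hypothetical initial ranking: (doc_id, retrieval score, predicted modality).
results = [("d1", 0.9, "CT"), ("d2", 0.8, "MRI"),
           ("d3", 0.7, "CT"), ("d4", 0.6, "MRI")]

filtered = filter_by_modality(results, "MRI")
reranked = rerank_by_modality(results, "MRI")
fused = score_fusion({"d1": 0.9, "d2": 0.8}, {"d1": 0.1, "d2": 0.9}, alpha=0.5)
```

Filtering is the strictest option and loses relevant documents when the classifier is wrong, whereas score fusion lets α control how much the classifier is trusted.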

Sometimes, labelled training data are scarce and some classes are under–represented. This scenario is often met in medical image analysis, where accurate labelling of big datasets is difficult and expensive to obtain. Therefore training data can be augmented with additional examples to improve the classification, which has also been explored in [37, 80, 223]. Semi–supervised learning [41] uses a small number of labelled instances and a large amount of unlabelled data for training the classifier. Methods of semi–supervised learning have been applied to handwritten text recognition [36] and biological networks [255]. Related to this work, in [57] semi–supervised classification is applied to medical image classification to expand the training set. The confidence scores for the unlabelled data are given by SVM classifiers using multi–modal (visual and textual) information. Moreover, the expansion of the training set by visual retrieval is explored.
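The idea of expanding a training set with confidently classified unlabelled data can be sketched as a self-training loop. To keep the example self-contained, a nearest-centroid classifier with a distance-based confidence replaces the SVM classifiers mentioned above. The confidence definition, the threshold of 0.8 and the toy data are assumptions, and at least two classes are required.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def train(labelled):
    """Nearest-centroid model: one centroid per class label."""
    by_class = {}
    for features, label in labelled:
        by_class.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_class.items()}


def predict_with_confidence(model, features):
    """Closest class and a confidence in [0.5, 1] from the two best distances."""
    dists = sorted((distance(features, c), label) for label, c in model.items())
    (d1, best), (d2, _) = dists[0], dists[1]
    confidence = 1.0 - d1 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return best, confidence


def self_training(labelled, unlabelled, threshold=0.8, max_rounds=5):
    """Iteratively add confidently predicted samples to the training set."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break
        labelled.extend(confident)
        pool = [x for x, _ in remaining]
    return train(labelled), labelled, pool


# Hypothetical two-class data: a few labelled and some unlabelled samples.
labelled = [([0.0, 0.0], "A"), ([1.0, 0.0], "A"),
            ([10.0, 10.0], "B"), ([9.0, 10.0], "B")]
unlabelled = [[0.5, 0.5], [9.5, 9.0], [5.0, 5.0]]
model, expanded, leftover = self_training(labelled, unlabelled)
```

The two unambiguous samples are added to the training set, while the sample lying between the classes stays unlabelled because its confidence remains below the threshold.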

2.2 Example systems

Due to the many challenges in biomedical retrieval, research in this area has been attracting increasing attention, and many approaches have been proposed [157]. This section presents a few retrieval systems that use multi–modal information for the search. A more detailed overview of platforms specialised in biomedical search can be found in Gottlieb et al. [92].

Well–known free retrieval systems such as ARRS Goldminer⁷ or Yottalook⁸ retrieve images and articles from peer–reviewed biomedical journals but only based on text queries. On the other hand, systems such as Image Retrieval in Medical Applications (IRMA)⁹ or img(Anaktisi)¹⁰ provided only CBIR. Regarding multi–modal retrieval systems, the Center of Informatics and Information Technology (CITI) group presented NovaMedSearch¹¹ as a medical multi–modal search engine that can retrieve either similar images or related medical cases [172]. The National Library of Medicine (NLM)¹² provides Open–i¹³ [59], a service to search and retrieve abstracts and images from the open source literature and biomedical collections.

⁷ ARRS GoldMiner provides rapid access to published, peer–reviewed medical images (see http://goldminer.arrs.org/).

⁸ Yottalook is a free medical imaging search engine that provides decision support at the point of care (see http://www.yottalook.com/).

Furthermore, as described in Section 2.1.4, to improve retrieval quality a successful classification of images into types (e.g. X–ray, ultrasound, CT, etc.) can be applied to filter out irrelevant images [199]. Many web–accessible search systems such as Goldminer or Yottalook already allow users to limit the search results to a particular modality [180], as this is a feature often requested by end users [163]. However, they extract the modality information only from the text and not from the visual features of the images.

2.3 Retrieval evaluation activities

Systematic and quantitative evaluation activities using shared tasks on shared resources have been instrumental in contributing to the success of IR as a research field and as an application area in the past few decades. Evaluation campaigns have enabled the reproducible and comparative evaluation of new approaches, algorithms, theories, and models, through the use of standardised resources and common evaluation methodologies within regular and systematic evaluation cycles.

2.3.1 History

In 1955, a criterion of relevance and measures for the evaluation of text IR systems were proposed for the first time by Kent et al. [136].

In the 1960s, the Cranfield tests [45] pioneered the evaluation of text retrieval technology by comparing the effectiveness of different indexing techniques. Many other research groups reused the Cranfield test collection for evaluating their systems [245]. The Cranfield studies established the importance of creating test collections and using these for comparative evaluation of IR systems. After these first benchmarks, several large–scale evaluation campaigns were established at the international level, with major initiatives in the field of text IR [210].

Starting also in the 1960s and through the 1990s, the SMART IR project at Cornell University investigated the effectiveness and efficiency of automatic text retrieval methods [31, 29]. This project emphasised completely automatic approaches to retrieve large quantities of text. It offers a basic framework for research on the vector space and related models of IR [32].

⁹ Image Retrieval in Medical Applications (IRMA) is a project at the Aachen University of Technology (RWTH Aachen) that aims to develop and implement high–level methods for CBIR with prototypical application for medical tasks on a radiologic image archive (see http://ganymed.imib.rwth-aachen.de/irma/).

¹⁰ img(Anaktisi) is a web CBIR application that provides retrieval services for various image databases (see http://orpheus.ee.duth.gr/anaktisi/).

¹¹ NovaMedSearch is a multi–modal (text and image) medical search engine designed to find relevant medical images or cases on the Open Access Subset of PMC (see http://medical.novasearch.org/).

¹² The National Library of Medicine (NLM) maintains and makes available a vast print collection and produces electronic information resources on a wide range of topics (see http://nlm.nih.gov/).

¹³ Open–i is an open access biomedical search engine (see http://openi.nlm.nih.gov/).

In the 1990s, the Text REtrieval Conference (TREC)¹⁴ started a consolidation to allow comparing results across the same data using the same evaluation methods [245]. TREC has provided large collections and uniform scoring procedures over the years [96]. TREC developed a research tool for evaluating retrieval methods: trec_eval [33]. This tool has become the primary method used in research for retrieval evaluation to calculate the same measures using the same implementation.
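To illustrate the kind of measures such a tool computes, the sketch below implements precision at a cut-off, average precision and mean average precision for ranked lists with binary relevance judgements. It is an independent illustration and not the trec_eval implementation. The function names, the ranking and the relevance judgements are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the first k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks of relevant documents,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """Average of the per-query average precision over all queries in runs."""
    return sum(average_precision(runs[q], qrels.get(q, set()))
               for q in runs) / len(runs)


# Hypothetical ranked list and relevance judgements for one query.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

p5 = precision_at_k(ranking, relevant, 5)
ap = average_precision(ranking, relevant)
map_score = mean_average_precision({"q1": ranking}, {"q1": relevant})
```

The relevant document "d9" is never retrieved, which lowers the average precision because the sum is divided by the number of all relevant documents and not only by the retrieved ones.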

Since 1999, the NII Testbeds and Community for Information access Research (NTCIR)¹⁵ has placed emphasis on IR with Japanese or other Asian languages and cross–lingual IR [131, 124, 125, 126, 127, 128, 129, 130, 207, 132]. NTCIR aims to advance information access technologies, including IR, shifting from document retrieval to IR using information in the documents. NTCIR has also investigated evaluation methods for information access, developing the tool NTCIREVAL [206].

Since 2000, the Conference and Labs of the Evaluation Forum (CLEF)¹⁶ has organised a series of evaluation labs designed to test different aspects of mono– and cross–language IR systems following the TREC style [26]. CLEF has supported the development of an evaluation framework for IR systems operating in both monolingual and cross–language contexts including the creation of reusable data for benchmarking purposes.

In 2002, the INitiative for the Evaluation of XML retrieval (INEX)¹⁷ organised its first workshop. The main goal of INEX has been to promote the evaluation of structural information (XML elements) to yield focused retrieval and identify relevant parts of relevant documents [75]. In 2013 and 2014, INEX ran as a lab of CLEF.

Following the success of these evaluation campaigns, in 2008, the Forum for Information Retrieval Evaluation (FIRE)¹⁸ proposed a retrieval benchmark to deal with South Asian languages.

Similar evaluation exercises have also been carried out in the field of visual IR. In the 2000s, the Benchathlon¹⁹ initiative tried to set up a common framework for the evaluation of CBIR systems. Unfortunately, this initiative did not organise an evaluation campaign.

In 2001, TREC Video Retrieval Evaluation (TRECVid)²⁰ organised a track as part of TREC. TRECVid has encouraged video IR [221] and it became an independent benchmarking initiative.

¹⁴ The Text REtrieval Conference (TREC) aims to support research within the IR community by providing the infrastructure necessary for large–scale evaluation of text retrieval methodologies (see http://trec.nist.gov/).

¹⁵ The NII Testbeds and Community for Information access Research (NTCIR) is an evaluation forum which aims at promoting research in information access technologies (see http://research.nii.ac.jp/ntcir/index-en.html).

¹⁶ The Conference and Labs of the Evaluation Forum (CLEF) is a self–organised body whose main mission is to promote research, innovation and development of information access systems with an emphasis on multi–lingual and multi–modal information (see http://www.clef-initiative.eu/).

¹⁷ The INitiative for the Evaluation of XML retrieval (INEX) provides an IR test collection in order to measure the performance of a search engine (see https://inex.mmci.uni-saarland.de/).

¹⁸ The Forum for Information Retrieval Evaluation (FIRE) aims to encourage research in South Asian language information access technologies (see http://www.isical.ac.in/~fire/).

¹⁹ Benchathlon aimed to set up a favourable environment for sharing CBIR resources (see http://www.benchathlon.net/).

²⁰ The TREC Video Retrieval Evaluation (TRECVid) evaluation meetings are an on–going series of workshops focusing on a list of different IR research areas in content–based retrieval and exploitation of digital video (see http://trecvid.nist.gov/).


Similarly, MediaEval²¹ started in 2008 as VideoCLEF, a lab of the CLEF campaign [152]. MediaEval became an independent benchmarking initiative in 2010. It has focused on the social and human aspects of multimedia access and retrieval.

ImageCLEF was offered for the first time in 2003 as one of the CLEF labs including media data such as images. This lab has aimed to compare CBIR systems and to determine how associated cross–language text can be used in combination with CBIR, which is language independent, to improve retrieval performance [119].

In the biomedical field, retrieving large amounts of data is an important issue in the clinical routine. In the 1990s, OHSUMED²² provided a clinically–oriented MEDLINE subset covering all references from 270 medical journals over a five–year period (1987–1991). The references include the title, abstract, MeSH indexing terms, author, source, and publication type. Moreover, novice physicians generated 106 queries [98]. However, OHSUMED did not provide standardised evaluation measures.

More recently, in 2011 and 2012, TREC organised the Medical Records track. This track examined the problem of retrieving relevant clinical reports from free–text fields [246].

Moreover, in 2014, TREC proposed the Clinical Decision Support track to retrieve biomedical articles relevant for answering generic clinical questions about medical records.

Although the medical information usually contains masses of free text [143] it also contains images. In 2004, the ImageCLEF lab introduced a medical task: ImageCLEFmed [50].

The tasks organised over the years by ImageCLEFmed have provided an evaluation forum and framework for evaluating the state of the art in biomedical image retrieval. This thesis focuses on the campaigns from 2011 to 2013, when the provided repositories had evolved to be close to the real world in their size and scope [119]. Chapter 4 gives a detailed description of the evolution of ImageCLEFmed over the years.

Following the interest created by ImageCLEFmed, the Visual Concept Extraction Challenge in Radiology (Visceral)²³ is organizing a retrieval benchmark to find cases with similar anomalies based on large–scale sets of 3D radiology images in 2015 [117].

These evaluation campaigns have been widely credited with contributing tremendously to the advancement of IR by providing access to infrastructure and evaluation resources that support researchers in the development of new approaches, and encouraging collaboration and interaction between researchers from both academia and industry [119].

2.3.2 Evaluation process

A typical evaluation cycle is depicted in Figure 2.6. Each evaluation activity can have a different cycle time, e.g., the CLEF cycle operates over one year although some other evaluation campaigns operate over a longer period [48], such as NTCIR which operates over 18 months.

This section gives an overview of each step of the cycle depicted in Figure 2.6.

²¹ MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval (see http://www.multimediaeval.org/).

²² OHSUMED is a test collection proposed for research (see http://ir.ohsu.edu/ohsumed/ohsumed.html).

²³ The Visual Concept Extraction Challenge in Radiology (Visceral) is a project supported by the European Commission under the Information and Communication Technologies (ICT) theme of the FP7 for research and technological development (see http://www.visceral.eu/).
