Quantitative analysis of medical images: finding relevant regions-of-interest for medical decision support

Thesis

Reference

Quantitative analysis of medical images: finding relevant regions-of-interest for medical decision support

JIMENEZ DEL TORO, Oscar

Abstract

In the past decades, the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased tremendously. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these imaging studies. The main goal of this thesis was to propose and evaluate novel methods that detect and quantify regions-of-interest (ROIs) in medical images. Challenges in medical image annotation and medical case-based retrieval were organized within a research group (VISCERAL) and are reviewed as a scientific contribution of this work. Moreover, multimodal (using both text and visual data) medical case-based retrieval systems are proposed both for radiology and digital pathology data, tackling the navigation of large-scale hospital repositories. By segmenting anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then focus their attention on the more significant structures in the images.

JIMENEZ DEL TORO, Oscar. Quantitative analysis of medical images: finding relevant regions-of-interest for medical decision support. Thèse de doctorat : Univ. Genève, 2017, no. Sc. 5106

URN : urn:nbn:ch:unige-962970

DOI : 10.13097/archive-ouverte/unige:96297

Available at:

http://archive-ouverte.unige.ch/unige:96297

Disclaimer: layout of this document may differ from the published version.

UNIVERSITÉ DE GENÈVE

Département d’informatique, FACULTÉ DES SCIENCES

Professeur Dr. Stéphane Marchand–Maillet

Département de radiologie et informatique médicale, FACULTÉ DE MÉDECINE

Professeur Dr. Henning Müller

Quantitative Analysis of Medical Images:

Finding Relevant Regions–of–Interest for Medical Decision Support

THESIS

presented to the Faculty of Science of the University of Geneva to obtain the degree of Doctor of Science, specialisation in computer science

by

Oscar Jimenez-del-Toro

from

Guadalajara (México)

Thesis No. 5106

GENEVA 2017

médicale), Mr S. MARCHAND-MAILLET, associate professor and thesis supervisor (Département d’informatique), and Mr A. HANBURY, senior researcher (TU Wien, Vienna, Austria), authorises the printing of this thesis, without expressing an opinion on the propositions stated therein.

Geneva, 26 July 2017

Thesis - 5106 -

The Dean's Office

Abstract

Medical imaging techniques are able to show internal structures of the human body that would otherwise remain hidden without invasive procedures, e.g. surgery. Medical professionals rely on these images to detect and characterise illnesses that could be present in patient studies. Doctors learn to establish correlations between visual image patterns and the structural organisation of human anatomy in the search for subtle variations that could indicate the presence of disease. However, in the past decades, the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased tremendously. The time available to interpret these imaging studies as part of the daily workload is limited, occasionally leading to failed detections in difficult cases and to errors in the interpretation. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems have been proposed to generate objective and repeatable measurements that could increase the reliability of medical image assessments. Some of these semi– and fully automatic methods have already been included in the clinical workflow, reducing errors and inter–operator variability.

A fundamental step in the development of these systems is to initially locate anatomical structures and relevant areas where a more thorough image analysis should be performed.

The main goal of this thesis is to propose and evaluate novel methods that detect and quantify regions–of–interest (ROIs) in medical images. By segmenting anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then prioritise their attention to the more significant structures in the images.

A large data set that encompasses the natural anatomical variability and complexity of medical data is required to train and test medical algorithms designed for real clinical scenarios. Biomedical image analysis challenges provide large data sets to participants and promote an objective comparison of medical algorithms with shared goals. As an outcome of these challenges, the advantages and limitations of diverse strategies are identified, thus increasing the understanding of the visual descriptors associated with anatomical structures and diseases. Challenges in medical image annotation and medical case–based retrieval were organised within a research group (VISCERAL) and are reviewed as a scientific contribution of this work. A cloud–based evaluation framework where various segmentation algorithms can be benchmarked is presented, as well as the comparison of their results with the visual interpretation of radiologists.

In this thesis, the different steps implemented in the detection and annotation of ROIs are discussed along with their corresponding set of measurements from handcrafted and deep learning features. In particular, a hierarchic anatomical structure segmentation method was developed to exploit the individual anatomical variations present in limited data sets of manually annotated structures. The method obtained promising results in the VISCERAL Anatomy benchmarks for the segmentation of large, high–contrast anatomical structures and showed a notable advantage in the segmentation of smaller and harder–to–detect structures, e.g. gallbladder and adrenal glands. Moreover, multimodal (using both text and visual data) medical case–based retrieval systems are proposed both for radiology and digital pathology data. These tools were designed to navigate large–scale hospital repositories including medical reports and 3D patient scans, retrieving relevant information from automatically selected regions–of–interest. The correct selection of visual features from the computed ROIs could benefit clinical decision support systems with more consistent evaluations of the medical images and a better comprehension of the underlying pathological processes.

Résumé

Medical imaging techniques are able to show internal structures of the human body that would otherwise remain hidden without invasive procedures such as surgery. Health professionals rely on them to detect and characterise diseases that may be present in patient studies. They visually inspect the images in search of subtle variations in size, shape or appearance that could indicate the presence of alterations in the anatomical structures. In the past decades, the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased considerably. Given their daily workload, the time available to interpret these imaging studies is limited, which sometimes leads to erroneous detection in difficult cases and to errors of interpretation. Furthermore, the assessment of medical images is often based on subjective criteria that may depend on the training and professional experience of the interpreter. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems generate objective and reproducible measurements that can be used to increase the reliability of recurring assessments of medical images. Some of these semi–automatic and fully automatic algorithms have already been included in the clinical workflow, reducing errors and inter–operator variability. A fundamental step in the development of these systems is to first locate the relevant areas and anatomical structures where a more thorough image analysis should be performed.

The main goal of this thesis is to propose and evaluate new methods that detect and quantify regions–of–interest (ROIs), in order to highlight alterations or simply to serve as a basis for the semantic analysis of images. Measuring visual features in the ROIs can draw the attention of health professionals to the regions that could influence the interpretation of the full study. Health professionals can then prioritise the assessment of the filtered data to carry out a more thorough inspection of the detected regions and structures.

Medical professionals learn to establish a correlation between visual image patterns and the structural organisation of human anatomy during their medical training. Images are interpreted by radiologists in a qualitative process based on visual descriptors such as colour, shape or appearance. Data–driven image features, such as those derived from texture and those obtained from convolutional neural networks, can be used to define and quantify ROIs manually annotated by health experts or automatically selected by medical imaging algorithms. The steps involved in the detection and annotation of ROIs are discussed, along with the corresponding handcrafted and deep learning features. These features highlight the abnormalities associated with certain pathological processes. In this context, image registration is highlighted as an image processing technique developed to combine data from multiple patients, which yields additional information not apparent in the separate images. By segmenting multiple anatomical structures in full patient scans, an initial selection of the main structures can target a subsequent, more extensive analysis of the ROIs.

A large data set that accounts for the natural variability and complexity of medical data is required to train and test medical algorithms simulating a real clinical scenario. Biomedical image analysis challenges provide their participants with data sets containing manual annotations from health professionals and promote the comparison of medical algorithms on common tasks. The advantages and limitations of diverse approaches are identified in order to increase the understanding of the visual descriptors associated with the representation of anatomical structures and diseases. Several challenges in medical image annotation, medical case retrieval and histopathology were organised and are reviewed as part of this work. An online evaluation framework that assesses the quality of different segmentation algorithms is described, as well as the relationship between segmentation metrics and the visual interpretation of radiologists.

The segmentations produced by these algorithms can be fused to create larger data sets comprising spatial estimates of anatomical structures. This could prove a valuable resource for researchers and hospitals, since the manual annotation of medical images is not scalable and can be error–prone. Selecting appropriate visual features from the computed ROIs could facilitate clinical decision making with more consistent assessments of the medical images and a better understanding of the underlying processes. In addition, the most relevant medical cases can be identified, allowing medical professionals to devote more time to the cases that are hardest to diagnose or to those with a higher medical urgency.

Resumen

Medical imaging techniques make it possible to visualise structures of the human body that would remain hidden without invasive procedures such as surgery. Medical professionals depend on these images to detect and characterise diseases that could be present in patient studies. Doctors learn to establish a correlation between the visual patterns in the images and the structural organisation of human anatomy in the search for subtle variations that could indicate the presence of disease. However, in the past decades, the number of medical images inspected daily in health centers, as well as the complexity of the parameters in each image, has increased enormously. The time to interpret these imaging studies as part of the daily workload is limited, occasionally resulting in a failed detection in difficult cases and errors in the interpretation. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems generate objective and repeatable measurements that can be used to increase the reliability of the assessments made on medical images. Some of these algorithms, semi– or fully automatic, have already been included in clinical workflows, reducing errors and inter–operator variability. A fundamental step in the development of these systems is to initially locate anatomical structures and relevant areas where a more thorough analysis of the image is needed. The main goal of this thesis is to propose and evaluate new methods that detect and quantify regions–of–interest (ROIs) in medical images. By segmenting several anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then focus their attention on the most significant structures in the images.

A large data set that incorporates the natural variability and complexity of medical data is then needed to train and test medical algorithms designed for real clinical scenarios. Biomedical image analysis challenges provide data sets to participants and promote an objective comparison of medical algorithms with shared goals. As a consequence, the advantages and limitations of diverse strategies are identified, increasing the understanding of the visual descriptors associated with the representation of anatomical structures and diseases. Challenges in medical image annotation and medical case retrieval were organised together with a research group (VISCERAL) and are reviewed as a scientific contribution of this work. A cloud–based evaluation framework where various segmentation algorithms can be analysed is presented, as well as the comparison of their results with the visual interpretation of radiologists.

In this thesis, the different steps implemented in the detection and annotation of ROIs are discussed along with the corresponding sets of handcrafted and deep learning features. In particular, a hierarchical anatomical structure segmentation method was developed with the aim of exploiting the individual anatomical variations present in small sets of manually annotated structures. The method obtained promising results in the ‘VISCERAL Anatomy’ challenges for the segmentation of large, high–contrast anatomical structures and showed a notable advantage in the segmentation of small and hard–to–detect structures, for example the gallbladder and adrenal glands. Moreover, multimodal (using both text and visual information) medical case retrieval systems are proposed for both radiology and digital pathology data. These tools were designed to navigate large medical repositories, including medical reports and 3D patient scans, retrieving relevant information from automatically selected regions–of–interest. The correct selection of visual features from the computed ROIs could benefit clinical support systems with more consistent assessments of the medical images and a better understanding of the essential pathological processes.

Acknowledgments

I would like to thank everyone who participated in and contributed to the development of this thesis and this experience in my life as a PhD candidate. Foremost, I express my sincere gratitude to Professors Stéphane Marchand–Maillet and Henning Müller for their guidance during this period, their generous sharing of knowledge and their constant support throughout my PhD. In particular, I would like to thank Prof. Henning Müller for the opportunity to be part of his research group, in which I have come to love and enjoy working every day during the past 5 years. I would like to spend my professional career being an example of the hard work that is promoted in this group, always looking for new challenges and original breakthroughs. A special mention goes to Dr. Allan Hanbury, with whom I share many scientific publications; I thank him for agreeing to be my thesis expert advisor and for the work shared during this period.

I wholeheartedly thank my colleagues and teammates from the MedGIFT group and from the VISCERAL and DESUTO projects, with whom I shared deadlines, success stories and funny moments as part of the perks from the daily workload. Namely the following people in Sierre: Adrien, Alba, Andres, Antonio, Dimitris, Geovanny, Ivan, Manfredo, Matteo, Pol, Ranveer, Roger, Sebastian, Stefano, Visara and Yashin. Thank you all for your cookies, warm conversations and smart quips during the coffee breaks. It has also been a wonderful experience to get to know hardworking individuals like Abdel, Bjoern, Georg, Katharina, Marianne, Markus, Mattias and Orcun in the VISCERAL project and Kristian, Lena and Martin, Mats and Peter from the DESUTO project.

Thanks to Andre, Ale Mamá, Javier, Mafer and Pau for being a new family for me and helping us in this voyage together. Thanks to Lalo, Mamá and Papá for accompanying us since the beginning and being the most caring and loving persons I know. Thank you for devoting your positive character to building a better version of myself every day. You have taken me farther than I would have ever imagined.

Finally, I would like to thank Ale. It has been your daily smile and our enchanting communication that have put us through the roughest patches, but mostly through the happiest moments of this adventure together. Thank you for sharing your life with me, revealing the brave and full of hope person you are.

Chapter 1

Introduction

Medical imaging is an essential diagnostic technique used by health professionals to evaluate the medical condition of a patient. Multiple medical imaging modalities exist, each one portraying a distinct signal from the human body. These signals can be physically measured and represented as a vector or a multi–dimensional matrix. The analysis of these signals can be associated with anatomical and pathological information from underlying structures [207]. In some cases, the generated images and volumes contain body part attributes that would otherwise only be visible through invasive procedures, such as surgical intervention. Sample images from some of the most commonly requested medical imaging techniques are shown in Fig. 1.1. Moreover, some of these modalities (e.g. magnetic resonance imaging, MRI) can depict hidden functional processes that would otherwise remain unnoticed during visual inspection, e.g. organ perfusion and cell metabolism. Medical professionals, including radiologists and pathologists, not only confirm or rule out alterations from this data, but are also able to characterise the pathological processes causing a specific health condition [42].

1.1 Motivation

In the past decades, the amount of medical data available to assist in the decision–making process, including medical imaging, has increased substantially, both in storage size and in the information load within the data [43]. Medical professionals inspect a large quantity of patient scans every day, showing multiple anatomical structures and even full body systems. The health care costs related to medical imaging are rising as well [252]. For example, taking only mammography images into consideration, 1 billion imaging exams (4 images per exam) are performed worldwide per year, which results in an average of 120 medical image perception events per second [152].
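The order of magnitude of this rate can be checked with a few lines of arithmetic. The sketch below assumes a 365–day year and exactly one viewing per image; both are simplifying assumptions made here for illustration and are not taken from [152].

```python
# Back-of-the-envelope check of the mammography workload quoted in the text.
EXAMS_PER_YEAR = 1_000_000_000          # worldwide exams per year (from the text)
IMAGES_PER_EXAM = 4                     # images per exam (from the text)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: non-leap year


def perception_events_per_second(exams=EXAMS_PER_YEAR, images=IMAGES_PER_EXAM):
    """Average number of images viewed per second, assuming one viewing per image."""
    return exams * images / SECONDS_PER_YEAR


rate = perception_events_per_second()
print(f"{rate:.1f} image perception events per second")
```

Under these assumptions the result is roughly 127 events per second, which is of the same order as the figure of 120 reported in [152].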

Figure 1.1: Different medical imaging modalities implemented for health care: (from left to right) A) X–ray, B) computed tomography, C) positron emission tomography, D) ultrasound, E) magnetic resonance imaging and F) digital pathology whole–slide image.

The task of visually inspecting these studies can be time consuming and labor intensive, particularly when large regions of multidimensional images are screened with multiple imaging parameters to optimise [5, 163]. Additionally, subjective measurements and personal experience may result in intra– and inter–operator variability when interpreting these studies [175]. In difficult or inconclusive cases these events are more frequent, consequently reducing the reliability of the diagnostic tests [224]. A miss rate as high as 30% and a false positive rate of up to 15% have been estimated overall in radiology [151]. False positives might result from overlapping structures or insufficient experience of the interpreter. False negatives, on the other hand, have been classified into three categories according to the amount of time invested in their inspection:

• Search errors: if the radiologic features of a pathological condition are never looked at,

• Recognition errors: if the radiologic features are looked at, but not long enough to be interpreted as a pathological condition,

• Decision errors: if the radiologic features are looked at for long periods of time, but are still not considered a pathological condition.
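The three categories above amount to a rule on how long a missed finding was looked at. The sketch below is only an illustration of that rule: the names `MissType`, `classify_miss` and `LONG_DWELL_S` are hypothetical, and the one–second cut–off is a placeholder, not a value taken from this thesis.

```python
from enum import Enum


class MissType(Enum):
    SEARCH = "search error"            # finding never looked at
    RECOGNITION = "recognition error"  # looked at, but only briefly
    DECISION = "decision error"        # looked at for long, still dismissed


# Hypothetical threshold in seconds separating a brief look from a long one.
LONG_DWELL_S = 1.0


def classify_miss(dwell_time_s: float, long_dwell_s: float = LONG_DWELL_S) -> MissType:
    """Categorise a false negative by the time spent looking at the finding."""
    if dwell_time_s < 0:
        raise ValueError("dwell time cannot be negative")
    if dwell_time_s == 0:
        return MissType.SEARCH
    if dwell_time_s < long_dwell_s:
        return MissType.RECOGNITION
    return MissType.DECISION


for seconds in (0.0, 0.3, 2.5):
    print(seconds, classify_miss(seconds).value)
```

In practice the dwell time would have to come from an external measurement such as eye tracking, which is outside the scope of this sketch.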

Computer–aided diagnosis (CAD) has been defined as “the use of computer algorithms to aid the image interpretation process” [70]. Besides helping in the detection process, it is also used in the computerized extraction of quantitative measurements from medical images [252]. CAD systems, as well as clinical decision support (CDS) systems, provide clinicians with filtered information that can enhance patient health care [199]. Furthermore, the algorithms comprised in these systems are designed to detect, measure and diagnose imaging abnormalities in clinical studies [48]. One of their goals is to help medical doctors to perform objective and repeatable measurements [271, 143, 73]. By relieving medical professionals of the quantifiable procedures in their work, CAD and CDS systems can overcome arbitrary clinical reasoning [180]. They are particularly advantageous when large amounts of medical data should be reviewed, maximising both productivity and reliability. These systems have improved the diagnostic accuracy of radiologists by reducing human errors and simplifying some of the most time–consuming and tedious endeavours in medical image interpretation [57]. Through a faster and more consistent image interpretation process, individual radiologists can read more cases efficiently, taking into consideration situation–specific data and system recommendations [44, 192]. Thus, with the development of algorithms for CAD and CDS, researchers aim to mimic the role of radiologists and pathologists by quantifying image data and connecting clinical information to support the final assessment.

1.2 Thesis objectives

A top priority for CAD and CDS algorithms is to reduce the amount of data that medical experts have to inspect, either by selecting the most pressing cases or by enhancing relevant areas in the images. In Fig. 1.2, three different cases are presented where the areas containing the most relevant information from the corresponding study have been highlighted. In the far left image, a cropped rendering from a 3D whole–body computed tomography (CT) scan is shown with the segmentation of the left adrenal gland (in red). The image in the middle shows a lesion in the brain that has been manually delineated (in red) in the segmented brain from an MR image of the head. To the far right is a zoomed–out view of a whole–slide image where three areas that were visually inspected by a pathologist are highlighted at this resolution (in red). These samples are shown to emphasize that in the majority of patient cases, the areas containing the most relevant information can be quite small in comparison to the full images. Moreover, the alterations might only be visible in one of the multiple images composing the patient study.

The diagnostic process involves two basic tasks: visual perception and rendering an interpretation (cognition), as medical images are not self–explanatory [152]. When medical experts inspect medical images, they look for specific visual patterns that might reflect underlying changes in size, shape or appearance of the anatomical structures to confirm or discard a particular diagnosis. The images can vary considerably even if they were obtained from the same imaging modality and correspond to a similar pathology. Contrast, brightness, scale and image quality are some of the factors known to influence the interpretation and subsequent clinical decisions of visual interpreters [145]. A deficient search of the images can lead to missing lesions and false negative events [247]. Thus, an initial key step in this diagnostic process is deconstructing medical images into different structures and components, which can be referred to as image parsing [183]. The selected content–based image features used for performing this task can be categorized at pixel, object and semantic level, based on their biological interpretability [96]. Sorting out raw medical image data into semantic elements can be useful for clinicians to efficiently read and navigate through the images with parsing tools [222]. Unlike a search that uses text–only information, such as keywords, automatic parsing also allows medical images to be searched and retrieved by their content [223].

Therefore, an initial crucial step in medical image analysis systems is to detect and/or segment the anatomical structures present in the image, thus contributing to the visual perception task. Specific regions–of–interest can then direct the attention of the interpreters to the most relevant structures in the images. How to objectively detect, segment and quantify manually annotated and automatically computed regions–of–interest (ROIs) in medical imaging is the main topic addressed in this thesis. Different approaches to generate and compare ROIs are evaluated and discussed.
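As a minimal sketch of what quantifying an ROI can mean in practice, the example below assumes a 2D image stored as nested lists of intensities and a binary mask of the same shape marking the ROI. The function name `quantify_roi`, the chosen measurements and the toy values are illustrative assumptions and do not correspond to the features used in this thesis.

```python
from statistics import mean


def quantify_roi(image, mask):
    """Return simple measurements of the pixels where the mask is non-zero."""
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, inside in zip(image_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("the mask does not select any pixel")
    total_pixels = sum(len(row) for row in image)
    rows = [r for r, mask_row in enumerate(mask) if any(mask_row)]
    cols = [c for mask_row in mask for c, inside in enumerate(mask_row) if inside]
    return {
        "area_px": len(values),                       # ROI size in pixels
        "area_fraction": len(values) / total_pixels,  # share of the full image
        "mean_intensity": mean(values),
        "bounding_box": (min(rows), min(cols), max(rows), max(cols)),
    }


# Toy 4x4 image with a bright 2x2 structure in the centre (illustrative values).
image = [
    [10, 12, 11, 9],
    [10, 80, 90, 9],
    [11, 85, 95, 10],
    [10, 11, 12, 10],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = quantify_roi(image, mask)
print(features)
```

The `area_fraction` entry makes explicit the point raised earlier in this section: the region carrying the relevant information may cover only a small part of the full image.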

1.3 Thesis overview

The work addressed in this thesis was mostly developed within two European research projects: the Visual Concept Extraction Challenge in Radiology (VISCERAL)¹ and the Self–learning decision support tool for fast and accurate cancer evaluation in digital pathology (SLDESUTO–BOX)². The VISCERAL project was supported by the European Commission under the Information and Communication Technologies (ICT) theme of the 7th Framework Programme for Research and Technological Development. The project defined and executed a targeted benchmark framework to speed up progress towards: 1) automated anatomy and pathology identification in radiology (MRI and CT) volumes, and 2) similarity–based retrieval of multimodal radiology data. The objectives, benchmarks and conclusions from the VISCERAL project are explained in more detail in the following chapters of this thesis (e.g. Ch. 2.1). SLDESUTO–BOX, on the other hand, received funding from the Eurostars–2 Joint Program with co–funding from the European Union’s Horizon 2020 research and innovation program. SLDESUTO–BOX is a joint project aimed at developing a decision support toolbox that supports pathologists in their task of diagnosing and evaluating the prognosis of different types of cancer. Two main outcomes were delivered from this project: 1) an innovative decision support toolbox for digital pathology, and 2) knowledge development in the field of image analysis and technology integration for digital pathology.

¹ http://www.visceral.eu/, as of 01 Jul. 2017

Figure 1.2: Samples of medical images with manually annotated regions–of–interest (ROIs). Medical doctors annotate these areas because they contain relevant information for the clinical status of the patient. They visually inspect full images in 2D slices, sometimes from 3D volumes, to detect these ROIs.

Although there is a strong overlap between the objectives and outcomes of the mentioned projects and the body of publications from the author of this thesis, the main scientific contributions and analysis performed specifically by the author are highlighted.

In this sense, both projects are addressed within the context of the topics and objectives discussed in this thesis. The motivation and scientific contributions of this thesis are introduced in this first chapter together with the corresponding references of the work published by the author in this regard.

In particular, the clinical contribution of CAD and CDS systems becomes clear when they are objectively evaluated on large data sets of real medical data. A current trend in this field is the organisation of biomedical image analysis challenges where state–of–the–art methods can be tested and compared using manually annotated medical data. Chapter 2 gives an overview of the evaluation framework and of the contributions from this thesis to these challenges, in both the radiology and digital pathology fields.

A specific track of ROI detection corresponds to the segmentation of anatomical structures. Organ segmentation is an active topic in medical imaging with various approaches proposed in the literature. A hierarchic anatomical segmentation strategy based on anatomical correlations is described in Chapter 3. Moreover, the results of this strategy in the VISCERAL Anatomy segmentation challenge are compared with those of other methods. Finally in this chapter, and as part of the evaluation framework of the Anatomy benchmarks, the author contributed a visual evaluation system, defined by radiologists, against which segmentation metrics were measured.

² http://www.contextvision.com/sldesuto-box-a-eurostars-sponsored-project/, as of 01 Jul. 2017

Chapter 4 presents the proposed methods for detecting and quantifying the data from ROIs in medical images. The techniques and applications that were developed as part of this work are described. The common goal of these applications was to search for basic semantic elements in the imaging studies of the patients for medical decision support.

The experiments evaluating the contributions, together with their corresponding clinical context, are also described in this chapter.

In chapter 5, the main topic discussed is the retrieval of medical images and cases using regions–of–interest. Case–based retrieval systems implemented both for radiology and digital pathology data were developed by the author of this thesis. In the case of the radiology multimodal retrieval system, both the data set and evaluation were performed within the VISCERAL Retrieval benchmark. The first multimodal case–based retrieval strategy for histopathology data is proposed in this chapter with whole–slide images and full pathology reports.

Finally in chapter 6, we conclude the thesis with a review of the objectives and a set of perspectives resulting from the work performed.

1.4 Scientific contributions

Scientific contributions in the fields of ROI detection, segmentation and application for medical decision support are explained more in depth in later chapters. A diagram of the topics investigated according to the selected medical tasks is shown in Fig. 1.3.

The diagram shows four different tasks health professionals carry out when inspecting and interpreting a medical image that could be targeted with computer vision algorithms:

• Evaluate: The performance of state–of–the–art algorithms for medical image analysis tasks should be compared on a common data set to understand the advantages and limitations of each strategy. Moreover, these algorithms should also be compared side by side with the interpretation provided by medical professionals to define the success of the proposed solutions (see Ch. 2).

• Detect: When interpreting medical images the first step is to select the areas in the image and regions–of–interest that might have more relevance in the medical assessment (see Ch. 3).

• Quantify: The correct features from the detected regions–of–interest can then be extracted and quantified to have an objective measurement of the visual characteristics shown in these areas (see Ch. 4).

• Decide: A repeatable decision based on the appropriate measurements can help reduce the inter– and intra–observer variability when interpreting medical images (see Ch. 5).

The main contributions from this thesis are highlighted in the following paragraphs:


Figure 1.3: Diagram of the topics discussed in this thesis.

1.4.1 Detecting and quantifying regions–of–interest in medical imaging for clinical decision support

• A semi–automatic method for the quantification of manually annotated ROIs that can objectively detect a texture abnormality in the brain [120].

• A texture analysis strategy for multiple organ structures, aiming to identify their appearance in a context–free approach [54, 119].

• An overview of various texture analysis methods proposed in literature, particularly tested in histopathology images [129].

• A fully automatic and weakly supervised cancer grading algorithm using deep learning for prostate whole–slide images without relying on manually annotated ROIs [115].

1.4.2 Segmenting anatomical regions–of–interest in medical imaging data

• Contributed as an organiser of the VISCERAL cloud–based evaluation framework and participated as co–author of various scientific publications regarding the data set, set up and outcome of this project [82, 266, 147, 80, 81, 148].

• A hierarchic multi–anatomical structure segmentation approach guided by spatial correlations [125, 131, 124, 123, 126, 118, 41, 128].


Figure 1.4: Automatic segmentation of anatomical structures in medical images opens the door for semantic parsing.

• Main author of an overview of the VISCERAL Anatomy benchmarks. In this work, the author of this thesis analysed the participant algorithms and results of the Anatomy (1–3) organ segmentation and landmark detection tasks [121, 74, 128].

• Participated as co–author in scientific publications on the selection of the most suitable segmentation metrics for medical imaging. Contributed to the development of a visual evaluation system, from a clinical perspective, for output segmentations of anatomical structures [240, 241, 128].

1.4.3 Medical and histopathology case–based retrieval systems

• A strategy combining radiology imaging and clinical meta–data for multimodal medical case–based retrieval [116, 117].

• Main author and co–author of publications reviewing the results from the VISCERAL Retrieval benchmark. The author of this thesis summarised and discussed the data sets, set up, participant algorithms and results of the Retrieval benchmark hosted in the Multimodal Retrieval in the Medical Domain (MRMD) workshop [186, 122, 187, 188, 127].

• A deep multimodal case–based retrieval for large histopathology datasets [130].

Other papers written during the projects involved in the creation of this thesis are [189, 220, 17, 249, 233]. These papers are related to the interpretation of health data obtained from wearable and mobile applications for pervasive health care platforms. Our contribution to these works was on the design of the applications and the interpretation of the obtained medical data.


Chapter 2

Data sets and challenges for evaluating medical image analysis algorithms

State–of–the–art medical image analysis algorithms can objectively identify and measure pathological signs for robust and repeatable image quantification [271]. This could help medical doctors in interpreting a larger number of patient imaging studies and support their diagnostic hypothesis [143]. However, selecting the right algorithm or combination of algorithms that produce an optimal solution for a desired medical task is not always clear.

The available state–of–the–art approaches should be tested extensively and in different settings to define the advantages and limitations of each method. A frequent bottleneck in the evaluation of these approaches is the lack of a common data set where various algorithms can be tested and compared [90, 158]. Solutions published in the literature with non–public data sets limit comparisons with other techniques and can end in poor baselines or ‘improvements that don’t add up’ [8]. Overall, this creates an ‘illusion of progress’, and the approaches have too limited an influence to justify their implementation in real hospital scenarios [91].

A large data set that truly represents the natural variability and complexity of medical data is in general preferred for training and testing of algorithms. Building these large data sets for medical image analysis requires a ground truth derived from manual annotations performed by health professionals (e.g. radiologists) in large quantities [88]. These annotations are analysed during algorithm training with the aim of replicating learned expert judgments on unseen target images. Particularly for supervised or semi–supervised deep learning methods, this step is crucial for the development of successful models. The number of attainable manual annotations is limited and does not scale to the magnitude of training samples needed to characterise distinct diseases [158]. Moreover, manual annotation, such as the localisation and segmentation of various structures in an image, can be expensive and time–consuming work for radiologists and is difficult to reproduce.

In recent years, online challenges with shared data sets have become widespread in different fields of research, including medical imaging³. These challenges have attracted both academic groups and companies⁴. Multiple participants have submitted algorithms for demanding tasks and had access to data sets from different imaging modalities and diseases [181, 93, 184, 89]. Unlike the first evaluation campaigns, the data sets from the latest challenges are becoming closer to the “realistic” content used in health care, both in size and in the variation found in real patients [87]. The risk of overfitting is reduced when the data is obtained from different scanners, different health centers and multiple patients to better represent a real clinical scenario.

3MICCAI Grand Challenges, http://grand-challenge.org/All_Challenges/, as of 01 July 2017

4Kaggle, https://www.kaggle.com/, as of 01 July 2017

As a basic component of this thesis, contributions were made in the organisation, participation and result analysis of challenges with the following topics:

• Detection and segmentation of anatomical structures

• Case–based medical image retrieval

• Anatomical pathology challenges

In the following sections the data sets and tasks from the challenges on which the contributions from this thesis were trained and evaluated are described.

2.1 VISCERAL project

Aiming to “bring the algorithms to the data”, the VISCERAL project⁵ was established as a cloud–based evaluation framework for medical image analysis algorithms [158].

The goal of the VISCERAL benchmarks is to create a single, large, and multi–purpose medical image data set and evaluation infrastructure. Through organized benchmarks, research groups can test their specific applications and compare them to other available solutions against the standard manual annotations. The project has organized benchmarks in medical image annotation, lesion detection and medical case–based retrieval with a large Gold Corpus generated with manual annotations from radiologists. One of the main innovations of the VISCERAL framework, compared to previous evaluation campaigns, is that it was hosted in a cloud environment, namely the Microsoft Azure cloud⁶. Both storage and computation power are virtually unlimited in a cloud platform, which makes it possible to create and manage virtual machines (VMs) for the participants as well as data storage containers. Furthermore, data from the challenges can be accessed and distributed through a cloud–based framework, in contrast to previous medical image challenges where both training and testing sets were sent by post on hard drives or downloaded via online platforms [178]. A direct benefit of the VISCERAL approach is that large data sets (e.g. terabytes of data) can be publicly available to participants. Data privacy requirements associated with sensitive medical data are no longer a limitation for their usage, since the data are stored centrally and duplication of confidential material can be avoided.

In this project in particular, the Azure cloud is HIPAA (Health Insurance Portability and Accountability Act) certified [6].

With central storage, data sets can also be made private with different access permissions, meaning participants can have access to only subsets of the data. Challenge administrators can directly run the participant algorithms, implemented in the provided VMs, on private test sets that are not available to the participants themselves. This is advantageous to evaluation campaigns, as intended or unintended bias is reduced if participants do not have access to the test set before their submission [170]. In recent years, some challenges have distributed their test set during live competitions, potentially preventing the participants from training their algorithms with test images [178]. This approach does not overcome the obstacle of objectively testing the algorithms on very large data sets, since the evaluation time is limited.

5http://www.visceral.eu/, as of 01 July 2017

6https://azure.microsoft.com/en-us/, as of 01 July 2017

The medical data and manual annotations created by expert radiologists were initially uploaded to a common cloud storage container. They were then divided into training and testing sets in separate storage containers. Another storage container was reserved for the output files and evaluations of the participants’ algorithms. If additional images or annotations were generated during the benchmarks, they were added to the storage containers when required. Moreover, the participants’ algorithms were installed and executed in identical cloud computing instances, making the evaluation fully reproducible. Virtual machines running on the Microsoft Azure cloud infrastructure were pre–configured with 5 operating systems, including Windows and Linux, with the same computing specifications and capabilities. Through an assigned virtual machine, each participant had access to the training data set and could install the necessary tools and libraries to run their algorithms. During the training phase, they could optimize their approaches. Additionally, a web management portal was used for the VISCERAL project to simplify the submission process and the administrative tasks of handling the VMs. The VISCERAL registration and management system⁷ was created, containing all the information regarding the benchmarks (user agreement, benchmark specifications, data set information). Participants had access to the credentials for their VMs in the system’s participant dashboard. Up to the deadline of the benchmarks, the participants could easily submit their VMs for algorithm evaluation using this dashboard.

2.1.1 Medical image detection and segmentation: VISCERAL Anatomy benchmarks

All kinds of anatomical structures are visually analysed in medical images as part of the daily work of radiologists. Subtle variations in size, shape or appearance can be described as radiological signs that confirm or discard a particular diagnostic hypothesis. Radiologists screen through large regions in full imaging data to detect and interpret these findings. However, manual measurements and personal experience may result in intra– and inter–operator variability when interpreting medical data, particularly in difficult or inconclusive cases [175, 224]. An automatic detection of the main anatomical structures shown in the images can potentially lead to a reduction in the workload of radiologists. Furthermore, most biomedical image analysis methods require the detection and segmentation of anatomical structures as the basis for further image analysis [226, 274].

Automatic medical image segmentation of anatomical structures promotes a more efficient image interpretation by facilitating hierarchic parsing and semantic navigation in image sets [222].

Medical data annotation challenges organised in the past few years targeting anatomical structure segmentation and landmark detection are shown in Table 2.1. Nevertheless, few challenges have addressed tasks with multi–structure segmentation [169], focusing on single organs instead [98]. Some of these challenges use only a portion of the full volumes from real patient scans, thus simplifying the task of finding the desired structure in cropped regions (e.g. only abdomen, only thorax). The inherent spatial relations between the anatomical structures and their surroundings are consequently not evaluated, potentially missing helpful information for image interpretation [166, 196].

7http://visceral.eu:8080/register/Login.xhtml,*

8Automatic Cephalometric X-Ray Landmark Detection Challenge 2014, http://www-o.ntust.edu.tw/~cweiwang/celph/,*

9Proceedings of SHAPE 2015 Symposium, http://www.shapesymposium.org/proceedings-screen.pdf,*

10Pancreas Segmentation from 3D Abdominal CT images, http://www.biomedicalimaging.org/2014/program/challenges/,*

* as of 01 July 2017

Table 2.1: Challenges in medical imaging related to anatomical structure segmentation and landmark detection.

Landmark detection
Structure                                        Modality
Lung [191]                                       CT
Head⁸                                            X–ray

Anatomical structure segmentation
Structure                                        Modality
Liver [98]⁹                                      CT
Pancreas¹⁰                                       CT
Airway path [170]                                CT
Lung vessels [217]                               CT
Lung nodules [251]                               CT
Brain anatomical structures [225]                MRI
Brain tumor [178]                                MRI
Head and neck structures¹¹                       MRI
Prostate and surrounding structures [169]        MRI
Heart anatomy¹²                                  MRI
Heart motion tracking [20]                       US
Spine and vertebrae¹³                            X–ray

The VISCERAL Anatomy benchmark series evaluated the performance of state–of–the–art anatomical structure segmentation algorithms in 3D medical imaging data [128].

A per–anatomy, per–modality evaluation was performed according to the nature of participating algorithms and the attempted image analysis tasks. Participants had the freedom to choose which of the structures their algorithm should segment. Anatomical landmarks, in selected anatomical structures, can be identified in different image sequences, which is why their location is an important first step in parsing image content. Three benchmarks (Anatomy 1–3) on automated anatomy localisation and segmentation of whole–body 3D volumes have been organised. These benchmarks are among the first challenges to evaluate multi–modal medical image analysis techniques using a large amount of data annotated by radiologists. The VISCERAL Gold Corpus consists of 391 CT and MRI data sets (889 sequences in total) with 20 different organs and 53 landmarks, with a subset of thirty volumes per modality (120 volumes in total) manually annotated by radiologists for the Anatomy benchmarks. The following anatomical structures were annotated if present in a medical case (see Fig. 2.1): right lung (light blue), left lung (blue), left kidney (brown), right kidney (light brown), liver (red), spleen (purple), urinary bladder (yellow), right psoas major muscle (pink), left psoas major muscle (light pink), left rectus abdominis muscle (light brown), sternum (dark grey), trachea (turquoise), first lumbar vertebra (white), right rectus abdominis muscle (light orange), aorta (light red), thyroid (violet), pancreas (beige), gallbladder (yellow), right adrenal gland (light green) and left adrenal gland (green). Both contrast–enhanced and unenhanced CT and MR 3D scans of the whole body (wb), the trunk (CT contrast–enhanced, CTce) or the abdomen (MR T1 contrast–enhanced, MRT1cefs) encouraged the development of segmentation algorithms on both high and low contrast. The data set comprises roughly the same number of male and female patients (62 male, 69 female) with an average age of 59.9 years (standard deviation of ±9.79 years).

11Head and Neck Auto Segmentation Challenge, http://www.imagenglab.com/wiki/mediawiki/index.php?title=2015_MICCAI_Challenge,*

12Second Annual Data Science Bowl, https://www.kaggle.com/c/second-annual-data-science-bow,*

13Computational Methods and Clinical Applications for Spine Imaging, http://csi2015.weebly.com,*

* as of 01 July 2017

Table 2.2: Overview of the manually annotated VISCERAL Anatomy series Gold Corpus. The field–of–view is defined for each modality under the FOV column. Both the in–plane resolution range and the in–between plane resolution are reported in millimetres. The number of volumes, annotated anatomical structures (Annotations) and located anatomical landmarks are reported per modality.

Modality        FOV          Contrast     Resolution (mm)          Annotations   Landmarks
CT              whole–body   unenhanced   0.977²−1.405² × 3        384           530
                trunk        contrasted   0.604²−0.793² × 3        387           440
MR T1w & T2w    whole–body   unenhanced   1.250² × 5               305           520
                abdomen      contrasted   0.840²−1.302² × 3−8      219           270

The four imaging modalities included in the Gold Corpus had all their scans taken from the same hospital during clinical practice using identical imaging protocols and the same imaging device for each modality. Whole body unenhanced CT (CTwb) scans were obtained from patients with confirmed bone marrow neoplasms to detect focal bone lesions (osteolysis). Contrast–enhanced CT scans (CTce) were acquired from patients with malignant lymphoma to detect pathological lymph nodes or organ involvement. Whole body unenhanced MR (MRwb) scans from patients with multiple myeloma were included in the data set. All patients’ MR series originally included a coronal T1–weighted and a fat–suppressed T2–weighted or STIR (short tau inversion recovery) sequence of the whole body, plus a sagittal T1–weighted and a sagittal T2–weighted sequence of the entire vertebral column.

Finally, the MRI studies of the abdomen were contrast–enhanced fat–saturated MR T1 scans (Ab/MRT1cefs) from oncological patients who had metastases within the abdomen. The metadata from the distributed scans as well as face regions were anonymised so that any patient identification was removed, while preserving the underlying structure of the face so that it can still be used for image analysis. Manual annotations by radiologists of the selected anatomical structures served as ‘ground truth’ for the training and testing of algorithms (see Table 2.2). When organs were not visible in the 3D scan, they were not annotated. All the data of the Anatomy benchmarks were available only for non–commercial research after participants signed a license agreement for research. Although the main evaluation setup remained the same during the three Anatomy benchmarks, there were some updates in the data set and submission process in each of the benchmarks. All the images and annotations were available as individual anonymised NIfTI (Neuroimaging Informatics Technology Initiative) files in a cloud storage container, without any additional pre–processing from their raw DICOM format. In all three benchmarks, a clear split of training and test images was used where only the training images were accessible to the participants. They trained their algorithms and implemented an executable in their virtual machine (VM) within the VISCERAL cloud framework. At the benchmark deadline, the VM was submitted and then the organisers privately ran the algorithms on the test data.

[Figure 2.1 panels: VISCERAL Anatomy Gold Corpus; columns CTwb, CTce, MRwb, MRce; rows: anatomical structure annotations (2D coronal slice view), anatomical structure annotations (3D view), landmarks (3D view)]

Figure 2.1: Examples of patient volumes in the VISCERAL Anatomy Gold Corpus with their corresponding anatomical structures and landmarks. Bone structure and body contour are also shown in the background for spatial reference.

In Anatomy1, the participants had access to a VM with a 4–core CPU and 8GB of RAM and a training set of 28 annotated scans (7 per modality) with their corresponding annotated structures. The algorithm executables were tested on 51 manually annotated patient scans (27 CT, 24 MR). For Anatomy2, the size of the training set was increased to 20 volumes per modality and the computation power of the participants’ VMs was also doubled, from a 4–core to an 8–core CPU with 16GB of RAM. A continuous evaluation system was developed for the Anatomy3 benchmark where participants could submit their algorithms iteratively, at most once a week. Although the VISCERAL project is officially finished, the Anatomy3 continuous benchmark is ongoing with a public leaderboard on the VISCERAL website where participants can choose to make their results public¹⁴.

2.1.1.1 Annotating medical data

A primary step for developing robust medical image analysis algorithms is to generate or re–use a large data set containing patient medical scans together with high–quality expert annotations (i.e. organ segmentations, landmark localisations and lesion annotations). During the VISCERAL project, several manual and semi–automatic segmentation tools were evaluated, looking for fast and effective 3D annotation software interfaces. It is desirable to reduce the workload and annotation time of medical doctors performing manual annotations and segmentations. The manual annotation of the patients’ 3D volumes is a standard procedure in radiology to highlight structures and ROIs in the medical data. However, this is a demanding and time–consuming task with potential variations in the annotations depending on the experience of the annotator [224]. Several annotation tools have been developed, essentially for the annotation of ROIs and the segmentation of anatomical structures [224, 30, 22, 272, 176]. Some of these tools are integrated into full medical interpretation frameworks that provide visualisation and image analysis for an integral medical image computing experience [46].

Tools developed in house are difficult to maintain, and selecting the best–suited framework can reduce the time to develop new applications [18, 15]. For the selection of an annotation tool for the VISCERAL project Gold Corpus generation, the available visualization and medical application frameworks were evaluated. The selected tool should offer a semi–automatic annotation technique that could reduce the time required for making manual annotations of 3D structures and points of interest. Six free–of–charge frameworks were reviewed: GeoS¹⁵ [30], ITK-SNAP¹⁶ [272], ImageJ¹⁷, MeVisLab¹⁸, MITK¹⁹ and 3DSlicer²⁰ [206]. Frameworks with specific annotation functionality, such as the techniques used in [176], or developed for a specific operating system (Osirix²¹) were discarded early from the group, as the goal of the VISCERAL project was to annotate various anatomical structures and landmarks with radiologists from different locations.

14Anatomy3 public Leaderboard, http://visceral.eu:8080/register/Leaderboard.xhtml, as of 01 July 2017

15http://research.microsoft.com/en-us/projects/geos, as of 01 Jul. 2017

16http://www.itksnap.org, as of 01 Jul. 2017

17http://rsb.info.nih.gov/ij, as of 01 Jul. 2017

18http://www.mevislab.de, as of 01 Jul. 2017

19http://www.mitk.org, as of 01 Jul. 2017

20http://www.slicer.org, as of 01 Jul. 2017

21http://www.osirix-viewer.com, as of 01 Jul. 2017

Table 2.3: Report on the evaluation criteria for each of the frameworks or annotation tools. +: satisfactory, o: insufficient, -: missing

Evaluation criteria                        GeoS   ImageJ   ITK–SNAP   MeVisLab   MITK   3DSlicer
Annotation on 3D volumes (CT and MRI)      +      +        +          +          +      +
Flexibility for different shapes           +      -        +          +          +      +
User–friendly usage                        +      -        o          +          o      +
Optimal visualisation of segmentation      o      o        +          +          +      +
Suitable for data formats                  +      -        o          +          +      +
Interactive manual corrections             +      +        +          +          +      +
Time reduction vs. manual annotation       +      o        +          -          o      o
Local image contrast settings              +      -        o          -          +      +
Output annotation format                   +      o        o          -          +      o
Feasible tool upgrading                    +      -        -          -          o      o

In order to select the most suitable annotation tool for the VISCERAL project based on its functionality in manual annotation tasks, ten criteria were defined:

• Possibility to annotate in 3D volumes

• Flexibility for different structure shapes and intensities

• User–friendly for medical annotators

• Optimal visualisation of output annotations

• Suitable for data formats

• Interactive enough for manual corrections

• Reduce time spent in manual annotations

• Able to manipulate local image contrast settings

• Output annotation format

• Tool upgrading after project has ended

An overview of the evaluation for each of the frameworks or annotation tools is summarised in Table 2.3. The final decision was taken in collaboration with two radiologists, who tested the annotation tools and compared their usability. The Microsoft GeoS annotation tool was selected for the organ segmentations, as it requires only a few brush strokes from the user to generate 3D segmentations in patient volumes that respect their edges. Moreover, the tool has a simple annotation interface, with only a few keys needed to start using it for annotations. 3DSlicer was selected for the localisation of anatomical landmarks, as this functionality was not yet implemented in the GeoS interface.
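As a rough illustration of how the ratings in Table 2.3 compare across tools, the following sketch tallies the symbols per tool. The unweighted count is an assumption made here for illustration only; the thesis states that the final decision was taken together with radiologists, not by such a tally. The names `TOOLS`, `RATINGS`, `tally_ratings` and `rank_by_satisfactory` are hypothetical.

```python
from collections import Counter

# Tool order follows the columns of Table 2.3.
TOOLS = ["GeoS", "ImageJ", "ITK-SNAP", "MeVisLab", "MITK", "3DSlicer"]

# One rating character per tool, in TOOLS order:
# "+" satisfactory, "o" insufficient, "-" missing (transcribed from Table 2.3).
RATINGS = {
    "Annotation on 3D volumes (CT and MRI)": "++++++",
    "Flexibility for different shapes": "+-++++",
    "User-friendly usage": "+-o+o+",
    "Optimal visualisation of segmentation": "oo++++",
    "Suitable for data formats": "+-o+++",
    "Interactive manual corrections": "++++++",
    "Time reduction vs. manual annotation": "+o+-oo",
    "Local image contrast settings": "+-o-++",
    "Output annotation format": "+oo-+o",
    "Feasible tool upgrading": "+---oo",
}


def tally_ratings(ratings=RATINGS, tools=TOOLS):
    """Return {tool: Counter of rating symbols} over all criteria."""
    counts = {tool: Counter() for tool in tools}
    for row in ratings.values():
        for tool, symbol in zip(tools, row):
            counts[tool][symbol] += 1
    return counts


def rank_by_satisfactory(counts):
    """Sort tools by their number of '+' ratings (unweighted, illustrative)."""
    return sorted(counts, key=lambda tool: counts[tool]["+"], reverse=True)


if __name__ == "__main__":
    counts = tally_ratings()
    for tool in rank_by_satisfactory(counts):
        c = counts[tool]
        print(f"{tool:10s} +:{c['+']} o:{c['o']} -:{c['-']}")
```

Under this simple count GeoS has the most satisfactory ratings, which is consistent with its selection for the organ segmentations, although the criteria were not necessarily weighted equally in practice.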

GeoS was developed as part of the Microsoft medical image analysis project InnerEye, which focuses on the automatic analysis of patients’ scans. The GeoS annotation tool uses a geodesic symmetric filter with a contrast–sensitive spatial smoothness algorithm to efficiently segment 3D images. The segmentation method is based on a generalised geodesic distance transform (GGDT) and has a much faster implementation than graph–cut algorithms. Geodesic distance is implemented as the distance between two points in an image taking into account image content such as the intensity gradient. A geodesic distance map is initialised from a manual soft seed mask made up of brush strokes that indicate a foreground object and the background that surrounds it. The method is able to perform contrast–sensitive image editing or processing and change its behavior depending on the local image contrast. Some image processing tasks are shared, thus reducing the required computational resources.
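The idea of a geodesic distance that accounts for intensity differences can be sketched on a small 2D grid. This is not the GeoS implementation: it is a minimal illustration that assumes hard seed pixels instead of a soft seed mask, uses Dijkstra's algorithm rather than the GGDT, and works in 2D. The function names and the `gamma` weight (which balances intensity differences against spatial steps) are hypothetical.

```python
import heapq
import math


def geodesic_distance(image, seeds, gamma=1.0):
    """Geodesic distance from seed pixels on a 4-connected 2D grid.

    image: list of rows of intensities; seeds: iterable of (row, col).
    Each step between neighbours costs sqrt(1 + (gamma * intensity_diff)^2),
    so paths that cross strong edges become expensive.
    """
    rows, cols = len(image), len(image[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                diff = image[nr][nc] - image[r][c]
                nd = d + math.sqrt(1.0 + (gamma * diff) ** 2)
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def segment(image, fg_seeds, bg_seeds, gamma=1.0):
    """Label a pixel 1 if it is geodesically closer to the foreground seeds."""
    d_fg = geodesic_distance(image, fg_seeds, gamma)
    d_bg = geodesic_distance(image, bg_seeds, gamma)
    return [[1 if f < b else 0 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(d_fg, d_bg)]


if __name__ == "__main__":
    # Toy image: a bright object (10) in the two left columns, dark elsewhere.
    toy = [[10, 10, 0, 0, 0, 0] for _ in range(4)]
    for row in segment(toy, fg_seeds=[(0, 0)], bg_seeds=[(0, 5)]):
        print(row)
```

With `gamma=0` the distance reduces to a purely spatial (city–block) distance and the label boundary ignores the object edge; a positive `gamma` makes the boundary follow the intensity edge, which is the contrast–sensitive behaviour described above.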

On the other hand, 3D Slicer is a module–based software with two functional modules for annotating and segmenting medical 3D volumes. The user defines the volumes to be annotated, specifies the number of structures and can add additional sub–classes. Unlike GeoS, it includes a tool for the annotators to perform freehand segmentation and mark landmark coordinates in 3D volumes, which is why it was chosen as the annotation tool for the landmarks in the VISCERAL Anatomy benchmarks.

2.1.1.2 Metric selection for anatomical structure segmentation

When selecting and comparing segmentation algorithms, a clear evaluation methodology should assess the quality of the output segmentations and quantify their similarity to the available ground truth. There are multiple segmentation metrics that have been tested in the medical literature for the validation of segmentation algorithms [239]. Each metric measures a particular aspect of the segmentations, with a distance or similarity score, and has sensitivities and biases related to these aspects. For the VISCERAL Anatomy benchmark series, evaluation metrics were grouped into five categories according to their identity (i.e. their nature and definition) and the relationship between them, namely:

• Spatial overlap based

• Pair–counting based

• Information theoretic based

• Probabilistic based

• Spatial distance based

By selecting a subset of representative metrics from each group, the biases associated with each group of metrics are better dealt with. Participants of the Anatomy benchmarks were presented with the mean scores for all the 21 metrics shown in Tab. 2.4 during the evaluation phase of their algorithms. Nevertheless, the results analysis focused on their performance according to the following four metrics: the Dice coefficient (DICE), the average distance (AVD), the interclass correlation (ICC), and the adjusted Rand index (ARI). This subset of metrics selected for the analysis of the benchmark algorithms includes a metric from most of the categories mentioned above, and each selected metric has a high correlation to the rest of the metrics from the same category. All the 21 metrics were computed with the help of the EvaluateSegmentation tool, available as an open source project²². In this section, we define the four main metrics evaluated in the Anatomy segmentation benchmarks.

First, the Dice coefficient [40], also called the overlap index, is the most used metric in validating medical volume segmentations. In addition to the direct comparison between segmentations, it is common to use the Dice to measure reproducibility (repeatability)

22https://github.com/codalab/EvaluateSegmentation, as of 01 July 2017


Metric                                           Symb   References

Spatial overlap based
Dice (=F1-Measure)                               DICE   [276] [277] [142] [77] [178] [139] [2]
Jaccard index                                    JAC    [178] [77] [250] [2]
True positive rate (Sensitivity, Recall)         TPR    [2] [178] [139]
True negative rate (Specificity)                 TNR    [2] [178]
False positive rate (=1-Specificity, Fallout)    FPR    →Specificity
False negative rate (=1-Sensitivity)             FNR    →Sensitivity
False detection rate                             FDR    [139]
Positive predictive value (Precision)            PPV
Negative predictive value                        NPV
Accuracy                                         ACC    [193]
F-Measure (F1-Measure=Dice)                      FMS    →Dice
Volumetric Similarity                            VS     [77] [250]
Global Consistency Error                         GCE    [250] [270]

Pair counting based
Rand Index                                       RI     [250] [270]
Adjusted Rand Index                              ARI    [105]

Information theoretic based
Mutual Information                               MI     [277] [142]
Variation of Information                         VOI    [250] [270]

Probabilistic based
Interclass correlation                           ICC    [227] [67]
Probabilistic Distance                           PBD    [67] [77]
Cohen's kappa                                    KAP    [178] [276]
Area under ROC curve                             AUC    [277]

Spatial distance based
Hausdorff distance                               HD     [178] [67] [77] [139]
Average distance                                 AVD    [178] [139] [50]
Mahalanobis Distance                             MHD    [193]

Table 2.4: Overview of the metrics implemented for the VISCERAL Anatomy benchmarks. The symbols in the second column are used to denote the metrics throughout the paper. The column ‘References’ shows papers where the corresponding metric has been used in the evaluation of medical volume segmentation.
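Several of the metrics in Table 2.4 can be derived from the four confusion counts of a binary segmentation. The sketch below is not the EvaluateSegmentation tool; it is a minimal illustration using the standard textbook definitions of Dice, Jaccard, sensitivity, specificity and the adjusted Rand index, with hypothetical function names and flat lists of 0/1 labels standing in for voxel masks. The conventions chosen for empty masks are assumptions.

```python
from math import comb


def confusion_counts(truth, pred):
    """Count (TP, FP, FN, TN) for two equally long binary label sequences."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(tp, fp, fn, tn):
    """Dice coefficient: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # both masks empty (assumed 1.0)


def jaccard(tp, fp, fn, tn):
    """Jaccard index: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # both masks empty (assumed 1.0)


def sensitivity(tp, fp, fn, tn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(tp, fp, fn, tn):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 1.0


def adjusted_rand_index(tp, fp, fn, tn):
    """Adjusted Rand index from the 2x2 contingency table of the two labelings."""
    n = tp + fp + fn + tn
    if n < 2:
        return 1.0  # fewer than two elements: no pairs to compare (assumed 1.0)
    sum_cells = sum(comb(x, 2) for x in (tp, fp, fn, tn))
    sum_rows = comb(tp + fn, 2) + comb(fp + tn, 2)   # ground-truth classes
    sum_cols = comb(tp + fp, 2) + comb(fn + tn, 2)   # predicted classes
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single class in both labelings
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0, 0, 0]
    counts = confusion_counts(truth, pred)
    print("DICE", dice(*counts))
    print("JAC ", jaccard(*counts))
    print("TPR ", sensitivity(*counts))
    print("TNR ", specificity(*counts))
    print("ARI ", adjusted_rand_index(*counts))
```

Because Dice and Jaccard are both functions of the same counts, they are related by JAC = DICE / (2 − DICE), which illustrates why metrics within one category tend to be highly correlated.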
