Thesis

Reference

Quantitative analysis of medical images: finding relevant regions-of-interest for medical decision support

JIMENEZ DEL TORO, Oscar

Abstract

In the past decades, the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased tremendously. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these imaging studies. The main goal of this thesis was to propose and evaluate novel methods that detect and quantify regions-of-interest (ROIs) in medical images. Challenges in medical image annotation and medical case-based retrieval were organized within a research group (VISCERAL) and are reviewed as a scientific contribution of this work. Moreover, multimodal (using both text and visual data) medical case-based retrieval systems are proposed for both radiology and digital pathology data, tackling the navigation of large-scale hospital repositories. By segmenting anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then prioritize their attention to the more significant structures in the images.

JIMENEZ DEL TORO, Oscar. Quantitative analysis of medical images: finding relevant regions-of-interest for medical decision support. Doctoral thesis: Univ. Genève, 2017, no. Sc. 5106

URN : urn:nbn:ch:unige-962970

DOI : 10.13097/archive-ouverte/unige:96297

Available at:

http://archive-ouverte.unige.ch/unige:96297

Disclaimer: layout of this document may differ from the published version.


UNIVERSITÉ DE GENÈVE

Département d'informatique, FACULTÉ DES SCIENCES
Professeur Dr. Stéphane Marchand–Maillet

Département de radiologie et informatique médicale, FACULTÉ DE MÉDECINE
Professeur Dr. Henning Müller

Quantitative Analysis of Medical Images:
Finding Relevant Regions–of–Interest for Medical Decision Support

THESIS

presented to the Faculté des sciences of the Université de Genève to obtain the degree of Docteur ès sciences, specialisation in computer science

by

Oscar Jimenez-del-Toro

from

Guadalajara (México)

Thesis No. 5106

GENÈVE 2017


médicale), Mr. S. MARCHAND-MAILLET, associate professor and thesis director (Département d'informatique), and Mr. A. HANBURY, senior researcher (TU Wien, Wien, Austria), authorises the printing of the present thesis, without expressing any opinion on the propositions stated therein.

Genève, 26 July 2017

Thesis - 5106 -

The Dean's Office


Contents

Abstract
Résumé
Resumen
Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Thesis objectives
  1.3 Thesis overview
  1.4 Scientific contributions
    1.4.1 Detecting and quantifying regions–of–interest in medical imaging for clinical decision support
    1.4.2 Segmenting anatomical regions–of–interest in medical imaging data
    1.4.3 Medical and histopathology case–based retrieval systems

2 Data sets and challenges in medical image analysis
  2.1 VISCERAL project
    2.1.1 Medical image detection and segmentation: VISCERAL Anatomy benchmarks
    2.1.2 Medical case–based retrieval: VISCERAL Retrieval benchmark
  2.2 Challenges in anatomical pathology
    2.2.1 Data sets provided in histopathology challenges
    2.2.2 Histopathology challenges tasks
    2.2.3 Evaluation metrics for histopathology challenges
  2.3 Summary

3 Segmenting anatomical ROIs in medical image data
  3.1 Exploiting spatial relationships with image registration
  3.2 Anatomical segmentation based on spatial relations
    3.2.1 Atlas–based segmentation
    3.2.2 Hierarchic anatomical structure segmentation
    3.2.3 Adding an intensity–based filter to improve atlas–based segmentations
    3.2.4 Initial experiments and results
    3.2.5 Discussion
  3.3 VISCERAL Anatomy benchmarks
    3.3.1 Anatomical segmentation task
    3.3.2 Landmark detection task
    3.3.3 Discussion
  3.4 Segmentation metric correlation with medical visual assessment
    3.4.1 Data set
    3.4.2 Experiments and results
    3.4.3 Discussion
  3.5 Summary

4 Quantifying ROIs in medical imaging
  4.1 Biomedical texture analysis for medical images
  4.2 3D texture quantification of manual ROIs to detect abnormalities
    4.2.1 Background
    4.2.2 Methods
    4.2.3 Experiments and results
    4.2.4 Discussion
  4.3 Aggregating image features with a bag of visual words approach
  4.4 Detecting anatomical structures by classifying texture features
    4.4.1 Background
    4.4.2 Methods
    4.4.3 Experiments and results
    4.4.4 Discussion
  4.5 Deep learning
  4.6 Digital pathology image classification
  4.7 Generating ROIs for grading histopathology images
    4.7.1 Background
    4.7.2 Methods
    4.7.3 Experiments and results
    4.7.4 Discussion
  4.8 Summary

5 Medical case–based retrieval systems
  5.1 Multimodal medical case–based retrieval
    5.1.1 Background
    5.1.2 Methods
  5.2 Medical case–based retrieval: VISCERAL Retrieval benchmark
    5.2.1 Experiments and results
    5.2.2 Discussion
  5.3 Histopathology case–based retrieval
    5.3.1 Background
    5.3.2 Methods
    5.3.3 Experiments and results
    5.3.4 Discussion
  5.4 Summary

6 Conclusions and perspectives

Notation
Glossary
List of Figures
List of Tables
Bibliography
Index


Abstract

Medical imaging techniques are able to show internal structures of the human body that would otherwise remain hidden without invasive procedures such as surgery. Medical professionals rely on these images to detect and characterise illnesses that could be present in patient studies. Doctors learn to establish correlations between visual image patterns and the structural organisation of human anatomy in the search for subtle variations that could indicate the presence of disease. However, in the past decades the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased tremendously. The time available to interpret these imaging studies as part of the daily workload is limited, occasionally leading to failed detection in difficult cases and to errors in interpretation. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems have been proposed to generate objective and repeatable measurements that could increase the reliability of medical image assessments. Some of these semi– and fully automatic methods have already been included in the clinical workflow, reducing errors and inter–operator variability.

A fundamental step in the development of these systems is to initially locate anatomical structures and relevant areas where a more thorough image analysis should be performed.

The main goal of this thesis is to propose and evaluate novel methods that detect and quantify regions–of–interest (ROIs) in medical images. By segmenting anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then prioritise their attention to the more significant structures in the images.

A large data set that captures the natural anatomical variability and complexity of medical data is required to train and test medical algorithms designed for real clinical scenarios. Biomedical image analysis challenges provide large data sets to participants and promote an objective comparison of medical algorithms with shared goals. As an outcome of these challenges, the advantages and limitations of diverse strategies are identified, thus increasing the understanding of the visual descriptors associated with anatomical structures and diseases. Challenges in medical image annotation and medical case–based retrieval were organised within a research group (VISCERAL) and are reviewed as a scientific contribution of this work. A cloud–based evaluation framework where various segmentation algorithms can be benchmarked is presented, as well as the comparison of their results to the visual interpretation of radiologists.

In this thesis, the different steps implemented in the detection and annotation of ROIs are discussed along with their corresponding set of measurements from handcrafted and deep learning features. Particularly, a hierarchic anatomical structure segmentation method was developed, aiming to exploit the individual anatomical variations present in limited data sets of manually annotated structures. The method obtained promising results in the VISCERAL Anatomy benchmarks for the segmentation of large and high–contrast anatomical structures and showed a notable advantage in the segmentation of smaller and harder to detect structures, e.g. gallbladder and adrenal glands. Moreover, multimodal (using both text and visual data) medical case–based retrieval systems are proposed for both radiology and digital pathology data. These tools were designed to navigate large–scale hospital repositories including medical reports and 3D patient scans, retrieving relevant information from automatically selected regions–of–interest. The correct selection of visual features from the computed ROIs could benefit clinical decision support systems with more consistent evaluations of the medical images and a better comprehension of the underlying pathological processes.


Résumé

Medical imaging techniques are able to show internal structures of the human body that would otherwise remain hidden without invasive procedures such as surgery. Health professionals rely on them to detect and characterise the diseases that may be present in patient studies. They visually inspect the images in search of subtle variations in size, shape or appearance that could indicate the presence of alterations in the anatomical structures. In the past decades, the number of medical images inspected daily in health centers, as well as the complexity of imaging parameters, has increased considerably. Given their daily workload, the time available to interpret these imaging studies is limited, which sometimes leads to a failed detection in difficult cases and to errors of interpretation. Moreover, the assessment of medical images is often based on subjective criteria that may depend on the training and professional experience of the interpreter. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems generate objective and repeatable measurements that can be used to increase the reliability of recurrent medical image assessments. Some of these semi–automatic and fully automatic algorithms have already been included in the clinical workflow, reducing errors and inter–operator variability. A fundamental step in the development of these systems is to first locate the relevant areas and anatomical structures where a more thorough image analysis should be performed.

The main goal of this thesis is to propose and evaluate novel methods that detect and quantify regions–of–interest (ROIs), in order to highlight alterations or simply to serve as a basis for semantic image analysis. Measuring visual features in the ROIs can draw the attention of health professionals to the regions that could influence the interpretation of the complete study. Health professionals can then prioritise the evaluation of the filtered data to perform a more thorough inspection of the detected regions and structures.

Medical professionals learn to establish a correlation between visual image patterns and the structural organisation of human anatomy during their medical training. Images are interpreted by radiologists in a qualitative process based on visual descriptors such as colour, shape or appearance. Data–driven image features, such as those derived from texture and those obtained from convolutional neural networks, can be used to define and quantify ROIs manually annotated by health experts or automatically selected by medical imaging algorithms. The steps involved in the detection and annotation of ROIs are discussed, along with the corresponding handcrafted and deep learning features. These features highlight the abnormalities associated with certain pathological processes. In this context, image registration is highlighted as an image processing technique developed to combine data from several patients, which results in additional information that is not apparent in the separate images. By segmenting multiple anatomical structures in full patient scans, an initial selection of the main structures can target a subsequent and more extensive analysis of the ROIs.

A large data set that takes into account the natural variability and complexity of medical data is required to train and test medical algorithms simulating a real clinical scenario. Biomedical image analysis challenges provide their participants with data sets that include manual annotations from health professionals and promote the comparison of medical algorithms on shared tasks. The advantages and limitations of diverse approaches are identified in order to increase the understanding of the visual descriptors associated with the representation of anatomical structures and diseases. Several challenges on medical image annotation, medical case retrieval and histopathology were organised and are reviewed as part of this work. An online evaluation framework that assesses the quality of different segmentation algorithms is described, as well as the relationship between segmentation metrics and the visual interpretation of radiologists.

The segmentations produced by these algorithms can be fused to create larger data sets comprising spatial estimates of anatomical structures. This could prove to be a valuable resource for researchers and hospitals, since the manual annotation of medical images is not scalable and can be error–prone. Selecting appropriate visual features from the computed ROIs could facilitate clinical decision making with more consistent evaluations of the medical images and a better understanding of the underlying processes. Furthermore, the most relevant medical cases can be identified, allowing medical professionals to devote more time to the cases that are hardest to diagnose or to those with a higher medical urgency.


Resumen

Through medical imaging techniques it is possible to visualise structures of the human body that would remain hidden without invasive procedures, such as surgery. Medical professionals depend on these images to detect and characterise diseases that could be present in patient studies. Doctors learn to establish a correlation between the visual patterns in the images and the structural organisation of human anatomy in the search for subtle variations that could indicate the presence of disease. However, in the past decades, the number of medical images inspected daily in health centers, as well as the complexity of the parameters in each image, has increased enormously. The time to interpret these imaging studies as part of the daily workload is limited, occasionally resulting in a failed detection in difficult cases and errors in the interpretation. An efficient quantitative analysis could improve health care by enabling a more objective interpretation of these studies.

Clinical decision support and computer–aided diagnosis systems generate objective and repeatable measurements that can be used to increase the reliability of the assessments made on medical images. Some of these algorithms, semi– or fully automatic, have already been included in clinical workflows, reducing errors and inter–operator variability. A fundamental step in the development of these systems is to initially locate anatomical structures and relevant areas where a more thorough image analysis is needed. The main goal of this thesis is to propose and evaluate novel methods that detect and quantify regions–of–interest (ROIs) in medical images. By segmenting several anatomical structures in full patient scans and measuring visual features in preselected regions, medical professionals can then prioritise their attention to the most significant structures in the images.

A large data set that incorporates the natural variability and complexity of medical data is then required to train and test medical algorithms designed for real clinical scenarios. Biomedical image analysis challenges provide data sets to participants and promote an objective comparison of medical algorithms with shared goals. As a consequence, the advantages and limitations of diverse strategies are identified, increasing the understanding of the visual descriptors associated with the representation of anatomical structures and diseases. Challenges in medical image annotation and medical case retrieval were organised together with a research group (VISCERAL) and are reviewed as a scientific contribution of this work. A cloud–based evaluation framework where various segmentation algorithms can be analysed is presented, as well as the comparison of their results to the visual interpretation of radiologists.

In this thesis, the different steps implemented in the detection and annotation of ROIs are discussed along with the corresponding sets of handcrafted and deep learning features. Particularly, a hierarchic anatomical structure segmentation method was developed with the aim of exploiting the individual anatomical variations present in small sets of manually annotated structures. The method obtained promising results in the 'VISCERAL Anatomy' challenges for the segmentation of large and high–contrast anatomical structures and showed a notable advantage in the segmentation of small and hard to detect structures, for example the gallbladder and adrenal glands. Moreover, multimodal (using both text and visual information) medical case retrieval systems are proposed for both radiology and digital pathology data. These tools were designed to navigate large medical repositories, including medical reports and 3D patient scans, retrieving relevant information from automatically selected regions–of–interest. The correct selection of visual features from the computed ROIs could benefit clinical support systems with more consistent evaluations of the medical images and a better understanding of the essential pathological processes.


Acknowledgments

I would like to thank everyone who participated in and contributed to the development of this thesis and this experience in my life as a PhD candidate. Foremost, I express sincere gratitude to Professors Stéphane Marchand–Maillet and Henning Müller for their guidance during this period, generous sharing of their knowledge and constant support throughout my PhD. Particularly, I would like to thank Prof. Henning Müller for the opportunity to be part of his research group, where I have come to love and enjoy working every day during the past 5 years. I would like to spend my professional career being an example of the hard work that is promoted in this group, always looking for new challenges and original breakthroughs. A special mention goes to Dr. Allan Hanbury, with whom I share many scientific publications. I thank him for accepting to be my thesis expert advisor and for the work shared during this period.

I wholeheartedly thank my colleagues and teammates from the MedGIFT group and from the VISCERAL and DESUTO projects, with whom I shared deadlines, success stories and funny moments as part of the perks of the daily workload. Namely the following people in Sierre: Adrien, Alba, Andres, Antonio, Dimitris, Geovanny, Ivan, Manfredo, Matteo, Pol, Ranveer, Roger, Sebastian, Stefano, Visara and Yashin. Thank you all for your cookies, warm conversations and smart quips during the coffee breaks. It has also been a wonderful experience to get to know hardworking individuals like Abdel, Bjoern, Georg, Katharina, Marianne, Markus, Mattias and Orcun in the VISCERAL project and Kristian, Lena, Martin, Mats and Peter from the DESUTO project.

Thanks to Andre, Ale Mamá, Javier, Mafer and Pau for being a new family for me and helping us in this voyage together. Thanks to Lalo, Mamá and Papá for accompanying us since the beginning and being the most caring and loving persons I know. Thank you for devoting your positive character to building a better version of myself every day. You have taken me farther than I would have ever imagined.

Finally, I would like to thank Ale. It has been your daily smile and our enchanting communication that have put us through the roughest patches, but mostly through the happiest moments of this adventure together. Thank you for sharing your life with me, revealing the brave and full of hope person you are.


Chapter 1

Introduction

Medical imaging is an essential diagnostic technique used by health professionals for evaluating the medical condition of a patient. Multiple medical imaging modalities exist, with each one portraying a distinct signal from the human body. These signals can be physically measured and represented as a vector or a multi–dimensional matrix. The analysis of these signals can be associated with anatomical and pathological information from underlying structures [207]. In some cases, the generated images and volumes contain body part attributes that would otherwise only be visible through invasive procedures, such as surgical intervention. In Fig. 1.1, sample images from some of the most commonly requested medical imaging techniques are shown. Moreover, some of these modalities (e.g. magnetic resonance imaging, MRI) can depict hidden functional processes that would otherwise remain unnoticed during visual inspection, e.g. organ perfusion and cell metabolism. Medical professionals, including radiologists and pathologists, not only confirm or rule out alterations from this data, but are also able to characterise the pathological processes causing a specific health condition [42].

1.1 Motivation

In the past decades, the amount of medical data available to assist in the decision–making process, including medical imaging, has increased substantially, both in storage size and information load within the data [43]. Medical professionals daily inspect a large quantity of patient scans showing multiple anatomical structures and even full body systems. The health care costs related to medical imaging are rising as well [252]. For example, taking only mammography images into consideration, 1 billion imaging exams (4 images per exam) are performed worldwide per year, which results in an average of 120 medical image perception events per second [152].
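The order of magnitude of this figure can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and readings spread uniformly over every second of that year; both are simplifications that are not stated in the text. Under these assumptions the rate comes out at roughly 127 images per second, in line with the quoted average of about 120.

```python
# Back-of-the-envelope check of the mammography reading rate quoted above.
# Assumptions (not from the cited source): a 365-day year and a uniform
# distribution of readings over every second of that year.

EXAMS_PER_YEAR = 1_000_000_000   # mammography exams worldwide per year (from the text)
IMAGES_PER_EXAM = 4              # images per exam (from the text)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def perception_events_per_second(exams: int, images_per_exam: int) -> float:
    """Average number of images that must be looked at each second."""
    return exams * images_per_exam / SECONDS_PER_YEAR


rate = perception_events_per_second(EXAMS_PER_YEAR, IMAGES_PER_EXAM)
print(f"{rate:.0f} image perception events per second")
```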

Figure 1.1: Different medical imaging modalities used in health care (from left to right): A) X–ray, B) computed tomography, C) positron emission tomography, D) ultrasound, E) magnetic resonance imaging and F) digital pathology whole–slide image.

The task of visually inspecting these studies can be time consuming and significantly labor intensive, particularly when large regions of multidimensional images are screened with multiple imaging parameters to optimise [5, 163]. Additionally, subjective measurements and personal experience may result in intra– and inter–operator variability when interpreting these studies [175]. In difficult or inconclusive cases such variability is more frequent, consequently reducing the reliability of the diagnostic tests [224]. A miss rate as high as 30% and a false positive rate up to 15% have been estimated overall in radiology [151]. False positives might result from overlapping structures or insufficient experience of the interpreter. On the other hand, false negatives have been classified into three categories according to the amount of time invested in their inspection:

• Search errors: if the radiologic features from a pathological condition are never looked at,

• Recognition errors: if the radiologic features are looked at but not enough to be interpreted as a pathological condition,

• Decision errors: the features are looked at for long periods of time but they are not considered as a pathological condition.

Computer–aided diagnosis (CAD) has been defined as “the use of computer algorithms to aid the image interpretation process” [70]. Besides helping in the detection process, it is also used in the computerized extraction of quantitative measurements from medical images [252]. CAD systems, as well as clinical decision support (CDS) systems, provide clinicians with filtered information that can enhance patient health care [199]. Furthermore, the algorithms comprised in these systems are designed to detect, measure and diagnose imaging abnormalities in clinical studies [48]. One of their goals is to help medical doctors perform objective and repeatable measurements [271, 143, 73]. By sparing medical professionals the quantifiable procedures in their work, CAD and CDS systems can overcome arbitrary clinical reasoning [180]. They are particularly advantageous when large amounts of medical data should be reviewed, maximising both productivity and reliability. These systems have improved the diagnostic accuracy of radiologists by reducing human errors and simplifying some of the most time–consuming and tedious endeavours in medical image interpretation [57]. Through a faster and more consistent image interpretation process, individual radiologists can read more cases efficiently, taking into consideration situation–specific data and system recommendations [44, 192]. Thus, with the development of algorithms for CAD and CDS, researchers aim to mimic the role of radiologists and pathologists by quantifying image data and connecting clinical information to support the final assessment.


1.2 Thesis objectives

A top priority for CAD and CDS algorithms is to reduce the amount of data that medical experts have to inspect, either by selecting the most pressing cases or by enhancing relevant areas in the images. In Fig. 1.2, three different cases are presented where the areas containing the most relevant information from the corresponding study have been highlighted. On the far left, a cropped rendering from a 3D whole–body computed tomography (CT) scan is shown with the segmentation of the left adrenal gland (in red). The image in the middle shows a lesion in the brain that has been manually delineated (in red) in the segmented brain from an MR image of the head. On the far right is a zoomed–out view of a whole–slide image where three areas that were visually inspected by a pathologist are highlighted at this resolution (in red). These samples are shown to emphasize that in the majority of patient cases, the areas containing the most relevant information can be quite small in comparison to the full images. Moreover, the alterations might only be visible in one of the multiple images composing the patient study.

The diagnostic process involves two basic tasks: visual perception and rendering an interpretation (cognition), as medical images are not self–explanatory [152]. When medical experts inspect medical images, they look for specific visual patterns that might reflect underlying changes in size, shape or appearance of the anatomical structures to confirm or discard a particular diagnosis. The images can vary considerably even if they were obtained from the same imaging modality and correspond to a similar pathology. Contrast, brightness, scale and image quality are some of the factors known to influence the interpretation and subsequent clinical decisions of visual interpreters [145]. A deficient search of the images can lead to missing lesions and false negative events [247]. Thus, an initial key step in this diagnostic process is deconstructing medical images into different structures and components, which can be referred to as image parsing [183]. The selected content–based image features used for performing this task can be categorized at pixel, object and semantic level, based on their biological interpretability [96]. Sorting out raw medical image data into semantic elements can be useful for clinicians to efficiently read and navigate through the images with parsing tools [222]. Unlike a search that uses text–only information, such as keywords, automatic parsing also allows medical images to be searched and retrieved by their content [223].

Therefore, an initial crucial step in medical image analysis systems is to detect and/or segment the anatomical structures present in the image, thus contributing to the visual perception task. Specific regions–of–interest can then prioritise the attention of the interpreters to the foremost structures in the images. How to objectively detect, segment and quantify manually annotated and automatically computed regions–of–interest (ROIs) in medical imaging is the main topic addressed in this thesis. Different approaches to generate and compare ROIs are evaluated and discussed.
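To make the notion of quantifying a ROI concrete, the following minimal sketch computes two simple measurements from a binary mask: the fraction of the image covered by the ROI and the mean intensity inside it. The toy image, the mask and the function name `quantify_roi` are illustrative assumptions and not the method of this thesis; the features actually used are described in later chapters.

```python
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def quantify_roi(image: Image, mask: Mask) -> Tuple[float, float]:
    """Return (area fraction, mean intensity) of the pixels where mask == 1."""
    total = 0
    inside = []
    for image_row, mask_row in zip(image, mask):
        for value, flag in zip(image_row, mask_row):
            total += 1
            if flag == 1:
                inside.append(value)
    if not inside:
        raise ValueError("the mask does not select any pixel")
    return len(inside) / total, sum(inside) / len(inside)


# Hypothetical 4x4 image with a 2x2 bright region marked as the ROI.
image = [
    [10.0, 12.0, 11.0, 9.0],
    [10.0, 80.0, 90.0, 11.0],
    [12.0, 70.0, 100.0, 10.0],
    [9.0, 11.0, 10.0, 12.0],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

fraction, mean_intensity = quantify_roi(image, mask)
print(f"ROI covers {fraction:.0%} of the image, mean intensity {mean_intensity:.1f}")
```

In this toy case the ROI covers a quarter of the image, which mirrors the observation that the most informative areas are often small compared with the full study.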

1.3 Thesis overview

The work addressed in this thesis was mostly developed within two European research projects: the Visual Concept Extraction Challenge in Radiology (VISCERAL)¹ and the Self–learning decision support tool for fast and accurate cancer evaluation in digital pathology (SLDESUTO–BOX)². The VISCERAL project was supported by the European Commission under the Information and Communication Technologies (ICT) theme of the 7th Framework Programme for Research and Technological Development. The project defined and executed a targeted benchmark framework to speed up progress towards: 1) the automated identification of anatomy and pathology in radiology (MRI and CT) volumes, and 2) the retrieval of similar multimodal radiology data. The objectives, benchmarks and conclusions from the VISCERAL project will be explained in more detail in the following chapters of this thesis (e.g. Ch. 2.1). On the other hand, SLDESUTO–BOX received funding from the Eurostars–2 Joint Program with co–funding from the European Union’s Horizon 2020 research and innovation program. SLDESUTO–BOX is a joint project aimed at developing a decision support toolbox that supports pathologists in their task of diagnosing and evaluating the prognosis of different types of cancer. Two main outcomes were delivered from this project: 1) an innovative decision support toolbox for digital pathology, and 2) knowledge development in the field of image analysis and technology integration for digital pathology.

¹ http://www.visceral.eu/, as of 01 Jul. 2017

Figure 1.2: Samples of medical images with manually annotated regions-of-interest (ROIs). Medical doctors annotate these areas because they contain relevant information for the clinical status of the patient. They visually inspect full images in 2D slices, sometimes from 3D volumes, to detect these ROIs.

Although there is a strong overlap between the objectives and outcomes of the mentioned projects and the body of publications from the author of this thesis, the main scientific contributions and analysis performed specifically by the author are highlighted. In this sense, both projects are addressed within the context of the topics and objectives discussed in this thesis. The motivation and scientific contributions of this thesis are introduced in this first chapter together with the corresponding references of the work published by the author in this regard.

In particular, the clinical contribution of CAD and CDS systems is clear when they are objectively evaluated in large data sets of real medical data. A current trend in this field is the organisation of biomedical image analysis challenges where state–of–the–art methods can be tested and compared using manually annotated medical data. An overview of the evaluation framework and contributions from this thesis to these challenges is described in chapter 2 both in the radiology and digital pathology fields.

A specific track of ROI detection corresponds to the segmentation of anatomical structures. Organ segmentation is an active topic in medical imaging with various approaches proposed in the literature. A hierarchic anatomical segmentation strategy based on anatomical correlations is described in Chapter 3. Moreover, a comparison of the results from this strategy in the VISCERAL Anatomy segmentation challenge is also shown. Finally in this chapter, and as part of the evaluation framework of the Anatomy benchmarks, the author contributed a visual evaluation system, defined by radiologists, against which segmentation metrics were measured.

2http://www.contextvision.com/sldesuto-box-a-eurostars-sponsored-project/, as of 01 Jul. 2017

Chapter 4 presents the proposed methods for detecting and quantifying the data from ROIs in medical images. The techniques and applications that were developed as part of this work are described. The common goal of these applications was to search for basic semantic elements in the imaging studies of the patients for medical decision support. The experiments evaluating the contributions with their corresponding clinical context are also mentioned in this chapter.

In chapter 5, the main topic discussed is the retrieval of medical images and cases using regions–of–interest. Case–based retrieval systems implemented both for radiology and digital pathology data were developed by the author of this thesis. In the case of the radiology multimodal retrieval system, both the data set and evaluation were performed within the VISCERAL Retrieval benchmark. The first multimodal case–based retrieval strategy for histopathology data is proposed in this chapter with whole–slide images and full pathology reports.

Finally, in chapter 6, we conclude the thesis with a review of the objectives and a set of perspectives resulting from the work performed.

1.4 Scientific contributions

Scientific contributions in the fields of ROI detection, segmentation and application for medical decision support are explained more in depth in later chapters. A diagram of the topics investigated according to the selected medical tasks is shown in Fig. 1.3.

The diagram shows four different tasks health professionals carry out when inspecting and interpreting a medical image that could be targeted with computer vision algorithms:

• Evaluate: The performance of state–of–the–art algorithms for medical image analysis tasks should be compared on a common data set to understand the advantages and limitations of each strategy. Moreover, these algorithms should also be compared side by side with the interpretation provided by medical professionals to define the success of the proposed solutions (see Ch. 2).

• Detect: When interpreting medical images the first step is to select the areas in the image and regions–of–interest that might have more relevance in the medical assessment (see Ch. 3).

• Quantify: The correct features from the detected regions–of–interest can then be extracted and quantified to have an objective measurement of the visual characteristics shown in these areas (see Ch. 4).

• Decide: A repeatable decision based on the appropriate measurements can help reduce the inter– and intra–observer variability when interpreting medical images (see Ch. 5).

The main contributions from this thesis are highlighted in the following paragraphs:


Figure 1.3: Diagram of the topics discussed in this thesis.

1.4.1 Detecting and quantifying regions–of–interest in medical imaging for clinical decision support

• A semi–automatic method for the quantification of manually annotated ROIs that can objectively detect a texture abnormality in the brain [120].

• A texture analysis strategy for multiple organ structures, aiming to identify their appearance in a context–free approach [54, 119].

• An overview of various texture analysis methods proposed in literature, particularly tested in histopathology images [129].

• A fully automatic and weakly supervised cancer grading algorithm using deep learning for prostate whole–slide images without relying on manually annotated ROIs [115].

1.4.2 Segmenting anatomical regions–of–interest in medical imaging data

• Contributed as an organiser of the VISCERAL cloud–based evaluation framework and participated as co–author of various scientific publications regarding the data set, set up and outcome of this project [82, 266, 147, 80, 81, 148].

• A hierarchic multi–anatomical structure segmentation approach guided by spatial correlations [125, 131, 124, 123, 126, 118, 41, 128].


Figure 1.4: Automatic segmentation of anatomical structures in medical images opens the door for semantic parsing.

• Main author of an overview of the VISCERAL Anatomy benchmarks. In this work, the author of this thesis analysed the participant algorithms and results of the Anatomy (1–3) organ segmentation and landmark detection tasks [121, 74, 128].

• Participated as co–author in scientific publications on the selection of the most suitable segmentation metrics for medical imaging. Contributed to the development of a visual evaluation system, from a clinical perspective, for the output segmentations of anatomical structures [240, 241, 128].

1.4.3 Medical and histopathology case–based retrieval systems

• A strategy combining radiology imaging and clinical meta–data for multimodal medical case–based retrieval [116, 117].

• Main author and co–author of publications reviewing the results from the VISCERAL Retrieval benchmark. The author of this thesis summarised and discussed the data sets, set up, participant algorithms and results of the Retrieval benchmark hosted in the Multimodal Retrieval in the Medical Domain (MRMD) workshop [186, 122, 187, 188, 127].

• A deep multimodal case–based retrieval for large histopathology datasets [130].

Other papers written during the projects involved in the creation of this thesis are [189, 220, 17, 249, 233]. These papers are related to the interpretation of health data obtained from wearable and mobile applications for pervasive health care platforms. Our contribution to these works was on the design of the applications and the interpretation of the obtained medical data.


Chapter 2

Data sets and challenges for evaluating medical image analysis algorithms

State–of–the–art medical image analysis algorithms can objectively identify and measure pathological signs for robust and repeatable image quantification [271]. This could help medical doctors in interpreting a larger number of patient imaging studies and support their diagnostic hypothesis [143]. However, selecting the right algorithm or combination of algorithms that produce an optimal solution for a desired medical task is not always clear.

The available state–of–the–art approaches should be tested extensively and in different settings to define the advantages and limitations of each method. A frequent bottleneck in the evaluation of these approaches is the lack of a common data set where various algorithms can be tested and compared [90, 158]. Solutions published in the literature with non–public data sets limit comparisons with other techniques and can end in poor baselines or ‘improvements that don’t add up’ [8]. Overall, this creates an ‘illusion of progress’, with the approaches showing too little demonstrated impact to justify their implementation in real hospital scenarios [91].

A large data set that truly represents the natural variability and complexity of medical data is in general preferred for training and testing algorithms. Building these large data sets for medical image analysis requires a ground truth derived from manual annotations performed in large quantities by health professionals (e.g. radiologists) [88]. These annotations are analysed during algorithm training with the aim of replicating the learned expert judgments on unseen target images. Particularly for supervised or semi–supervised deep learning methods, this step is crucial for the development of successful models. The number of attainable manual annotations is limited and does not scale to the magnitude of training samples needed to characterise distinct diseases [158]. Moreover, manual annotation, such as the localisation and segmentation of various structures in an image, can be expensive and time–consuming work for radiologists and is difficult to reproduce.

In recent years, online challenges with shared data sets have become widespread in different fields of research, including medical imaging3. These challenges have attracted both academic groups and companies4. Multiple participants have submitted algorithms for demanding tasks and had access to data sets from different imaging modalities and diseases [181, 93, 184, 89]. Unlike the first evaluation campaigns, the data sets from the latest challenges are becoming closer to the “realistic” content used in health care, both in size and in the variation found in real patients [87]. The risk of overfitting is reduced when the data are obtained from different scanners, different health centers and multiple patients to better represent a real clinical scenario.

3MICCAI Grand Challenges, http://grand-challenge.org/All_Challenges/, as of 01 July 2017

4Kaggle, https://www.kaggle.com/, as of 01 July 2017

As a basic component of this thesis, contributions were made in the organisation, participation and result analysis of challenges with the following topics:

• Detection and segmentation of anatomical structures

• Case–based medical image retrieval

• Anatomical pathology challenges

In the following sections, the data sets and tasks from the challenges on which the contributions from this thesis were trained and evaluated are described.

2.1 VISCERAL project

Aiming to “bring the algorithms to the data”, the VISCERAL project5 was established as a cloud–based evaluation framework for medical image analysis algorithms [158]. The goal of the VISCERAL benchmarks is to create a single, large, and multi–purpose medical image data set and evaluation infrastructure. Through organized benchmarks, research groups can test their specific applications and compare them to other available solutions against the standard manual annotations. The project has organized benchmarks in medical image annotation, lesion detection and medical case–based retrieval with a large Gold Corpus generated with manual annotations from radiologists. One of the main innovations of the VISCERAL framework, compared to previous evaluation campaigns, is that it was hosted in a cloud environment, namely the Microsoft Azure cloud6. Both storage and computation power are virtually unlimited in a cloud platform, which makes it possible to create and manage virtual machines (VMs) for the participants as well as data storage containers. Furthermore, data from the challenges can be accessed and distributed through a cloud–based framework, in contrast to previous medical image challenges where both the training and testing sets were sent by post on hard drives or downloaded via online platforms [178]. A direct benefit of the VISCERAL approach is that large data sets (e.g. terabytes of data) can be made available to participants. Data privacy requirements associated with sensitive medical data are no longer a limitation for their usage since the data are stored centrally and duplication of confidential material can be avoided. Particularly in this project, the Azure cloud is HIPAA (Health Insurance Portability and Accountability Act) certified [6].

With central storage, data sets can also be made private with different access permissions, meaning participants can have access to only subsets of the data. Challenge administrators can directly run the participant algorithms, implemented in the provided VMs, on private test sets that are not available to the participants themselves. This is advantageous to evaluation campaigns as intended or unintended bias is reduced if participants do not have access to the test set before their submission [170]. In recent years, some challenges have distributed their test set during live competitions, potentially preventing the participants from training their algorithms with test images [178]. However, this approach does not overcome the obstacle of objectively testing the algorithms on very large data sets since the evaluation time is limited.

5http://www.visceral.eu/, as of 01 July 2017

6https://azure.microsoft.com/en-us/, as of 01 July 2017

The medical data and manual annotations created by expert radiologists were initially uploaded to a common cloud storage container. They were then divided into training and testing sets in separate storage containers. Another storage container was reserved for the output files and evaluations of the participants’ algorithms. If additional images or annotations were generated during the benchmarks, they were added to the storage containers when required. Moreover, the participants’ algorithms were installed and executed in identical cloud computing instances, making the runs fully reproducible. Virtual machines (VMs) running on the Microsoft Azure cloud infrastructure were pre–configured with five operating systems, including Windows and Linux, with the same computing specifications and capabilities. Through an assigned virtual machine, each participant had access to the training data set and could install the necessary tools and libraries to run their algorithms. During the training phase, they could optimize their approaches. Additionally, a web management portal was used for the VISCERAL project to simplify the submission process and the administrative tasks of handling the VMs. The VISCERAL registration and management system7 was created, containing all the information regarding the benchmarks (user agreement, benchmark specifications, data set information). Participants had access to the credentials for their VMs in the system’s participant dashboard. Up to the deadline of the benchmarks, the participants could easily submit their VMs for algorithm evaluation using this dashboard.

2.1.1 Medical image detection and segmentation: VISCERAL Anatomy benchmarks

All kinds of anatomical structures are visually analysed in medical images as part of the daily work of radiologists. Subtle variations in size, shape or appearance can be described as radiological signs that confirm or discard a particular diagnostic hypothesis. Radiologists screen through large regions in full imaging data to detect and interpret these findings. However, manual measurements and personal experience may result in intra– and inter–operator variability when interpreting medical data, particularly in difficult or inconclusive cases [175, 224]. An automatic detection of the main anatomical structures shown in the images can potentially lead to a reduction in the workload of radiologists. Furthermore, most biomedical image analysis methods require the detection and segmentation of anatomical structures as the basis for further image analysis [226, 274]. Automatic medical image segmentation of anatomical structures promotes a more efficient image interpretation by facilitating hierarchic parsing and semantic navigation in image sets [222].

Medical data annotation challenges organised in the past few years targeting anatomical structure segmentation and landmark detection are shown in Table 2.1. Nevertheless, few challenges have addressed tasks with multi–structure segmentation [169], focusing on single organs instead [98]. Some of these challenges use only a portion of the full volumes from real patient scans, thus simplifying the task of finding the desired structure in cropped regions (e.g. only abdomen, only thorax). The inherent spatial relations between the anatomical structures and their surroundings are consequently not evaluated, potentially missing helpful information for image interpretation [166, 196].

7http://visceral.eu:8080/register/Login.xhtml,*

8Automatic Cephalometric X-Ray Landmark Detection Challenge 2014, http://www-o.ntust.edu.tw/~cweiwang/celph/,*

9Proceedings of SHAPE 2015 Symposium, http://www.shapesymposium.org/proceedings-screen.pdf,*

10Pancreas Segmentation from 3D Abdominal CT images, http://www.biomedicalimaging.org/2014/program/challenges/,*

* as of 01 July 2017

Table 2.1: Challenges in medical imaging related to anatomical structure segmentation and landmark detection.

Landmark detection

Structure                                        Modality
Lung [191]                                       CT
Head 8                                           X–ray

Anatomical structure segmentation

Structure                                        Modality
Liver [98]9                                      CT
Pancreas10                                       CT
Airway path [170]                                CT
Lung vessels [217]                               CT
Lung nodules [251]                               CT
Brain anatomical structures [225]                MRI
Brain tumor [178]                                MRI
Head and neck structures11                       MRI
Prostate and surrounding structures [169]        MRI
Heart anatomy 12                                 MRI
Heart motion tracking [20]                       US
Spine and vertebrae 13                           X–ray

The VISCERAL Anatomy benchmark series evaluated the performance of state–of–the–art anatomical structure segmentation algorithms in 3D medical imaging data [128]. A per–anatomy, per–modality evaluation was performed according to the nature of the participating algorithms and the attempted image analysis tasks. Participants had the freedom to choose which of the structures their algorithm should segment. Anatomical landmarks, in selected anatomical structures, can be identified in different image sequences, which is why their location is an important first step in parsing image content. Three benchmarks (Anatomy 1–3) on automated anatomy localisation and segmentation of whole–body 3D volumes have been organised. These benchmarks are among the first challenges to evaluate multi–modal medical image analysis techniques using a large amount of data annotated by radiologists. The VISCERAL Gold Corpus consists of 391 CT and MRI data sets (889 sequences in total) with 20 different organs and 53 landmarks, with a subset of thirty volumes per modality (120 volumes in total) manually annotated by radiologists for the Anatomy benchmarks. The following anatomical structures were annotated if present in a medical case (see Fig. 2.1): right lung (light blue), left lung (blue), left kidney (brown), right kidney (light brown), liver (red), spleen (purple), urinary bladder (yellow), right psoas major muscle (pink), left psoas major muscle (light pink), left rectus abdominis muscle (light brown), sternum (dark grey), trachea (turquoise), first lumbar vertebra (white), right rectus abdominis muscle (light orange), aorta (light red), thyroid (violet), pancreas (beige), gallbladder (yellow), right adrenal gland (light green) and left adrenal gland (green). Including both contrast–enhanced and unenhanced CT and MR 3D scans of the whole body (wb), the trunk (CT contrast–enhanced, CTce) or the abdomen (MR T1 contrast–enhanced, MRT1cefs) encouraged the development of segmentation algorithms for both high and low contrast. The data set comprises roughly the same number of male and female patients (62 male, 69 female) with an average age of 59.9 years (standard deviation of ±9.79 years).

11Head and Neck Auto Segmentation Challenge, http://www.imagenglab.com/wiki/mediawiki/index.php?title=2015_MICCAI_Challenge,*

12Second Annual Data Science Bowl, https://www.kaggle.com/c/second-annual-data-science-bow,*

13Computational Methods and Clinical Applications for Spine Imaging, http://csi2015.weebly.com,*

* as of 01 July 2017

Table 2.2: Overview of the manually annotated VISCERAL Anatomy series Gold Corpus. The field–of–view is defined for each modality under the FOV column. Both the in–plane resolution range and in–between plane resolution are reported in millimeters. The number of volumes, annotated anatomical structures (Annotations) and located anatomical landmarks are reported per modality.

Modality        FOV           Contrast      Resolution (mm)         Annotations   Landmarks
CT              whole–body    unenhanced    0.977²–1.405² × 3       384           530
                trunk         contrasted    0.604²–0.793² × 3       387           440
MR T1w & T2w    whole–body    unenhanced    1.250² × 5              305           520
                abdomen       contrasted    0.840²–1.302² × 3–8     219           270

The four imaging modalities included in the Gold Corpus had all their scans taken from the same hospital during clinical practice using identical imaging protocols and the same imaging device for each modality. Whole body unenhanced CT (CTwb) scans were obtained from patients with confirmed bone marrow neoplasms to detect focal bone lesions (osteolysis). Contrast–enhanced CT scans (CTce) were acquired from patients with malignant lymphoma to detect pathological lymph nodes or organ involvement. Whole body unenhanced MR (MRwb) scans from patients with multiple myeloma were included in the data set. All patients’ MR series originally included a coronal T1–weighted and a fat–suppressed T2–weighted or STIR (short tau inversion recovery) sequence of the whole body, plus a sagittal T1–weighted and a sagittal T2–weighted sequence of the entire vertebral column. Finally, the MRI studies of the abdomen were contrast–enhanced fat–saturated MR T1 scans (Ab/MRT1cefs) from oncological patients who had metastases within the abdomen. The metadata of the distributed scans and the face regions were anonymised so that any patient identification was removed, while preserving the underlying structure of the face so that it could still be used for image analysis. Manual annotations by radiologists of the selected anatomical structures served as ‘ground truth’ for the training and testing of algorithms (see Table 2.2). When organs were not visible in the 3D scan, they were not annotated. All the data of the Anatomy benchmarks were available only for non–commercial research after participants signed a license agreement for research. Although the main evaluation setup remained the same during the three Anatomy benchmarks, there were some updates in the data set and submission process in each of the benchmarks. All the images and annotations were available as individual anonymised NIfTI (Neuroimaging Informatics Technology Initiative) files in a cloud storage container without any additional pre–processing from their raw DICOM format. In all three benchmarks, a clear split of training and test images was used where only the training images were accessible to the participants. They trained their algorithms and implemented an executable in their virtual machine (VM) within the VISCERAL cloud framework. At the benchmark deadline, the VM was submitted and then the organisers privately ran the algorithms on the test data.

Figure 2.1: Examples of patient volumes in the VISCERAL Anatomy Gold Corpus with their corresponding anatomical structures and landmarks. Bone structure and body contour are also shown in the background for spatial reference.

[Figure panels: columns CTwb, CTce, MRwb, MRce; rows: anatomical structure annotations (2D coronal slice view), anatomical structure annotations (3D view), landmarks (3D view).]

In Anatomy1, the participants had access to a VM with a 4–core CPU and 8GB of RAM and to a training set of 28 scans (7 per modality) with their corresponding annotated structures. The algorithm executables were tested on 51 manually annotated patient scans (27 CT, 24 MR). For Anatomy2, the size of the training set was increased to 20 volumes per modality and the computation power of the participants’ VMs was doubled from a 4–core to an 8–core CPU with 16GB of RAM. A continuous evaluation system was developed for the Anatomy3 benchmark where participants could submit their algorithms iteratively, at most once a week. Although the VISCERAL project is officially finished, the Anatomy3 continuous benchmark is ongoing with a public leaderboard on the VISCERAL website where participants can choose to make their results public14.

2.1.1.1 Annotating medical data

A primary step for developing robust medical image analysis algorithms is to generate or re–use a large data set containing patient medical scans together with high–quality expert annotations (i.e. organ segmentations, landmark localisations and lesion annotations). During the VISCERAL project, several manual and semi–automatic segmentation tools were evaluated, looking for fast and effective 3D annotation software interfaces. It is desirable to reduce the workload and annotation time of medical doctors performing manual annotations and segmentations. The manual annotation of the patients’ 3D volumes is a standard procedure in radiology to highlight structures and ROIs in the medical data. However, this is a demanding and time–consuming task with potential variations in the annotations depending on the experience of the annotator [224]. Several annotation tools have been developed essentially for the annotation of ROIs and the segmentation of anatomical structures [224, 30, 22, 272, 176]. Some of these tools are included in full medical interpretation frameworks that provide visualisation and image analysis for an integral medical image computing experience [46].

Tools developed in house are difficult to maintain, and selecting the best suited framework can reduce the time needed to develop new applications [18, 15]. For the selection of an annotation tool for the VISCERAL project Gold Corpus generation, the available visualization and medical application frameworks were evaluated. The selected tool should provide a semi–automatic annotation technique that could reduce the time required for making manual annotations of 3D structures and points of interest. Six free–of–charge frameworks were reviewed: GeoS15 [30], ITK-SNAP16 [272], ImageJ17, MeVisLab18, MITK19 and 3DSlicer20 [206]. Frameworks with specific annotation functionality, such as the techniques used in [176], or developed for a specific operating system (Osirix21) were discarded early from the group, as the goal of the VISCERAL project was to annotate various anatomical structures and landmarks with radiologists from different locations.

14Anatomy3 public Leaderboard, http://visceral.eu:8080/register/Leaderboard.xhtml, as of 01 July 2017

15http://research.microsoft.com/en-us/projects/geos, as of 01 Jul. 2017

16http://www.itksnap.org, as of 01 Jul. 2017

17http://rsb.info.nih.gov/ij, as of 01 Jul. 2017

18http://www.mevislab.de, as of 01 Jul. 2017

19http://www.mitk.org, as of 01 Jul. 2017

20http://www.slicer.org, as of 01 Jul. 2017

21http://www.osirix-viewer.com, as of 01 Jul. 2017

Table 2.3: Report on the evaluation criteria for each of the frameworks or annotation tools. +: satisfactory, o: insufficient, -: missing

Evaluation criteria                        GeoS   ImageJ   ITK–SNAP   MeVisLab   MITK   3DSlicer
Annotation on 3D volumes (CT and MRI)      +      +        +          +          +      +
Flexibility for different shapes           +      -        +          +          +      +
User–friendly usage                        +      -        o          +          o      +
Optimal visualisation of segmentation      o      o        +          +          +      +
Suitable for data formats                  +      -        o          +          +      +
Interactive manual corrections             +      +        +          +          +      +
Time reduction vs. manual annotation       +      o        +          -          o      o
Local image contrast settings              +      -        o          -          +      +
Output annotation format                   +      o        o          -          +      o
Feasible tool upgrading                    +      -        -          -          o      o

In order to select the most suited annotation tool for the VISCERAL project based on their functionality in manual annotation tasks, ten criteria were defined:

• Possibility to annotate in 3D volumes

• Flexibility for different structure shapes and intensities

• User–friendly for medical annotators

• Optimal visualisation of output annotations

• Suitable for data formats

• Interactive enough for manual corrections

• Reduce time spent in manual annotations

• Able to manipulate local image contrast settings

• Output annotation format

• Tool upgrading after project has ended

An overview of the evaluation for each of the frameworks or annotation tools is summarised in Table 2.3. The final decision was taken in collaboration with two radiologists who tested the annotation tools and compared their usability. The Microsoft GeoS annotation tool was selected for the organ segmentations, as it requires only a few brush strokes from the user to generate 3D segmentations in patient volumes that respect the edges of the structures. Moreover, the tool has a simple annotation interface in which only a few keys are needed to start annotating. 3DSlicer was selected for the localisation of anatomical landmarks as this functionality was not yet implemented in the GeoS interface.
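As an illustration of how the ratings in Table 2.3 compare across tools, the following sketch tallies the symbols per tool. The numeric weights (+ = 2, o = 1, - = 0) and the names `RATINGS`, `WEIGHTS` and `tally_scores` are assumptions introduced only for this example; the actual decision was taken together with the radiologists and was not based on such a score.

```python
# Ratings copied from Table 2.3; one character per tool, in the order of TOOLS.
TOOLS = ["GeoS", "ImageJ", "ITK-SNAP", "MeVisLab", "MITK", "3DSlicer"]
RATINGS = {
    "Annotation on 3D volumes (CT and MRI)": "++++++",
    "Flexibility for different shapes": "+-++++",
    "User-friendly usage": "+-o+o+",
    "Optimal visualisation of segmentation": "oo++++",
    "Suitable for data formats": "+-o+++",
    "Interactive manual corrections": "++++++",
    "Time reduction vs. manual annotation": "+o+-oo",
    "Local image contrast settings": "+-o-++",
    "Output annotation format": "+oo-+o",
    "Feasible tool upgrading": "+---oo",
}
# Hypothetical weights, not taken from the thesis.
WEIGHTS = {"+": 2, "o": 1, "-": 0}


def tally_scores(ratings, tools, weights):
    """Sum the weighted ratings of every tool over all criteria."""
    scores = {tool: 0 for tool in tools}
    for symbols in ratings.values():
        for tool, symbol in zip(tools, symbols):
            scores[tool] += weights[symbol]
    return scores


scores = tally_scores(RATINGS, TOOLS, WEIGHTS)
ranking = sorted(TOOLS, key=lambda tool: scores[tool], reverse=True)
for tool in ranking:
    print(f"{tool}: {scores[tool]}")
```

Under these assumed weights GeoS obtains the highest tally, which is consistent with its selection for the organ segmentations, while 3DSlicer and MITK follow with equal tallies.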

GeoS was developed as part of the Microsoft medical image analysis project InnerEye, which focuses on the automatic analysis of patient scans. The tool uses a geodesic symmetric filter with contrast–sensitive spatial smoothness to efficiently segment 3D images. The segmentation method is based on a generalised geodesic distance transform (GGDT) and has a much faster implementation than graph–cut algorithms. Geodesic distance is defined as the distance between two points in an image taking into account image content such as the intensity gradient. A geodesic distance map is initialised from a manual soft seed mask made up of brush strokes that indicate a foreground object and the background that surrounds it. The method is able to perform contrast–sensitive image editing or processing and change its behavior depending on the local image contrast. Some image processing tasks are shared, thus reducing the required computational resources.
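The notion of a geodesic distance that depends on image content can be illustrated with a minimal sketch on a 2D grid. This is not the GeoS implementation: it assumes hard seed pixels instead of a soft seed mask, a 4–connected neighbourhood, and a Dijkstra search instead of the fast GGDT algorithm; the function name `geodesic_distance_map` and the weight `gamma` are hypothetical.

```python
import heapq
import math


def geodesic_distance_map(image, seeds, gamma=1.0):
    """Approximate geodesic distance from seed pixels on a 2D grid.

    image: list of rows with pixel intensities.
    seeds: iterable of (row, col) positions with distance zero.
    gamma: weight of the intensity difference against the spatial step.
    """
    rows, cols = len(image), len(image[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # outdated queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                diff = image[nr][nc] - image[r][c]
                # Unit spatial step plus a penalty for crossing an intensity edge.
                step = math.sqrt(1.0 + (gamma * diff) ** 2)
                if d + step < dist[nr][nc]:
                    dist[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist


# Toy image: a dark region (0) next to a bright region (10).
toy = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
dmap = geodesic_distance_map(toy, seeds=[(1, 0)], gamma=1.0)
flat = geodesic_distance_map(toy, seeds=[(1, 0)], gamma=0.0)
```

In the toy example, pixels inside the dark region stay close to the seed, whereas every pixel of the bright region is far away because any path has to cross the intensity edge. With `gamma=0.0` the map reduces to the ordinary city–block distance, which shows how the contrast term changes the behaviour.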

On the other hand, 3D Slicer is a module–based software package with two functional modules for annotating and segmenting medical 3D volumes. The user defines the volumes to be annotated, specifies the number of structures and can add additional sub–classes. Unlike GeoS, it includes a tool for the annotators to perform freehand segmentation and mark landmark coordinates in 3D volumes, which is why it was chosen as the annotation tool for the landmarks in the VISCERAL Anatomy benchmarks.

2.1.1.2 Metric selection for anatomical structure segmentation

When selecting and comparing segmentation algorithms, a clear evaluation methodology should assess the quality of the output segmentations and quantify their similarity to the available ground truth. Multiple segmentation metrics have been tested in the medical literature for the validation of segmentation algorithms [239]. Each metric measures a particular aspect of the segmentations, with a distance or similarity score, and carries sensitivities and biases tied to that aspect. For the VISCERAL Anatomy benchmark series, evaluation metrics were grouped into five categories according to their identity (i.e. their nature and definition) and the relationship between them, namely:

• Spatial overlap based

• Pair–counting based

• Information theoretic based

• Probabilistic based

• Spatial distance based

Selecting a subset of representative metrics from each group helps to offset the biases associated with each group of metrics. Participants of the Anatomy benchmarks were presented with the mean scores for all 21 metrics shown in Table 2.4 during the evaluation phase of their algorithms. Nevertheless, the analysis of results focused on their performance according to the following four metrics: the Dice coefficient (DICE), the average distance (AVD), the interclass correlation (ICC), and the adjusted Rand index (ARI). This subset includes a metric from most of the categories mentioned above, and each selected metric correlates highly with the remaining metrics of its category. All 21 metrics were computed with the EvaluateSegmentation tool, available as an open source project (footnote 22). In this section, we define the four main metrics evaluated in the Anatomy segmentation benchmarks.
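Two of the four metrics named above, DICE and AVD, can be sketched for small binary segmentations as follows. The sketch assumes commonly used definitions (Dice as twice the overlap divided by the sum of the two sizes, and AVD as the larger of the two directed average nearest–neighbour distances); it is not the EvaluateSegmentation code, and the function names and the convention for two empty segmentations are assumptions made for illustration.

```python
import math


def dice(seg, ref):
    """Dice coefficient of two sets of foreground voxel coordinates."""
    if not seg and not ref:
        return 1.0  # assumed convention: two empty segmentations agree
    return 2.0 * len(seg & ref) / (len(seg) + len(ref))


def directed_average_distance(a, b):
    """Mean distance from each point of a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def average_distance(a, b):
    """Symmetric average distance: the larger of the two directed means.

    Brute force over all point pairs, so only suitable for small,
    non-empty point sets.
    """
    return max(directed_average_distance(a, b),
               directed_average_distance(b, a))


# Two 3-voxel segmentations that share two voxels.
seg_voxels = {(0, 0), (0, 1), (1, 0)}
ref_voxels = {(0, 0), (0, 1), (1, 1)}
dice_score = dice(seg_voxels, ref_voxels)
avd_score = average_distance(seg_voxels, ref_voxels)
```

DICE is a similarity score (higher is better), whereas AVD is a distance expressed in voxel units here (lower is better), which is one reason the two are reported side by side rather than combined.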

First, the Dice coefficient [40], also called the overlap index, is the most used metric in validating medical volume segmentations. In addition to the direct comparison between segmentations, it is common to use the Dice to measure reproducibility (repeatability)

22 https://github.com/codalab/EvaluateSegmentation, as of 01 July 2017


| Metric | Symbol | References |
|---|---|---|
| Spatial overlap based | | |
| Dice (=F1-Measure) | DICE | [276] [277] [142] [77] [178] [139] [2] |
| Jaccard index | JAC | [178] [77] [250] [2] |
| True positive rate (Sensitivity, Recall) | TPR | [2] [178] [139] |
| True negative rate (Specificity) | TNR | [2] [178] |
| False positive rate (=1-Specificity, Fallout) | FPR | → Specificity |
| False negative rate (=1-Sensitivity) | FNR | → Sensitivity |
| False detection rate | FDR | [139] |
| Positive predictive value (Precision) | PPV | |
| Negative predictive value | NPV | |
| Accuracy | ACC | [193] |
| F-Measure (F1-Measure=Dice) | FMS | → Dice |
| Volumetric Similarity | VS | [77] [250] |
| Global Consistency Error | GCE | [250] [270] |
| Pair counting based | | |
| Rand Index | RI | [250] [270] |
| Adjusted Rand Index | ARI | [105] |
| Information theoretic based | | |
| Mutual Information | MI | [277] [142] |
| Variation of Information | VOI | [250] [270] |
| Probabilistic based | | |
| Interclass correlation | ICC | [227] [67] |
| Probabilistic Distance | PBD | [67] [77] |
| Cohen's kappa | KAP | [178] [276] |
| Area under ROC curve | AUC | [277] |
| Spatial distance based | | |
| Hausdorff distance | HD | [178] [67] [77] [139] |
| Average distance | AVD | [178] [139] [50] |
| Mahalanobis Distance | MHD | [193] |

Table 2.4: Overview of the metrics implemented for the VISCERAL Anatomy benchmarks. The symbols in the second column are used to denote the metrics throughout the paper. The column ‘References’ lists papers in which the corresponding metric has been used in the evaluation of medical volume segmentation.
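The aliases noted in the table (F-Measure equals Dice, FPR equals 1 − Specificity, FNR equals 1 − Sensitivity) follow directly from the four confusion counts of a binary segmentation. The sketch below, again an illustration and not the EvaluateSegmentation code, derives several of the overlap metrics and the adjusted Rand index from those counts. It assumes flat 0/1 label sequences in which both classes occur in both segmentations, so no denominator is zero; all function names are hypothetical.

```python
from math import comb


def confusion_counts(seg, ref):
    """Count TP, FP, FN, TN for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for s, r in zip(seg, ref) if s == 1 and r == 1)
    fp = sum(1 for s, r in zip(seg, ref) if s == 1 and r == 0)
    fn = sum(1 for s, r in zip(seg, ref) if s == 0 and r == 1)
    tn = sum(1 for s, r in zip(seg, ref) if s == 0 and r == 0)
    return tp, fp, fn, tn


def overlap_metrics(tp, fp, fn, tn):
    """Overlap metrics written in terms of the confusion counts."""
    tpr = tp / (tp + fn)   # sensitivity, recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    return {
        "DICE": 2 * tp / (2 * tp + fp + fn),
        "JAC": tp / (tp + fp + fn),
        "TPR": tpr,
        "TNR": tnr,
        "FPR": 1 - tnr,
        "FNR": 1 - tpr,
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "FMS": 2 * ppv * tpr / (ppv + tpr),  # F1, identical to DICE
    }


def adjusted_rand_index(tp, fp, fn, tn):
    """Adjusted Rand index of the two 2-class partitions (pair counting)."""
    n = tp + fp + fn + tn
    cells = sum(comb(x, 2) for x in (tp, fp, fn, tn))
    rows = comb(tp + fp, 2) + comb(fn + tn, 2)   # pairs within seg classes
    cols = comb(tp + fn, 2) + comb(fp + tn, 2)   # pairs within ref classes
    expected = rows * cols / comb(n, 2)
    maximum = 0.5 * (rows + cols)
    return (cells - expected) / (maximum - expected)


seg_labels = [1, 1, 1, 0, 0, 0, 0, 0]
ref_labels = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(seg_labels, ref_labels)
metrics = overlap_metrics(*counts)
ari = adjusted_rand_index(*counts)
```

Because DICE and JAC are monotonic functions of each other (JAC = DICE / (2 − DICE)), they rank segmentations identically, which illustrates why a single representative per category can stand in for several correlated metrics.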
