TOWARDS AN ACCURATE CANCER DIAGNOSIS MODELIZATION: COMPARISON OF RANDOM FOREST STRATEGIES

19th BeSHG meeting, 15th March 2019, Liege, Belgium. Abstract for the poster session

Debit A1,2,*, Poulet C1, Josse C1,4, Azencott CA5, Jerusalem G4, Van Steen K2, Bours V1,3

1 University of Liege, GIGA-Research, Laboratory of Human Genetics, Liege, Belgium

2 University of Liege, GIGA-Research, Medical Genomics, BIO3, Liege, Belgium

3 University Hospital (CHU), Center of Human Genetics, Liege, Belgium

4 University Hospital (CHU), Department of Medical Oncology, Liege, Belgium

5 Centre for Computational Biology (CBIO) of Mines ParisTech, Institut Curie and INSERM, Paris, France

* [email protected]

Abstract

Machine learning approaches are heavily used to produce models that will one day support clinical decisions. To be reliably used in medical decision-making, such diagnosis and prognosis tools must achieve a high level of precision. Random Forests have already been used in cancer diagnosis, prognosis, and screening, and numerous Random Forest methods have been derived from the original algorithm published by Breiman in 2001. Nevertheless, the precision of the models they generate remains unknown when facing biological data. This precision may be too variable to produce models with consistent classification accuracy, making them useless in daily clinical practice. Here, we perform an empirical comparison of Random Forest based strategies, examining the precision of their model accuracy and their overall computational time. We assess 15 methods for the classification of paired normal-tumor patients from 3 TCGA RNA-Seq datasets: BRCA (Breast Invasive Carcinoma), LUSC (Lung Squamous Cell Carcinoma), and THCA (Thyroid Carcinoma). The results show noteworthy differences in the precision of model accuracy and in overall processing time, both between strategies for a given dataset and between datasets for a given strategy. We therefore strongly recommend testing each Random Forest strategy prior to modelization. Doing so will improve the precision of model accuracy while revealing the method of choice for the candidate data.
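The kind of comparison described in the abstract can be sketched as a small benchmarking loop. The sketch below is illustrative only and is not the authors' pipeline: it assumes that "precision of model accuracy" can be read as the spread (standard deviation) of test accuracy over repeated seeded runs, it uses synthetic two-class data instead of TCGA RNA-Seq, and the two "strategies" are toy bagged decision-stump ensembles that stand in for real Random Forest implementations. All function names (`make_data`, `toy_forest`, `benchmark`) and all parameter values are hypothetical.

```python
import random
import statistics
import time


def make_data(n, n_features, rng):
    """Synthetic two-class data (assumption): feature 0 is informative, the rest are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_features - 1)]
        X.append(row)
        y.append(label)
    return X, y


def fit_stump(X, y, features):
    """Among the candidate features, keep the mean-threshold stump with the best training accuracy."""
    best = None
    for f in features:
        thr = statistics.mean(row[f] for row in X)
        acc = sum((row[f] > thr) == bool(lab) for row, lab in zip(X, y)) / len(y)
        above_is_positive = acc >= 0.5  # flip the stump if it is worse than chance
        acc = max(acc, 1.0 - acc)
        if best is None or acc > best[0]:
            best = (acc, f, thr, above_is_positive)
    return best[1:]


def predict_stump(stump, row):
    f, thr, above_is_positive = stump
    return (row[f] > thr) == above_is_positive


def toy_forest(seed, n_trees, k, n_samples=200, n_features=10):
    """Toy stand-in for one Random Forest strategy: bagged stumps, k random features each.

    Returns the test accuracy of one seeded train/test run.
    """
    rng = random.Random(seed)
    X_train, y_train = make_data(n_samples, n_features, rng)
    X_test, y_test = make_data(n_samples, n_features, rng)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap sample
        features = rng.sample(range(n_features), k)  # random feature subset
        stumps.append(fit_stump([X_train[i] for i in idx],
                                [y_train[i] for i in idx], features))
    correct = 0
    for row, lab in zip(X_test, y_test):
        votes = sum(predict_stump(s, row) for s in stumps)
        correct += (2 * votes > n_trees) == bool(lab)  # majority vote
    return correct / len(y_test)


def benchmark(strategies, n_repeats):
    """Run each strategy n_repeats times; report mean and spread of accuracy plus wall time."""
    results = {}
    for name, run in strategies.items():
        start = time.perf_counter()
        accs = [run(seed) for seed in range(n_repeats)]
        results[name] = {
            "mean_accuracy": statistics.mean(accs),
            "sd_accuracy": statistics.stdev(accs),
            "seconds": time.perf_counter() - start,
        }
    return results


# Two hypothetical strategies differing only in the number of features tried per tree.
STRATEGIES = {
    "k=1": lambda seed: toy_forest(seed, n_trees=15, k=1),
    "k=3": lambda seed: toy_forest(seed, n_trees=15, k=3),
}

if __name__ == "__main__":
    for name, stats in benchmark(STRATEGIES, n_repeats=5).items():
        print(name, stats)
```

In practice, each entry of `STRATEGIES` would wrap one actual Random Forest method trained on the dataset of interest, and the comparison would be repeated per dataset, since the abstract reports differences both between strategies and between datasets.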
