• Aucun résultat trouvé

Towards an accurate cancer diagnosis modelization:Comparison of Random Forest strategies

N/A
N/A
Protected

Academic year: 2021

Partager "Towards an accurate cancer diagnosis modelization:Comparison of Random Forest strategies"

Copied!
15
0
0

Texte intégral

(1)

Towards an accurate cancer diagnosis modelization:

Comparison of Random Forest strategies

Ahmed DEBIT

10/05/2018

(2)

From Omics to clinics

Omics Clinics

~20k genes (RNA-seq) Dozen of genes (biomarkers)

Signatures of biomarkers

Robust and accurate Models (signatures)

(3)

From Omics to clinics

Omics Clinics

Dozen of genes (biomarkers)

Signatures of biomarkers

Robust and accurate Models (signatures) ~20k genes (RNA-seq)

(4)

From Omics to clinics

Omics Clinics

Dozen of genes (biomarkers)

Signatures of biomarkers

Robust and precise ~20k genes (RNA-seq)

(5)

From Omics to clinics

Clinics

Dozen of genes (biomarkers)

Signatures of biomarkers

Omics

Boosting

Unsupervised Clustering

Deep Learning

Neural Networks

Genetic Algorithm

KNN SVM LDA

Random Forest

Robust and precise Models and signatures ~20k genes (RNA-seq)

(6)

From Omics to clinics

Clinics

Dozen of genes (biomarkers)

Signatures of biomarkers

Omics

Boosting

Unsupervised Clustering

Deep Learning

Neural Networks

Genetic Algorithm KNN SVM LDA

Random Forest

randomForest randomForestSRC Ranger cForest Rborist extraTrees randomUniformForest RRF WSRF iterativeForest canonicalForest PPforest obliqueRF rotationForest randomerForest

Robust and precise Models and signatures ~20k genes (RNA-seq)

(7)

From Omics to clinics

Clinics

Dozen of genes (biomarkers)

Signatures of biomarkers

Omics

Boosting

Unsupervised Clustering

Deep Learning

Neural Networks

Genetic Algorithm KNN SVM LDA

Random Forest

randomForest randomForestSRC Ranger cForest Rborist extraTrees randomUniformForest RRF WSRF iterativeForest canonicalForest PPforest obliqueRF rotationForest randomerForest

Robust and precise Models and signatures

(8)

Comparison of RF methods

• For each signature, we’ll focus on the Coefficient of Variation of 1,250 AUCs obtained using:

• 50 resampling to build the Training and the Validation set. • 25 modeling and validations.

• Comparison of RF methods under same conditions: • Same training and the validation sets.

• Same signatures.

(9)

Comparison of RF methods

• For each signature, we’ll focus on:

• 50 resampling to build the Training and the Validation set. • 25 modeling and validations.

• Analysis of:

• Coefficient of Variation of 1,250 models & AUCs • TCGA-BRCA (RPKM): 182 samples x 9560 genes • TCGA-LUSC (RPKM): 96 samples x 9262 genes • Comparison of RF methods under same conditions:

• Same training and the validation sets. • Same signatures.

• Same computational nodes

(10)

Comparison of RF methods

• For each signature, we’ll focus on:

• 50 resampling to build the Training and the Validation set. • 25 modeling and validations.

• Analysis of:

• Coefficient of Variation of 1,250 models & AUCs • TCGA-BRCA (RPKM): 182 samples x 9560 genes • TCGA-LUSC (RPKM): 96 samples x 9262 genes • Comparison of RF methods under same conditions:

• Same training and the validation sets. • Same signatures.

• Same computational nodes

1,250 models & AUCs

(11)

LUSC dataset

Hyper Stability discriminates RF methods

R es am p le 0 1 R es am p le 0 3 R es am p le 0 5 R es am p le 0 7 R es am p le 0 9 R es am p le 1 1 R es am p le 1 3 R es am p le 1 5 R es am p le 1 7 R es am p le 1 9 R es am p le 2 1 R es am p le 2 3 R es am p le 2 5 R es am p le 2 7 R es am p le 2 9 R es am p le 3 1 R es am p le 3 3 R es am p le 3 5 R es am p le 3 7 R es am p le 3 9 R es am p le 4 1 R es am p le 4 3 R es am p le 4 5 R es am p le 4 7 R es am p le 4 9 Signature1 Signature2 Signature3 Signature4 Signature5 Signature6 Signature7 Signature8 Signature9 Signature10 Signature11 Signature12 Signature13 Signature14 Signature15 Signature16 Signature17 Signature18 Signature19 Signature20 Signature21 Coefficient of variation of pd_tidy_tumor_AUC over signature and resampling resampling si g n a tu re Best Methods R es am p le 0 1 R es am p le 0 3 R es am p le 0 5 R es am p le 0 7 R es am p le 0 9 R es am p le 1 1 R es am p le 1 3 R es am p le 1 5 R es am p le 1 7 R es am p le 1 9 R es am p le 2 1 R es am p le 2 3 R es am p le 2 5 R es am p le 2 7 R es am p le 2 9 R es am p le 3 1 R es am p le 3 3 R es am p le 3 5 R es am p le 3 7 R es am p le 3 9 R es am p le 4 1 R es am p le 4 3 R es am p le 4 5 R es am p le 4 7 R es am p le 4 9 Signature1 Signature2 Signature3 Signature4 Signature5 Signature6 Signature7 Signature8 Signature9 Signature10 Signature11 Signature12 Signature13 Signature14 Signature15 Signature16 Signature17 Signature18 Signature19 Signature20 Signature21 Coefficient of variation of pd_tidy_tumor_AUC over signature and resampling resampling si g n a tu re Worst Methods

(12)

ccf cFor

estextraTreiForees stobliqueR F PPfo restrandom Forest random Fore stSRC random UniformF orest rangerRbo ristrerf RRF wsrf 0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for BRCA method H yp er  S ta b ili ty  S co re  /  A ve ra g e  A U C A ve ra g e  m o d el in g  T im e  (m s)

(13)

ccf cForestextraTre es iForestobliqueR

F PPfo restrandom Forest random ForestSR C random UniformF orest

rangerRboristrerf RRF wsrf

0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for BRCA method H yp er  S ta b ili ty  S co re  /  A ve ra g e  A U C A ve ra g e  m o d el in g  T im e  (m s)

ccf cForestextraTre es iForestobliqueR

F PPfo restrandom Forest random ForestSR C random UniformF orest

rangerRboristrerf RRF wsrf

0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for LUSC method H yp er  S ta b ili ty  S co re  /  A ve ra g e  A U C A ve ra g e  m o d el in g  T im e  (m s)

(14)

Conclusions

Classification of classification methods Towards robust signatures and predictions

• The AUC precision is dataset dependent

• The Methods are dataset dependent.

• Trade-off:

• AUC precision (hyper-stability) • Average AUC value

(15)

Références

Documents relatifs

The algorithm has been incremented in an Smart Investment model that simulates the human brain with reinforcement learning for fast decisions, deep learning to memo- rize

Finally, on online random forests we perform incremental learning feature-by-feature (i.e.: first we train the random forest on the first feature, then we add the second feature

The hyperparameter K denotes the number of features randomly selected at each node during the tree induction process. It must be set with an integer in the interval [1 ..M], where

As shown in the above algorithm, the main difference between Forest-RI and Forest-RK lies in that K is randomly chosen for each node of the tree leading therefore to more diverse

Inspired by Random Neural Networks, we introduce Random Neural Layer (RNL), which is compatible with other kinds of neural layers in Deep Neural Networks (DNN). The

combining the view-specific dissimilarities with a dynamic combination, for which the views are solicited differently from one instance to predict to another; this dynamic

“Echo State Queueing Networks: a combination of Reservoir Computing and Random Neural Networks”, S. Rubino, in Probability in the Engineering and Informational

We apply the random neural networks (RNN) [15], [16] developed for deep learning recently [17]–[19] to detecting network attacks using network metrics extract- ed from the