Towards an accurate cancer diagnosis modelization:Comparison of Random Forest strategies

(1)

Towards an accurate cancer diagnosis modelization:

Comparison of Random Forest strategies

Ahmed DEBIT

10/05/2018

(2)

From Omics to clinics

Omics _Clinics

~20k genes (RNA-seq) _{Dozen of genes (biomarkers)}

Signatures of biomarkers

Robust and accurate Models (signatures)

(3)

From Omics to clinics

Omics _Clinics

Dozen of genes (biomarkers)

Robust and accurate Models (signatures) ~20k genes (RNA-seq)

(4)

From Omics to clinics

Omics _Clinics

Robust and precise ~20k genes (RNA-seq)

(5)

From Omics to clinics

Clinics

Omics

Boosting

Unsupervised Clustering

Deep Learning

Neural Networks

Genetic Algorithm

KNN SVM LDA

Random Forest

Robust and precise Models and signatures ~20k genes (RNA-seq)

(6)

From Omics to clinics

Clinics

Omics

Boosting

Deep Learning

Neural Networks

Genetic Algorithm KNN SVM LDA

Random Forest

randomForest randomForestSRC Ranger cForest Rborist extraTrees randomUniformForest RRF WSRF iterativeForest canonicalForest PPforest obliqueRF rotationForest randomerForest

Robust and precise Models and signatures ~20k genes (RNA-seq)

(7)

From Omics to clinics

Clinics

Omics

Boosting

Deep Learning

Neural Networks

Genetic Algorithm KNN SVM LDA

Random Forest

randomForest randomForestSRC Ranger cForest Rborist extraTrees randomUniformForest RRF WSRF iterativeForest canonicalForest PPforest obliqueRF rotationForest randomerForest

Robust and precise Models and signatures

(8)

Comparison of RF methods

• For each signature, we’ll focus on the Coefficient of Variation of 1,250 AUCs obtained using:

• 50 resampling to build the Training and the Validation set. • 25 modeling and validations.

• Comparison of RF methods under same conditions: • Same training and the validation sets.

• Same signatures.

(9)

Comparison of RF methods

• For each signature, we’ll focus on:

• Analysis of:

• Coefficient of Variation of 1,250 models & AUCs • TCGA-BRCA (RPKM): 182 samples x 9560 genes • TCGA-LUSC (RPKM): 96 samples x 9262 genes • Comparison of RF methods under same conditions:

• Same training and the validation sets. • Same signatures.

• Same computational nodes

(10)

Comparison of RF methods

• For each signature, we’ll focus on:

• Analysis of:

• Coefficient of Variation of 1,250 models & AUCs • TCGA-BRCA (RPKM): 182 samples x 9560 genes • TCGA-LUSC (RPKM): 96 samples x 9262 genes • Comparison of RF methods under same conditions:

• Same training and the validation sets. • Same signatures.

• Same computational nodes

1,250 models & AUCs

(11)

LUSC dataset

Hyper Stability discriminates RF methods

R es am p le 0 1 R es am p le 0 3 R es am p le 0 5 R es am p le 0 7 R es am p le 0 9 R es am p le 1 1 R es am p le 1 3 R es am p le 1 5 R es am p le 1 7 R es am p le 1 9 R es am p le 2 1 R es am p le 2 3 R es am p le 2 5 R es am p le 2 7 R es am p le 2 9 R es am p le 3 1 R es am p le 3 3 R es am p le 3 5 R es am p le 3 7 R es am p le 3 9 R es am p le 4 1 R es am p le 4 3 R es am p le 4 5 R es am p le 4 7 R es am p le 4 9 Signature1 Signature2 Signature3 Signature4 Signature5 Signature6 Signature7 Signature8 Signature9 Signature10 Signature11 Signature12 Signature13 Signature14 Signature15 Signature16 Signature17 Signature18 Signature19 Signature20 Signature21 Coefficient of variation of pd_tidy_tumor_AUC over signature and resampling resampling si g n a tu re Best Methods R es am p le 0 1 R es am p le 0 3 R es am p le 0 5 R es am p le 0 7 R es am p le 0 9 R es am p le 1 1 R es am p le 1 3 R es am p le 1 5 R es am p le 1 7 R es am p le 1 9 R es am p le 2 1 R es am p le 2 3 R es am p le 2 5 R es am p le 2 7 R es am p le 2 9 R es am p le 3 1 R es am p le 3 3 R es am p le 3 5 R es am p le 3 7 R es am p le 3 9 R es am p le 4 1 R es am p le 4 3 R es am p le 4 5 R es am p le 4 7 R es am p le 4 9 Signature1 Signature2 Signature3 Signature4 Signature5 Signature6 Signature7 Signature8 Signature9 Signature10 Signature11 Signature12 Signature13 Signature14 Signature15 Signature16 Signature17 Signature18 Signature19 Signature20 Signature21 Coefficient of variation of pd_tidy_tumor_AUC over signature and resampling resampling si g n a tu re Worst Methods

(12)

ccf _cFor

estextraTreiFore_es stobliq_ueR F PPfo restrandom Fore_st rand_om Fore stSR_C rand_om Unifo_rmF orest rang_erRbo ristrerf RRF wsrf 0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for BRCA method H yp er S ta b ili ty S co re / A ve ra g e A U C A ve ra g e m o d el in g T im e (m s)

(13)

ccf cFor_estextr_aTre es iFore_stobliq_ueR

F PPfo restrandom Fore_st rand_om Fore_stSR C rand_om Unifo_rmF orest

rang_erRbo_ristrerf RRF wsrf

0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for BRCA method H yp er S ta b ili ty S co re / A ve ra g e A U C A ve ra g e m o d el in g T im e (m s)

ccf cFor_estextr_aTre es iFore_stobliq_ueR

F PPfo restrandom Fore_st rand_om Fore_stSR C rand_om Unifo_rmF orest

rang_erRbo_ristrerf RRF wsrf

0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 HRS HSS AUC Time Change between HRS and HSS for LUSC method H yp er S ta b ili ty S co re / A ve ra g e A U C A ve ra g e m o d el in g T im e (m s)

(14)

Conclusions

Classification of classification methods Towards robust signatures and predictions

• The AUC precision is dataset dependent

• The Methods are dataset dependent.

• Trade-off:

• AUC precision (hyper-stability) • Average AUC value

(15)