• Aucun résultat trouvé

Algorithm optimization for signature discovery: case of Breast Cancers

N/A
N/A
Protected

Academic year: 2021

Partager "Algorithm optimization for signature discovery: case of Breast Cancers"

Copied!
64
0
0

Texte intégral

(1)
(2)

(3)

(4)

Breast Cancer

Luminal A (~40%)

Luminal B (~20%)

Triple-negative

basal-like (15-20%)

HER2-enriched

(10-15%)

Normal-like

Source: ER+ and PR+ HER2-Low-levels Ki-67 Grow slowly Best prognosis ER+, PR+ HER2+, HER2-High-levels Ki-67 Grow slightly faster Worse prognosis ER-,

PR-

HER2-BRCA1 gene mutations

Younger and African-American women ER-,

PR-HER2+

Grow faster than luminal cancers Worse prognosis ER+, PR+ HER2-Low-levels Ki-67 Good prognosis 4

(5)

(6)

6

(7)

Random Forest

SVM

Deep Learning

Unsupervised clustering

Genetic Algorithm

(8)
(9)
(10)

Observation

Decision Threshold

≥ 0.5

< 0.5

1 tree = Multiple decisions

(11)

Training set / OOB set

miRNA1 miRNA2 … miRNAn

sample1 sample2 samplem Group Healthy Tumor . . . Healthy

and

miRNA2 miRNA5 miRNA3

(12)

Training set / OOB set Training set / OOB set Training set / OOB set

miRNA1 miRNA2 … miRNAn

sample1 sample2 samplem Group Healthy Tumor . . . Healthy

and

1 Forest = Multiple trees

(13)
(14)

Unknown Sample

Multiple Trees prediction procedure

Healthy

Tumor

Healthy

(15)

Unknown Sample

Healthy

Tumor

Healthy

(16)

Towards the best model

Multiple models are built for the same classification

or prediction task.

Prediction obtained on a model can assess it’s

classification power: the “AUC”.

High AUC value = Better prediction.

(17)

Profiling cohort (n=86)

41 Primary Breast Cancer PBC

45 Controls

Validation cohort (n=196)

108 PBC

88 Controls

# miRNAs: 188

25 important miRNAs

#combinations = 2^25 -1

~ 33M

Best Signature:

miR-16, let-7d, miR-103,

107, 148a, let-7i,

miR-19b, and miR-22*

Best signature performances

AUC = 0.81

Sensitivity = 91%

Specificity = 49%

The screening tool can be used as a complementary tool to the mammography test

to get a best diagnosis test of Breast Cancer

(18)

[P. Freres et al. 2015]

~ 20 hours

~ 2-3 months

(19)

Runtime improvement

Best AUC value

Best specificity

(20)

Runtime improvement

Best AUC value

Best specificity

Best sensitivity

(21)

(22)
(23)
(24)

(25)

j

(26)

26

j

(27)

List of 25 most important variables

coming from the FS process

(28)

Variable (set

of variables)

Most

important

PCs

Variability

of the data

value

AUC

Loadings (contribScore)

capture

Correlation

?

(29)

(30)

#3

00

#5

00

0

30

(31)

#3

00

#5

00

(32)

21 0.84 R = 0.78 21 0.84 R = 0.73

#300

#5000

32

(33)

(34)
(35)
(36)

(37)

(38)

prpe

Importance

38

all

v

(39)

(40)

List of 25 most important variables

coming from the FS process

(41)

Variable (set

of variables)

of the data

Variability

AUC

value

PRPE (prpeScore)

capture

Correlation

(42)

(43)
(44)

44

5400

(45)

300 500 1K 2K

3K 4K 5K 6K

(46)
(47)
(48)

4477

(74%)

(26%)

923

PRPE_Score >= 0.0475

(49)

Linear model Pearson correlation 300K Linear model Pearson correlation 300K

(50)

Without applying the PRPE-based filter

Using the PRPE-based filter

Using of the same 300k combinations and the same random partitions

PRPE-based filtering method finds the same most performant combinations with the same

AUC values

(51)

Method

Total

#combinations

#processed

combinations

#eliminated

combinations

Processing timeϯ

(min)

Exhaustive*

300k

300k

0

4578

PRPE-based

300k

42074

257926

675

* The method used in [P. Freres et al. 2015]

Ϯ Using 1000 jobs launched as parallel tasks on Lemaitre2 (CECI’s cluster), 50 jobs are allowed to be run in parallel. Each single job deals with 300 combinations.

(52)

(53)

(54)

54

(55)

(56)

Thanks for your attention

(57)

((machine learning AND breast cancer) AND

(diagnosis OR prognosis OR prediction)) ((machine learning AND breast cancer AND circulating miRNA) AND

N

um

be

r o

f p

ap

er

s

(*

)

(58)

Training set / OOB set Training set / OOB set Training set / OOB set

Random selection (replacement)

k <= n variables (at each nodes)

miRNA1 miRNA2 … miRNAn

sample1 sample2 samplem Group Healthy Tumor . . . Healthy

and

58

(59)

Training set / OOB set Training set / OOB set Training set / OOB set

Random selection (replacement)

k <= n variables (at each nodes)

miRNA1 miRNA2 … miRNAn

sample1 sample2 samplem Group Healthy Tumor . . . Healthy

and

(60)

Training set / OOB set Training set / OOB set Training set / OOB set

Random selection (replacement)

k <= n variables (at each nodes)

miRNA1 miRNA2 … miRNAn

sample1 sample2 samplem Group Healthy Tumor . . . Healthy

and

60

(61)
(62)

Tumor

Healthy

Vector of Classes / vector of probabilities

(63)

. . .

Model 1

Model 2

Model p

Mean AUC values

Mean specificity values

Mean sensitivity values

(64)

Explain the GINI index and the Information

Gain, and the MDA and MDG

Some method of feature selection

Références

Documents relatifs

Addis Ababa, 21 November, 2013 - The Steering Committee of the Joint Secretariat of the tripartite partnership of the African Union Commission, UN Economic Commission for Africa

Addis Ababa, 03 December 2009 (ECA) - Senior officials from the capitals of development partners and policy makers of the African Union Commission (AUC) and the Economic Commission

In the third section of this Chapter, we explain the methods to infer signatures from microarrays data by, first, describing the methods used to infer gene expression

Abstract—Over the last decade many studies in the gynecology literature have been investigating the performance of diagnosis models such as Univariate, Risk of Malignancy Index

In this paper we propose a novel approach to evaluate the true performance of BCI systems based on Receiver Operating Characteristic (ROC) analysis, that removes the lim- itations

In order to characterize this optimal anytime behavior, we propose a theoretical model that estimates the trade-off between runtime, given by the number of iterations performed by

In this study, we try to predict project outcomes using the results of risk assessment at the establishment of requirements stage for the projects of a specific

Les motifs de consommer peuvent également poser problème: une personne qui boit pour ressentir un effet déterminé (par exemple: se détendre, oublier ses problèmes), encourt un