HAL Id: hal-01310571

https://hal.archives-ouvertes.fr/hal-01310571

Submitted on 29 May 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Variable selection for correlated data in high dimension using decorrelation methods

Emeline Perthame, David Causeur, Ching-Fan Sheu, Chloé Friguet

To cite this version:

Emeline Perthame, David Causeur, Ching-Fan Sheu, Chloé Friguet. Variable selection for correlated data in high dimension using decorrelation methods. Statlearn: Challenging problems in statistical learning, Apr 2016, Vannes, France. ⟨hal-01310571⟩

(2)

Variable selection for correlated data in high dimension using decorrelation methods

Emeline Perthame (INRIA, team MISTIS, Grenoble)

Joint work with David Causeur (Agrocampus, Rennes), Ching-Fan Sheu (NCKU, Tainan, Taiwan), and Chloé Friguet (UBS, Vannes)

StatLearn, Vannes, April 2016


Outline

1. Introduction

2. Impact of dependence and dependence modeling

3. Disentangling signal from noise ...

• ... for a multiple testing issue

• ... for a supervised classification issue

4. Conclusion


The instrument: a 128-channel geodesic sensor net

• Electroencephalography (EEG) is the recording of electrical activity at scalp locations over time.

• The recorded EEG traces, which are time locked to external events, are averaged to form the event-related (brain) potentials (ERPs).

[Slide shows a journal page from W. A. Cunningham et al., NeuroImage 28 (2005) 827–834, including Fig. 1: electrode locations comprising the right and left anterior scalp regions where the frontal late positive potential (LPP) was recorded.]


Auditory oddball experiment

A very commonly used experimental task

• Two auditory stimuli are presented to subjects

− A stimulus (500 Hz) occurring frequently

− A stimulus (1000 Hz) occurring infrequently

• ERPs are recorded over a 400 ms interval after stimulus onset.

Motivations

• Auditory evoked potential (AEP): elicited by auditory stimulus

• Mismatch negativity (MMN): elicited by any change in the stimulus (odd/frequent)

• AEP and MMN are electrophysiological marker candidates for psychiatric disorders such as schizophrenia


ERP curves

[Figure: raw ERP curves for 13 subjects at channel FZ (auditory ERP data, Kaohsiung Medical University); ERP amplitude (µV) against time (ms), one curve per condition (500 Hz vs. 1000 Hz).]

→ Signal detection: is there any difference between the two conditions?

→ Signal identification: when does the difference occur?


Linear model framework for ERP curves

At time t for subject i in condition j

• Multivariate analysis of variance model:
$$Y_{ijt} = \mu_t + \alpha_{it} + \gamma_{jt} + \varepsilon_{ijt}$$

• Functional analysis of variance model:
$$Y_{ijt} = \sum_{s=1}^{S} m_s\,\varphi_s(t) + \sum_{s=1}^{S} a_{is}\,\varphi_s(t) + \sum_{s=1}^{S} g_{js}\,\varphi_s(t) + \varepsilon_{ijt}$$

where $\varphi_s(\cdot)$, $s = 1, \dots, S$, are B-splines.
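As an illustration, the first model can be fitted with one ANOVA per time point; a minimal R sketch, assuming hypothetical objects `erp` (an n × T matrix of amplitudes) and `subject`, `condition` (factors of length n):

    ## One linear model per time point t; the p-value of the
    ## condition effect gamma_jt is read off the ANOVA table.
    ## (erp, subject and condition are assumed to exist.)
    pvals <- apply(erp, 2, function(y)
      anova(lm(y ~ subject + condition))["condition", "Pr(>F)"])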


Linear model framework for ERP curves

At time t for subject i in condition j

$$Y_{ijt} = \mu_t + \alpha_{it} + \gamma_{jt} + \varepsilon_{ijt}$$

Signal detection

• Is there any difference between the two conditions?
$H_0$: for $t = 1, \dots, T$ and $j = 1, 2$, $\gamma_{jt} = 0$

• Is it relevant to predict the label from ERP curves?
→ High dimension: need for variable selection

Signal identification

For $t = 1, \dots, T$, $H_{0t}$: for $j = 1, 2$, $\gamma_{jt} = 0$


Some approaches

Detection

• F-test for multivariate (or functional) ANOVA 1

• Optimal detection (Higher Criticism 2)

Supervised classification

• Ignoring correlations: Naive approaches 3

• Introducing sparsity: Lasso, Sparse LDA 4

Identification

• FDR control: Benjamini-Hochberg, ...

→ Efficient under independence

1. Bugli and Lambert, 2006, Stat Med
2. Donoho and Jin, 2004, AOS
3. Bickel and Levina, 2004, Bernoulli; Tibshirani et al., 2003, Stat Sc
4. Tibshirani, 1996, JRSS; Clemmensen et al., 2011, Technometrics


Guthrie-Buchwald procedure 5

• Assumes an auto-regressive process with auto-correlation ρ

• Distribution under the null of
$$L_\rho = \#\{t : p_t \le \alpha\}$$
where $(p_1, \dots, p_T)$ are p-values and $\alpha$ is a preset level

• A time interval is rejected if it is significant at the preset level and longer than usual time intervals

5. Guthrie and Buchwald, 1991, Psychophysiology
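A minimal simulation sketch (not the authors' code) of the null distribution of $L_\rho$ when the test statistics follow an AR(1) process:

    ## Null distribution of L_rho = #{t : p_t <= alpha} under AR(1)
    ## dependence with autocorrelation rho.
    set.seed(1)
    Tt <- 200; rho <- 0.7; alpha <- 0.05
    L_rho <- replicate(5000, {
      z <- as.numeric(arima.sim(list(ar = rho), n = Tt)) * sqrt(1 - rho^2)
      sum(2 * pnorm(-abs(z)) <= alpha)   # two-sided p-values under H0
    })
    quantile(L_rho, 0.95)   # critical count at the 5% level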


Strong and complex temporal dependence structure

→ Dependence affects the stability of selection procedures


Outline

1. Introduction

2. Impact of dependence and dependence modeling

3. Disentangling signal from noise ...

• ... for a multiple testing issue

• ... for a supervised classification issue

4. Conclusion


Rare and Weak paradigm 6

• Two-component mixture for test statistics: $T = \mu + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, I_T)$

• Where the signal is

− Rare: $\eta = T^{-\beta}$, $\beta \in (\tfrac{1}{2}, 1)$

− Weak: $A = \sqrt{2 r \log T}$, $r \in (0, 1)$

6. Donoho and Jin, 2004, AOS; 2008, PNAS
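A short R sketch of this generating model (parameter values are arbitrary):

    ## Rare-and-Weak mixture for T test statistics.
    set.seed(1)
    Tt   <- 1e4; beta <- 0.6; r <- 0.3
    eta  <- Tt^(-beta)              # rarity: proportion of non-nulls
    A    <- sqrt(2 * r * log(Tt))   # weakness: common signal amplitude
    mu   <- ifelse(runif(Tt) < eta, A, 0)
    stat <- mu + rnorm(Tt)          # T = mu + eps, eps ~ N(0, I_T)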


Phase diagram under independence 7

• Signal is detectable when $r > \rho_D(\beta)$, where
$$\rho_D(\beta) = \begin{cases} \beta - \tfrac{1}{2} & \text{if } \tfrac{1}{2} < \beta \le \tfrac{3}{4} \\ (1 - \sqrt{1 - \beta})^2 & \text{if } \tfrac{3}{4} < \beta < 1. \end{cases}$$

[Figure: phase diagram in the $(\beta, r)$ plane, split by the boundary $\rho_D$ into a detectable region (above) and an undetectable region (below); in the detectable region, Type I + Type II error rates of the LRT tend to 0 as $T \to +\infty$.]

7. Ingster, 1999, Math Meth of Stat; Donoho and Jin, 2004, AOS
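The boundary is straightforward to evaluate; a small R helper:

    ## Detection boundary rho_D(beta) under independence;
    ## a pair (beta, r) is detectable when r > rho_D(beta).
    rho_D <- function(beta)
      ifelse(beta <= 3/4, beta - 1/2, (1 - sqrt(1 - beta))^2)
    rho_D(c(0.6, 0.9))   # 0.10 and (1 - sqrt(0.1))^2, about 0.47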


Impact of dependence - Signal identification

[Figure: true signal amplitude along time for the simulation design ($r \approx 0.4$, $\beta = 0.51$).]

• Independence and ERP time dependence pattern

• 1000 datasets for each amplitude

• Benjamini Hochberg correction


Impact of dependence - Signal identification

[Figure: empirical phase diagram in the $(\beta, r)$ plane, with detectable and undetectable regions.]

• Independence and ERP time dependence pattern

• 1000 datasets for each amplitude

• Benjamini Hochberg correction


Impact of dependence - Signal identification

[Figure: two panels against peak amplitude, comparing Benjamini-Hochberg under independence and under dependence; left: False Discovery Rate, right: Probability of No Rejection (PNR).]

• Instability of multiple testing procedures: $\text{FDR} = \text{pFDR} \times (1 - \text{PNR})$


Impact of dependence - Variable selection

[Figure: estimated regression coefficients $\beta$ against variables' index, from the logistic model]
$$\log \frac{P(Y = 2 \mid X)}{P(Y = 1 \mid X)} = \beta_0 + x'\beta$$

• Independence and ERP time dependence pattern

• 1000 datasets for each dependence structure

• Variable selection performed by Lasso 8

8. glmnet R package, Friedman et al., 2010, JSS


Impact of dependence - Variable selection

[Figure: estimated regression coefficients $\beta$ against variables' index; the four top-ranked variables are highlighted.]

• Predictor $X_t$ is assessed by its rank $r_t$, deduced from its regression coefficient

• Relevance of a selected set $S$ is given by the mean rank in $S$:
$$\bar{r}_S = \frac{1}{\#S} \sum_{t \in S} r_t$$


Impact of dependence - Variable selection

[Figure: histograms of mean ranks among selected features, under independence (left) and under dependence (right).]

• Relevance: the most predictive variables are not selected under dependence

• Stability: selected subsets are not reproducible


Impact of dependence - Improving stability

• Bootstrap

− Bolasso 9

− Stability selection 10

• Dependence modeling

− Surrogate variable analysis 11

− Latent effect adjustment after primary projection 12

− Factor analysis for multiple testing 13

9. Bach, 2008, Proceedings ICML

10. Meinshausen and Bühlmann, 2010, JRSS
11. Leek and Storey, 2007, PLoS Genetics
12. Sun, Zhang and Owen, 2012, AOAS
13. Friguet, Kloareg and Causeur, 2009, JASA


Factor modeling of dependence

• Distribution of ERP curves:
$$X = (X_1, \dots, X_T) \mid Y = y \sim \mathcal{N}_T(\mu_y, \Sigma)$$

• Latent factor modeling:
$$X = \mu_y + BZ + e, \quad e \sim \mathcal{N}_T(0, \Psi) \text{ with } \Psi \text{ diagonal}, \quad Z \sim \mathcal{N}_q(0, I_q), \quad \mathrm{rank}(B) = q$$

• Decomposition of the covariance matrix: $\Sigma = \Psi + BB'$
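As an illustration, the decomposition can be estimated by classical maximum-likelihood factor analysis; a sketch assuming a centered n × T data matrix `X`, a chosen number of factors `q`, and n comfortably larger than T (the EM algorithm of Friguet, Kloareg and Causeur, 2009, is designed for the high-dimensional case):

    ## Low-rank + diagonal decomposition Sigma = Psi + B B'.
    ## factanal() works on the correlation scale, hence the rescaling.
    fa  <- factanal(X, factors = q, rotation = "none")
    sdx <- apply(X, 2, sd)
    B   <- diag(sdx) %*% fa$loadings       # covariance-scale loadings
    Psi <- diag(fa$uniquenesses * sdx^2)   # diagonal specific variances
    Sigma_hat <- Psi + B %*% t(B)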


Signal is hidden by noise

[Figure: test statistics along time (ms).]


Signal is hidden by noise

[Figure: the test statistics along time (ms), decomposed into a signal component and a noise component.]


Outline

1. Introduction

2. Impact of dependence and dependence modeling

3. Disentangling signal from noise ...

• ... for a multiple testing issue

• ... for a supervised classification issue

4. Conclusion


Multiple testing issue

• ERP measure at time $t$, for subject $i$ in condition $j$:
$$Y_{ijt} = \mu_t + \alpha_{it} + \gamma_{jt} + \varepsilon_{ijt}$$

• In matrix notation:
$$Y_t = \mu_t + X_0 \alpha_t + X \gamma_t + \varepsilon_t, \quad \text{with } \mathbb{V}(\varepsilon_1, \dots, \varepsilon_T) = \Sigma$$

• Multiple testing: for $t = 1, \dots, T$, $H_{0,t}: \gamma_t = 0$

• Dependence among tests


A prior knowledge of the signal

• OLS estimation of the signal $\gamma = (\gamma_1, \dots, \gamma_T)$:
$$\hat{\gamma} = \gamma + \delta \quad \text{with } \delta \sim \mathcal{N}(0, \tilde{\Sigma}), \ \tilde{\Sigma} \propto \Sigma$$

[Figure: estimated signal (condition effect) along time (ms).]


A prior knowledge of the signal

• OLS estimation of the signal $\gamma = (\gamma_1, \dots, \gamma_T)$:
$$\hat{\gamma} = \gamma + \delta \quad \text{with } \delta \sim \mathcal{N}(0, \tilde{\Sigma}), \ \tilde{\Sigma} \propto \Sigma$$

[Figure: estimated signal along time (ms); the intervals forming the signal-free set $\mathcal{T}_0$ are highlighted.]


A prior knowledge of the signal

• OLS estimation of the signal $\gamma = (\gamma_1, \dots, \gamma_T)$:
$$\hat{\gamma} = \gamma + \delta \quad \text{with } \delta \sim \mathcal{N}(0, \tilde{\Sigma}), \ \tilde{\Sigma} \propto \Sigma$$

• On $\mathcal{T}_0$, the noise is observed without signal:
$$\begin{pmatrix} \delta_0 \\ \delta_{-0} \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tilde{\Sigma}_{0,0} & \tilde{\Sigma}_{0,-0} \\ \tilde{\Sigma}_{-0,0} & \tilde{\Sigma}_{-0,-0} \end{pmatrix} \right]$$

• And can be estimated elsewhere:
$$\hat{\delta}_{-0} = \hat{\Sigma}_{-0,0} \hat{\Sigma}_{0,0}^{-1} \hat{\delta}_0$$
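In R, the conditional expectation above is a single matrix solve; a minimal sketch, assuming `Sigma_hat` (a T × T covariance estimate), `T0` (the indices assumed signal-free) and `delta0` (the noise observed on $\mathcal{T}_0$):

    ## delta_hat_{-0} = Sigma_{-0,0} Sigma_{0,0}^{-1} delta_0
    interp_noise <- function(Sigma_hat, T0, delta0)
      drop(Sigma_hat[-T0, T0] %*% solve(Sigma_hat[T0, T0], delta0))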


A prior knowledge of the signal

• And can be estimated elsewhere: $\hat{\delta}_{-0} = \hat{\Sigma}_{-0,0} \hat{\Sigma}_{0,0}^{-1} \hat{\delta}_0$

[Figure: estimated signal along time (ms), with the estimated noise interpolated outside $\mathcal{T}_0$.]


A prior knowledge of the signal

• New estimation of the signal: $\hat{\gamma}^{\text{new}} = \hat{\gamma} - \hat{\delta}$

[Figure: corrected estimate of the signal along time (ms).]


Iterative algorithm

• New estimation of the signal: $\hat{\gamma}^{\text{new}} = \hat{\gamma} - \hat{\delta}$

• Update of the residual errors: $\hat{\varepsilon}^{\text{new}} = Y_t - (\hat{\mu}_t + \hat{\alpha}_{it} + \hat{\gamma}_t^{\text{new}})$

• New estimation of the covariance matrix

• Alternate the estimation of signal and covariance structure until convergence of the test statistics

• Update of $\mathcal{T}_0$
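A toy illustration of the alternation, under strong simplifying assumptions: the empirical covariance stands in for the factor-model estimate (only sensible here because n > T), $\mathcal{T}_0$ is held fixed, and the design reduces to one group mean:

    ## Toy data: signal on the second half of Tt time points only.
    set.seed(1)
    n <- 200; Tt <- 20; T0 <- 1:10
    Sig <- 0.8^abs(outer(1:Tt, 1:Tt, "-"))        # AR(1)-like covariance
    Y <- matrix(rnorm(n * Tt), n, Tt) %*% chol(Sig) +
         rep(1, n) %o% c(rep(0, 10), rep(1, 10))
    gamma_hat <- colMeans(Y)
    for (it in 1:5) {
      S <- cov(Y - rep(1, n) %o% gamma_hat)       # covariance update
      delta <- drop(S[-T0, T0] %*% solve(S[T0, T0], colMeans(Y)[T0]))
      gamma_hat[-T0] <- colMeans(Y)[-T0] - delta  # corrected signal
    }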


Choice of T 0

Prior knowledge

• ERP: psychologists may know that signal does not occur before/after some time points

• Genomics: biologists may know that some genes are not involved in a biological process

No prior knowledge

• Conservative approach

$\mathcal{T}_0 = \{t : p_t \ge t_0\}$ where $(p_1, \dots, p_T)$ are p-values


Simulations - Adaptive factor analysis procedure

[Figure: simulated signal amplitude along time.]

• Dependence structure of ERP experiment

• 1000 generated datasets


Simulations - Adaptive factor analysis procedure

Method                 FDR 14   TDR 15   PD 16
Benjamini-Hochberg     0.031    0.057    0.281
Benjamini-Yekutieli    0.009    0.011    0.101
Guthrie-Buchwald       0.086    0.233    0.538
SVA                    0.088    0.151    0.599
LEAPP                  0.151    0.304    0.847
AFA                    0.034    0.498    1.000

14. False Discovery Rate
15. True Discovery Rate
16. Probability of Detecting the peak


Application to auditory data

[Figure: estimated condition effect along time (ms); time points declared significant by BH, SVA, LEAPP and AFA are marked below the curve.]

80 - 120 ms: Auditory evoked potential

100 - 200 ms: Mismatch negativity for the difference curve


Conclusion

• Adaptive estimation of signal and factor model parameters

• Designed for strong dependence

• Efficient multiple testing procedure

− FDR is controlled

− Good detection power

• ERP package available on CRAN 17

17. Causeur and Sheu, 2014, R package version 1.0.1


Outline

1. Introduction

2. Impact of dependence and dependence modeling

3. Disentangling signal from noise ...

• ... for a multiple testing issue

• ... for a supervised classification issue

4. Conclusion


Supervised classification issue

• Prediction of a label → 500 Hz or 1000 Hz stimulus

• From ERP curve profiles $X = (X_1, \dots, X_T)$, with $(X \mid Y = y) \sim \mathcal{N}_T(\mu_y, \Sigma)$

• Among linear classification rules:
$$LR(x) = \log \frac{P(Y = 2 \mid X)}{P(Y = 1 \mid X)} = \beta_0 + x'\beta$$

• The best one is Bayes' rule:
$$\beta = \Sigma^{-1}(\mu_2 - \mu_1), \quad \beta_0 = \log \frac{p_2}{p_1} - 0.5\,(\mu_2 + \mu_1)'\,\Sigma^{-1}(\mu_2 - \mu_1)$$

• Theoretical misclassification rate $\pi$
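A direct R transcription of the rule, with the theoretical error $\pi = \Phi(-\Delta/2)$ for equal priors ($\Delta$ the Mahalanobis distance between the class means):

    ## Bayes' linear rule and its theoretical misclassification rate.
    bayes_lda <- function(mu1, mu2, Sigma, p1 = 0.5, p2 = 0.5) {
      beta  <- solve(Sigma, mu2 - mu1)            # Sigma^{-1} (mu2 - mu1)
      beta0 <- log(p2 / p1) - 0.5 * sum((mu2 + mu1) * beta)
      Delta <- sqrt(sum((mu2 - mu1) * beta))      # Mahalanobis distance
      list(beta0 = beta0, beta = beta,
           pi = pnorm(-Delta / 2))                # error rate when p1 = p2
    }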


Some estimation methods

Logistic regression

• Minimizing the deviance:
$$(\hat{\beta}_0, \hat{\beta}) = \operatorname*{argmin}_{\beta_0, \beta} \; 2 \sum_{i=1}^{n} \log\left[1 + \exp\left(-V_i(\beta_0 + x_i'\beta)\right)\right], \quad \text{where } V_i = \pm 1$$

• High dimension

− $\ell_2$-penalization: Ridge 18

− $\ell_1$-penalization: Lasso 19

18. Hoerl and Kennard, 1970, Technometrics
19. Tibshirani, 1996, JRSS
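Both penalized fits are available in the glmnet package cited above; a sketch assuming `X` (an n × T matrix of ERP profiles) and `y` (a two-level factor):

    ## l1 (Lasso) and l2 (Ridge) penalized logistic regression,
    ## with the penalty level chosen by cross-validation.
    library(glmnet)
    fit_lasso <- cv.glmnet(X, y, family = "binomial", alpha = 1)
    fit_ridge <- cv.glmnet(X, y, family = "binomial", alpha = 0)
    coef(fit_lasso, s = "lambda.min")  # nonzero rows = selected time points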


Some estimation methods

Linear Discriminant Analysis

• OLS estimate → method of moments:
$$(\hat{\beta}_0, \hat{\beta}) = \operatorname*{argmin}_{\beta_0, \beta} \sum_{i=1}^{n} \left[V_i - (\beta_0 + x_i'\beta)\right]^2, \quad \text{where } V_i = \pm 1$$

• High dimension

− Ignoring correlations: Diagonal Discriminant Analysis (DDA) 18 , Nearest Shrunken Centroids 19

− Shrinkage Discriminant Analysis 20 (SDA)

− Sparse linear discriminant analysis 21 (SLDA)

18. Bickel and Levina, 2004, Bernoulli
19. Tibshirani et al., 2003, Stat Sc
20. Ahdesmäki and Strimmer, 2010, AOAS
21. Clemmensen et al., 2011, Technometrics


Conditional classification rule

• Under the factor model assumption ($\Sigma = \Psi + BB'$),
$$\begin{pmatrix} X \\ Z \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} \mu_y \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma & B \\ B' & I_q \end{pmatrix} \right]$$

• Among classification rules linear in $(x, z)$, the best one is the conditional Bayes classifier:
$$LR(x, z) = \log \frac{P(Y = 2 \mid X, Z)}{P(Y = 1 \mid X, Z)} = \beta_0 + (x - Bz)'\beta$$
with
$$\beta = \Psi^{-1}(\mu_2 - \mu_1), \quad \beta_0 = \log \frac{p_2}{p_1} - 0.5\,(\mu_2 + \mu_1)'\,\Psi^{-1}(\mu_2 - \mu_1)$$

• Analytical expression of the misclassification rate $\pi_Z$


Conditional classification rule

• Bayes rule error $\pi$

• Under the factor model assumption,
$$\begin{pmatrix} X \\ Z \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} \mu_y \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma & B \\ B' & I_q \end{pmatrix} \right]$$

• Conditional Bayes rule error $\pi_Z$

• One can show that $\pi \ge \pi_Z$

→ Theoretical superiority of the conditional approach based on decorrelated data $\tilde{X} = X - BZ$


Iterative decorrelation of data

• Estimation of $\mu_1$ and $\mu_2$

• Computation of centered profiles

• Estimation of the factor model parameters $(\Psi, B)$ 22

• Decorrelation of the data using the generalized Thompson's formula: $\tilde{x} = x - B\hat{z}$

Generalized Thompson's formula
$$\hat{Z} = \mathbb{E}_X(Z) = (I_q + B'\Psi^{-1}B)^{-1}\, B'\Psi^{-1} \left[x - \left(\mu_1 P_X(1) + \mu_2 P_X(2)\right)\right]$$

22. Friguet, Kloareg and Causeur, 2009, JASA
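A sketch of the decorrelation step, assuming factor-model estimates `B` (T × q) and the diagonal of $\Psi$ as a vector `psi`, applied to a profile `x` already centered by its estimated class-mean mixture:

    ## z_hat = (I_q + B' Psi^{-1} B)^{-1} B' Psi^{-1} x   (Thompson),
    ## then the decorrelated profile x_tilde = x - B z_hat.
    decorrelate <- function(x, B, psi) {
      G <- solve(diag(ncol(B)) + t(B) %*% (B / psi))
      z <- drop(G %*% (t(B) %*% (x / psi)))
      x - drop(B %*% z)
    }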


Simulations

[Figure: true regression coefficients $\beta$ against variables' index in the simulation design.]

• $n_0 = n_1 = 13$

• Various dependence structures 23

• 1000 learning datasets

• 1 testing dataset

23. Meinshausen and Bühlmann, JRSS, 2010


Simulations - Prediction error rates

[Figure: boxplots of prediction error rates for LASSO, SLDA, SDA and DDA, each compared with its factor-adjusted (FA) version.]

→ Variable selection methods compared to their factor-adjusted version


Simulations - Selection accuracy

Method                  Nb of selected var.   Accuracy
LASSO 24                13.10                 62.36
Factor-adjusted LASSO   8.03                  93.02
SLDA 25                 10.00                 62.50
FA SLDA                 10.00                 90.90
SDA 26                  57.20                 75.07
FA SDA                  68.22                 67.93
DDA 27                  149.42                15.58
FA DDA                  97.65                 48.76

24. Tibshirani, 1996, JRSS; Friedman et al., 2010, JSS
25. Clemmensen et al., 2011, Technometrics
26. Ahdesmäki and Strimmer, 2010, AOAS
27. Bickel and Levina, 2004, Bernoulli


Conclusion

• Decorrelation method designed for prediction issues

• Preprocessing of the data which enables the use of usual selection methods

• FADA package available on CRAN 28

• Application in genomics

• Adjustment for batch effect 29

28. Perthame, Friguet and Causeur, 2014, R package version 1.2
29. Hornung, Boulesteix and Causeur, submitted


Outline

1. Introduction

2. Impact of dependence and dependence modeling

3. Disentangling signal from noise ...

• ... for a multiple testing issue

• ... for a supervised classification issue

4. Conclusion


Take home message

→ Whatever the statistical analysis, it pays to account for dependence: dependence can even be a blessing rather than a curse 30

→ Accounting for dependence introduces hyper-parameters

• Risk of overfitting

• Results depend on the estimation of the dependence model

− Need for robust models

− With few parameters

− To guarantee reproducible results

30. Hall and Jin, 2010, AOS


References

D. Causeur and C.-F. Sheu. ERP: Significance analysis of Event-Related Potentials data, 2014. R package version 1.0.1.

E. Perthame, C. Friguet, and D. Causeur. FADA: Variable selection for supervised classification in high dimension, 2014. R package version 1.2.

E. Perthame, C. Friguet, and D. Causeur. Stability of feature selection in classification issues for high-dimensional correlated data. Statistics and Computing, pages 1–14, 2015.

C. Sheu, E. Perthame, D. Causeur, and Y. Lee. Accounting for time dependence in large-scale multiple testing of event-related potential data. AOAS, 10(1):219–245, 2016.
