(1)

Cellwise robust regularized discriminant analysis

JSM 2017

Stéphanie Aerts

University of Liège, Belgium

Ines Wilms

KU Leuven, Belgium

(2)

Discriminant analysis

X = {x1, . . . , xN} of dimension p, split into K groups, each having nk observations

Goal : Classify a new data point x

πk : prior probability of group k

Np(µk, Σk) : conditional distribution within group k

Quadratic discriminant analysis (QDA) :

max_k −(x − µk)ᵀ Θk (x − µk) + log det(Θk) + 2 log πk

where Θk := Σk⁻¹

Linear discriminant analysis (LDA) :
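The quadratic rule is easy to state in code. A minimal NumPy sketch (the helper names are ours, not from the slides):

```python
import numpy as np

def qda_scores(x, means, precisions, priors):
    """Score of each group k: -(x - mu_k)' Theta_k (x - mu_k)
    + log det(Theta_k) + 2 log pi_k."""
    scores = []
    for mu, theta, pi in zip(means, precisions, priors):
        d = x - mu
        scores.append(-d @ theta @ d
                      + np.linalg.slogdet(theta)[1]
                      + 2 * np.log(pi))
    return np.array(scores)

def qda_classify(x, means, precisions, priors):
    """Assign x to the group with the largest discriminant score."""
    return int(np.argmax(qda_scores(x, means, precisions, priors)))
```

Under homoscedasticity (LDA), the same code applies with a single shared precision matrix for all groups.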

(3)

Discriminant analysis

Linear discriminant analysis (LDA) :

Homoscedasticity : Θk = Θ ∀k

(4)

Discriminant analysis

In practice, the parameters µk, Θk or Θ are often estimated by

the sample means x̄k

the inverses of the sample covariance matrices Σ̂k

(5)

Example - Phoneme dataset

Hastie et al., 2009

K = 2 phonemes: aa (as in barn) or ao (as in more)
p = 256: log intensity of the sound across 256 frequencies
N = 1717 records of a male voice

Correct classification performance

s-LDA  s-QDA
77.7   62.4

Σ̂k⁻¹ is inaccurate when p/nk is large, not computable when p > nk
Σ̂pool⁻¹ is inaccurate when p/N is large, not computable when p > N


(7)

Objectives

Propose a family of discriminant methods that, unlike the classical approach,

1 are computable and accurate in high dimensions

2 cover the path between LDA and QDA

3 are robust against cellwise outliers

(8)

1. Computable in high dimension

Graphical Lasso QDA (Xu et al., 2014)

Step 1 : Compute the sample means x̄k and covariance matrices Σ̂k

Step 2 : Graphical lasso (Friedman et al., 2008) to estimate Θ1, . . . , ΘK:

argmax_{Θk}  nk log det(Θk) − nk tr(Θk Σ̂k) − λ1 Σ_{i≠j} |θk,ij|

Step 3 : Plug x̄1, . . . , x̄K and Θ̂1, . . . , Θ̂K into the quadratic rule
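The three steps form a plug-in pipeline. As a dependency-free sketch, a ridge-regularized inverse stands in below for the graphical lasso of step 2 (the actual method solves the λ1-penalized problem above; `fit_regularized_qda` and the `ridge` parameter are our illustration, not the authors' code):

```python
import numpy as np

def fit_regularized_qda(X_groups, ridge=0.1):
    """Plug-in fit: means, regularized precisions, and priors per group.
    Step 2 stand-in: invert the sample covariance plus a ridge term
    instead of running the graphical lasso."""
    means, precisions, priors = [], [], []
    N = sum(len(X) for X in X_groups)
    for X in X_groups:
        nk, p = X.shape
        means.append(X.mean(axis=0))                             # step 1: sample mean
        S = np.cov(X, rowvar=False, bias=True)                   # step 1: sample covariance
        precisions.append(np.linalg.inv(S + ridge * np.eye(p)))  # step 2 stand-in
        priors.append(nk / N)                                    # step 3 plugs these in
    return means, precisions, priors
```

The ridge term keeps the estimate invertible even when p > nk, which is exactly the regime where Σ̂k⁻¹ itself breaks down.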

(9)

QDA does not exploit similarities between the groups; LDA ignores the group specificities

(10)

2. Covering path between LDA and QDA

Joint Graphical Lasso DA (Price et al., 2015)

Step 1 : Compute the sample means x̄k and covariance matrices Σ̂k

Step 2 : Joint Graphical Lasso (Danaher et al., 2014) to estimate Θ1, . . . , ΘK:

max_{Θ1,…,ΘK}  Σ_{k=1}^K [ nk log det(Θk) − nk tr(Θk Σ̂k) ] − λ1 Σ_{k=1}^K Σ_{i≠j} |θk,ij| − λ2 Σ_{k<k′} Σ_{i,j} |θk,ij − θk′,ij|
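The joint objective is straightforward to evaluate for candidate precision matrices. A NumPy sketch of the evaluation only (not the solver; Danaher et al. give an ADMM algorithm, and the function name is ours):

```python
import numpy as np

def jgl_objective(thetas, covs, ns, lam1, lam2):
    """Penalized log-likelihood from the slide: per-group fit term,
    lasso penalty on off-diagonal entries, and a fused penalty that
    shrinks group precision matrices toward each other."""
    K = len(thetas)
    fit = sum(n * (np.linalg.slogdet(T)[1] - np.trace(T @ S))
              for n, T, S in zip(ns, thetas, covs))
    offdiag = sum(np.abs(T).sum() - np.abs(np.diag(T)).sum() for T in thetas)
    fused = sum(np.abs(thetas[k] - thetas[l]).sum()
                for k in range(K) for l in range(k + 1, K))
    return fit - lam1 * offdiag - lam2 * fused
```

Setting λ2 large forces all Θk toward a common Θ (LDA); λ2 = 0 recovers K separate graphical lasso problems (QDA), which is how the family covers the path between the two.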

(11)

Lack of robustness

(12)

3. Robustness against cellwise outliers

Robust Joint Graphical Lasso DA

Step 1 : Compute robust mean mk and covariance matrix Sk estimates

Step 2 : Joint Graphical Lasso to estimate Θ1, . . . , ΘK:

max_{Θ1,…,ΘK}  Σ_{k=1}^K [ nk log det(Θk) − nk tr(Θk Sk) ] − λ1 Σ_{k=1}^K Σ_{i≠j} |θk,ij| − λ2 Σ_{k<k′} Σ_{i,j} |θk,ij − θk′,ij|

(13)

Rowwise and cellwise contamination

mk : vector of marginal medians

Sk : cellwise robust covariance matrices


(16)

Cellwise robust covariance estimators

Sk = (sij), a p × p matrix with entries

sij = scalê(Xi) · scalê(Xj) · corr̂(Xi, Xj)

scalê(·) : the Qn estimator (Rousseeuw and Croux, 1993)

corr̂(·, ·) : Kendall's correlation

corr̂K(Xi, Xj) = 2 / (n(n − 1)) Σ_{l<m} sign( (x_l^i − x_m^i)(x_l^j − x_m^j) )
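The pairwise construction can be sketched directly. Plain O(n²) implementations of Qn and Kendall's correlation are used for clarity (function names are ours; production code would use the O(n log n) algorithms):

```python
import numpy as np

def qn_scale(x, c=2.2219):
    """Rousseeuw-Croux Qn: c times the k-th order statistic of the
    pairwise absolute differences, k = C(h, 2) with h = n//2 + 1."""
    n = len(x)
    diffs = np.abs(x[:, None] - x[None, :])[np.triu_indices(n, k=1)]
    h = n // 2 + 1
    k = h * (h - 1) // 2
    return c * np.partition(diffs, k - 1)[k - 1]

def kendall_corr(x, y):
    """2 / (n(n-1)) * sum over l < m of sign((x_l - x_m)(y_l - y_m))."""
    n = len(x)
    s = np.sign((x[:, None] - x[None, :]) * (y[:, None] - y[None, :]))
    return s[np.triu_indices(n, k=1)].sum() * 2.0 / (n * (n - 1))

def cellwise_cov(X):
    """S with s_ij = Qn(X_i) * Qn(X_j) * corr_K(X_i, X_j), as on the slide."""
    p = X.shape[1]
    scales = np.array([qn_scale(X[:, j]) for j in range(p)])
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            corr = 1.0 if i == j else kendall_corr(X[:, i], X[:, j])
            S[i, j] = scales[i] * scales[j] * corr
    return S
```

Because each entry depends on the data only through pairwise signs and a robust scale, a few contaminated cells in one variable cannot ruin the whole matrix, which is the point of the cellwise approach.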

(17)

Simulation study

Setting

K = 10 groups

nk= 30

p = 30

M = 1000 training and test sets

Classification performance : average percentage of correct classification

Estimation accuracy : average Kullback-Leibler distance

KL(Θ̂1, . . . , Θ̂K; Θ1, . . . , ΘK) = Σ_{k=1}^K [ −log det(Θ̂k Θk⁻¹) + tr(Θ̂k Θk⁻¹) ] − Kp
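The KL criterion can be computed directly; a NumPy sketch (function name is ours; `slogdet` is used for numerical stability):

```python
import numpy as np

def kl_distance(theta_hats, thetas):
    """Aggregate KL criterion from the slide:
    sum_k [ -log det(Th_hat_k Th_k^-1) + tr(Th_hat_k Th_k^-1) ] - K*p,
    which is zero iff every estimate equals its target."""
    K = len(thetas)
    p = thetas[0].shape[0]
    total = 0.0
    for Th_hat, Th in zip(theta_hats, thetas):
        M = Th_hat @ np.linalg.inv(Th)
        total += -np.linalg.slogdet(M)[1] + np.trace(M)
    return total - K * p
```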

(18)

Uncontaminated scheme

Non-robust estimators (p = 30)

     s-LDA   s-QDA   GL-LDA   GL-QDA   JGL-DA
CC   77.7    NA      80.5     83.0     83.5
KL   30.29   NA      21.87    40.41    5.03

Robust estimators (p = 30)

     r-LDA   r-QDA   rGL-LDA  rGL-QDA  rJGL-DA
CC   76.1    59.7    77.4     79.7     80.1

(19)

Contaminated scheme: 5% of cellwise contamination

Correct classification percentages, p = 30

[Boxplots comparing s-LDA, r-LDA, s-QDA, r-QDA, GL-LDA, rGL-LDA, GL-QDA, rGL-QDA, JGL-DA, rJGL-DA; vertical axis: correct classification percentage, 0–90]

(20)

Example - Phoneme dataset

K = 2, p = 256, N = 1717

Correct classification performance

s-LDA   s-QDA   GL-LDA   GL-QDA   JGL-DA
77.7    62.4    81.4     74.9     78.4

r-LDA   r-QDA   rGL-LDA  rGL-QDA  rJGL-DA
81.1    74.7    81.7     76.0     76.7

(21)

Conclusion

The proposed discriminant methods :

1 are computable in high dimension

2 cover the path between LDA and QDA

3 are robust against cellwise outliers

4 detect rowwise and cellwise outliers

Code publicly available

http://feb.kuleuven.be/ines.wilms/software

(22)

References

S. Aerts and I. Wilms. Cellwise robust regularized discriminant analysis. FEB Research Report, KBI 1701.

C. Croux and V. Öllerer. Robust high-dimensional precision matrix estimation. In Modern Multivariate and Robust Methods. Springer, 2015.

P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B, 76:373–397, 2014.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2nd edition, 2009.

B. Price, C. Geyer, and A. Rothman. Ridge fusion in statistical learning. Journal of Computational and Graphical Statistics, 24(2):439–454, 2015.

P. Rousseeuw and C. Croux. Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424):1273–1283, 1993.

G. Tarr, S. Müller, and N.C. Weber. Robust estimation of precision matrices under cellwise contamination. Computational Statistics and Data Analysis, 93:404–420, 2016.

B. Xu, K. Huang, I. King, C. Liu, J. Sun, and N. Satoshi. Graphical lasso quadratic discriminant function and its application to character recognition. Neurocomputing, 129:33–40, 2014.
