Regularized discriminant analysis: a cellwise robust approach


(1)

Cellwise robust regularized discriminant analysis

Stéphanie Aerts (University of Liège) and Ines Wilms (Cornell University, KU Leuven)

(2)

Discriminant analysis

Data X = {x1, . . . , xN} of dimension p, split into K groups, each having nk observations.

Goal: classify a new observation x.

πk : prior probability of group k
Np(µk, Σk) : conditional distribution of group k

Quadratic discriminant analysis (QDA):

max_k  −(x − µk)^T Θk (x − µk) + log det(Θk) + 2 log πk,   where Θk := Σk^{−1}

Linear discriminant analysis (LDA): the same rule with a common precision matrix Θ := Σ^{−1} for all groups; the quadratic term in x is then identical across groups, so the decision boundaries become linear:

max_k  −(x − µk)^T Θ (x − µk) + 2 log πk
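For concreteness, here is a minimal NumPy sketch of these rules (illustrative only, not the authors' implementation); the group means, precision matrices and priors are assumed to be given:

```python
import numpy as np

def qda_score(x, mu_k, theta_k, pi_k):
    """Quadratic discriminant score:
    -(x - mu)' Theta (x - mu) + log det(Theta) + 2 log pi."""
    d = x - mu_k
    _, logdet = np.linalg.slogdet(theta_k)   # log det of the precision matrix
    return -d @ theta_k @ d + logdet + 2.0 * np.log(pi_k)

def classify(x, mus, thetas, pis):
    """Assign x to the group with the largest score; for LDA, pass the
    same pooled precision matrix as every element of `thetas`."""
    scores = [qda_score(x, m, t, p) for m, t, p in zip(mus, thetas, pis)]
    return int(np.argmax(scores))
```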


(4)

Discriminant analysis

In practice, the parameters µk, Θk or Θ are estimated by

the sample means x̄k

the inverse of the sample covariance matrices Σ̂k

the inverse of the sample pooled covariance matrix Σ̂pool = (1 / (N − K)) ∑_{k=1}^{K} (nk − 1) Σ̂k
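A short sketch of these classical plug-in estimates, assuming the data come as a list of per-group arrays:

```python
import numpy as np

def sample_estimates(X_groups):
    """Classical estimates from a list of (n_k x p) group samples:
    group means, group covariances, and the pooled covariance."""
    means = [Xk.mean(axis=0) for Xk in X_groups]
    covs = [np.cov(Xk, rowvar=False) for Xk in X_groups]  # denominator n_k - 1
    N = sum(Xk.shape[0] for Xk in X_groups)
    K = len(X_groups)
    pooled = sum((Xk.shape[0] - 1) * S
                 for Xk, S in zip(X_groups, covs)) / (N - K)
    return means, covs, pooled
```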

(5)

Example - Phoneme dataset

Hastie et al., 2009

N = 1717 records of a male voice
K = 2 phonemes: aa (as in barn) or ao (as in more)
p = 256: log intensity of the sound across 256 frequencies

Correct classification performance:

        s-LDA   s-QDA
        77.7    62.4

Σ̂k^{−1} is inaccurate when p/nk is large, and not computable when p > nk.
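A quick numerical check of the second point: with fewer observations than variables, the sample covariance matrix is rank-deficient and has no inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))   # n_k = 20 observations of p = 50 variables
S = np.cov(X, rowvar=False)         # 50 x 50, but rank at most n_k - 1 = 19
print(np.linalg.matrix_rank(S))     # 19 < p: S is singular, no inverse exists
```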


(7)

Objectives

Propose a family of discriminant methods that, unlike the classical approach:

1 are computable and accurate in high dimension

2 cover the path between LDA and QDA

(8)

1. Computable in high dimension

Graphical Lasso QDA (Xu et al., 2014)

Step 1: Compute the sample means x̄k and covariance matrices Σ̂k
Step 2: Graphical Lasso (Friedman et al., 2008) to estimate Θ1, . . . , ΘK :

max_{Θk}  nk log det(Θk) − nk tr(Θk Σ̂k) − λ1 ∑_{i≠j} |θk,ij|

Step 3: Plug x̄1, . . . , x̄K and Θ̂1, . . . , Θ̂K into the quadratic rule

Note: Use the pooled covariance matrix for Graphical Lasso LDA.

QDA does not exploit similarities across groups; LDA ignores group specificities.
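As an illustrative sketch of Steps 1-2 (not the authors' code), scikit-learn's GraphicalLasso can stand in for the penalized estimator above; since scikit-learn normalizes the likelihood by the sample size, its `alpha` plays the role of λ1/nk:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def gl_qda_fit(X_groups, alpha=0.1):
    """Per-group means and L1-penalized precision matrices (Steps 1-2);
    Step 3 plugs the results into the quadratic rule sketched earlier."""
    N = sum(len(Xk) for Xk in X_groups)
    means, precisions, priors = [], [], []
    for Xk in X_groups:
        gl = GraphicalLasso(alpha=alpha).fit(Xk)
        means.append(Xk.mean(axis=0))
        precisions.append(gl.precision_)
        priors.append(len(Xk) / N)
    return means, precisions, priors
```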


(10)

2. Covering the path between LDA and QDA

Joint Graphical Lasso DA (Price et al., 2015)

Step 1: Compute the sample means x̄k and covariance matrices Σ̂k

Step 2: Joint Graphical Lasso (Danaher et al., 2014) to estimate Θ1, . . . , ΘK :

max_{Θ1,...,ΘK}  ∑_{k=1}^{K} [ nk log det(Θk) − nk tr(Θk Σ̂k) ] − λ1 ∑_{k=1}^{K} ∑_{i≠j} |θk,ij| − λ2 ∑_{k<k′} ∑_{i,j} |θk,ij − θk′,ij|

Step 3: Plug x̄1, . . . , x̄K and Θ̂1, . . . , Θ̂K into the quadratic discriminant rule

The fusion penalty controls the LDA-QDA path: λ2 = 0 leaves the precision matrices group-specific (QDA-like), while a large λ2 shrinks them toward a common matrix (LDA-like).
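Danaher et al. solve this problem with an ADMM algorithm (R package JGL); the sketch below only evaluates the criterion, to make the roles of the two penalties explicit:

```python
import numpy as np

def jgl_objective(thetas, covs, ns, lam1, lam2):
    """Fused Joint Graphical Lasso criterion from the slide
    (to be maximized over Theta_1, ..., Theta_K)."""
    p = thetas[0].shape[0]
    off = ~np.eye(p, dtype=bool)                  # off-diagonal mask (i != j)
    loglik = sum(n * (np.linalg.slogdet(T)[1] - np.trace(T @ S))
                 for n, T, S in zip(ns, thetas, covs))
    lasso = sum(np.abs(T[off]).sum() for T in thetas)
    fused = sum(np.abs(thetas[k] - thetas[m]).sum()
                for k in range(len(thetas))
                for m in range(k + 1, len(thetas)))
    return loglik - lam1 * lasso - lam2 * fused
```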


(12)

3. Robustness against cellwise outliers

Robust Joint Graphical Lasso DA

Step 1: Compute robust estimates mk and Sk

Step 2: Joint Graphical Lasso to estimate Θ1, . . . , ΘK :

max_{Θ1,...,ΘK}  ∑_{k=1}^{K} [ nk log det(Θk) − nk tr(Θk Sk) ] − λ1 ∑_{k=1}^{K} ∑_{i≠j} |θk,ij| − λ2 ∑_{k<k′} ∑_{i,j} |θk,ij − θk′,ij|

(13)

Robust estimators

mk: vector of marginal medians


(15)

Cellwise robust covariance estimators

Sk = (sij)_{i,j=1,...,p}, the p × p matrix with entries

sij = scale(Xi) scale(Xj) corr(Xi, Xj)

scale(·): the Qn estimator (Rousseeuw and Croux, 1993)

corr(·, ·): Kendall's correlation,

corrK(Xi, Xj) = (2 / (n(n − 1))) ∑_{l<m} sign((xli − xmi)(xlj − xmj))
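A sketch of this pairwise construction (illustrative; it assumes statsmodels' qn_scale for the Qn estimator and SciPy's kendalltau for the correlation; the location estimate mk of the previous slide is simply np.median(Xk, axis=0)):

```python
import numpy as np
from scipy.stats import kendalltau
from statsmodels.robust.scale import qn_scale  # Qn (Rousseeuw and Croux, 1993)

def cellwise_robust_cov(X):
    """Pairwise robust covariance with s_ij = Qn(X_i) Qn(X_j) corr_K(X_i, X_j),
    following the slide's formula; the diagonal uses Qn(X_i)^2."""
    n, p = X.shape
    scales = np.array([qn_scale(X[:, j]) for j in range(p)])
    S = np.diag(scales ** 2)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(X[:, i], X[:, j])
            S[i, j] = S[j, i] = scales[i] * scales[j] * tau
    return S
```

A matrix assembled entry by entry in this way need not be positive semidefinite, which is one reason the penalized estimation of Step 2 is used rather than inverting Sk directly.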

(16)

Simulation study

Setting: K = 10 groups, nk = 30, p = 30, M = 1000 training and test sets

Classification performance: average percentage of correct classification

Estimation accuracy: average Kullback-Leibler distance,

KL(Θ̂1, . . . , Θ̂K; Θ1, . . . , ΘK) = ∑_{k=1}^{K} [ −log det(Θ̂k Θk^{−1}) + tr(Θ̂k Θk^{−1}) ] − Kp
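The same criterion as a small helper, assuming the estimated and true precision matrices are given as lists of arrays:

```python
import numpy as np

def kl_distance(theta_hats, thetas):
    """Sum over groups of -log det(Theta_hat Theta^{-1}) + tr(Theta_hat
    Theta^{-1}), minus K*p, as defined on the slide."""
    K, p = len(thetas), thetas[0].shape[0]
    total = 0.0
    for T_hat, T in zip(theta_hats, thetas):
        M = T_hat @ np.linalg.inv(T)
        total += -np.linalg.slogdet(M)[1] + np.trace(M)
    return total - K * p
```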

(17)

Uncontaminated scheme

Non-robust estimators, p = 30:

        s-LDA   s-QDA   GL-LDA  GL-QDA  JGL-DA
CC      77.7    NA      80.5    83.0    83.5
KL      30.29   NA      21.87   40.41   5.03

Robust estimators, p = 30:

        r-LDA   r-QDA   rGL-LDA rGL-QDA rJGL-DA
CC      76.1    59.7    77.4    79.7    80.1

(18)

Contaminated scheme : 5% of cellwise contamination

[Boxplots of correct classification percentages (0-90) for each method (s-LDA, r-LDA, r-QDA, and the remaining regularized variants), p = 30.]

(19)

Example 1 - Phoneme dataset

N = 1717, K = 2, p = 256

Correct classification performance:

s-LDA   s-QDA   GL-LDA  GL-QDA  JGL-DA
77.7    62.4    81.4    74.9    78.4

r-LDA   r-QDA   rGL-LDA rGL-QDA rJGL-DA
81.1    74.7    81.7    76.0    76.7

(20)

Conclusion

The proposed discriminant methods:

1 are computable in high dimension

2 cover the path between LDA and QDA

3 are robust against cellwise outliers

4 detect rowwise and cellwise outliers

Code publicly available

(21)

References

S. Aerts and I. Wilms. Cellwise robust regularized discriminant analysis. Statistical Analysis and Data Mining, 10:436-447, 2017.

C. Croux and V. Öllerer. Robust high-dimensional precision matrix estimation. In Modern Multivariate and Robust Methods. Springer, 2015.

P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B, 76:373-397, 2014.

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2nd edition, 2009.

B. Price, C. Geyer, and A. Rothman. Ridge fusion in statistical learning. Journal of Computational and Graphical Statistics, 24(2):439-454, 2015.

P. Rousseeuw and C. Croux. Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424):1273-1283, 1993.

G. Tarr, S. Müller, and N.C. Weber. Robust estimation of precision matrices under cellwise contamination. Computational Statistics and Data Analysis, 93:404-420, 2016.

B. Xu, K. Huang, I. King, C. Liu, J. Sun, and S. Naoi. Graphical lasso quadratic discriminant function and its application to character recognition. Neurocomputing, 2014.
