(1)

Principal Component Analysis (PCA)

Master MAS, Université de Bordeaux, 18 September 2019

(2)

Introduction

The aim is to explore numerical data.

Example: 8 mineral waters described on 13 sensory descriptors (five of them are shown below).

##             bitter sweet acid salted alcaline
## St Yorre       3.4   3.1  2.9    6.4      4.8
## Badoit         3.8   2.6  2.7    4.7      4.5
## Vichy          2.9   2.9  2.1    6.0      5.0
## Quézac         3.9   2.6  3.8    4.7      4.3
## Arvie          3.1   3.2  3.0    5.2      5.0
## Chateauneuf    3.7   2.8  3.0    5.2      4.6
## Salvetat       4.0   2.8  3.0    4.1      4.5
## Perrier        4.4   2.2  4.0    4.9      3.9

The rows describe observations or individuals (the 8 mineral waters) and the columns describe variables (the sensory descriptors).

The aim is to know:

- which observations are similar,
- which variables are linked.

(3)

[Figure: scatterplot matrix of the five descriptors (bitter, sweet, acid, salted, alcaline).]

(4)

One can look at:

- the distance matrix between observations:

##             St Yorre Badoit Vichy Quézac Arvie Chateauneuf Salvetat
## Badoit           4.1
## Vichy            7.9    4.8
## Quézac           2.9    5.3   9.7
## Arvie            3.0    1.8   5.5    4.7
## Chateauneuf      2.9    1.8   5.7    4.3   1.3
## Salvetat         4.0    1.2   5.4    4.9   1.8         1.6
## Perrier          8.2   10.6  14.7    6.2  10.1         9.9     10.3

- the correlation matrix between variables:

##          bitter sweet  acid salted alcaline
## bitter     1.00 -0.83  0.78  -0.67    -0.96
## sweet     -0.83  1.00 -0.61   0.49     0.93
## acid       0.78 -0.61  1.00  -0.44    -0.82
## salted    -0.67  0.49 -0.44   1.00     0.56
## alcaline  -0.96  0.93 -0.82   0.56     1.00

(5)

It is also possible to use multivariate descriptive statistics such as PCA in order to:

- visualize graphically the distances between observations or the correlations between variables,

[Figure, left: "Distances between observations", the 8 mineral waters plotted on Dim 1 (77.61%) and Dim 2 (12.48%). Right: "Correlations between variables", the 5 descriptors (bitter, sweet, acid, salted, alcaline) plotted on the same two dimensions.]

(6)

- build new numerical variables that "summarize" the original variables as well as possible, in order to reduce dimension.

Table: Original data

             bitter sweet acid salted alcaline
St Yorre        3.4   3.1  2.9    6.4      4.8
Badoit          3.8   2.6  2.7    4.7      4.5
Vichy           2.9   2.9  2.1    6.0      5.0
Quézac          3.9   2.6  3.8    4.7      4.3
Arvie           3.1   3.2  3.0    5.2      5.0
Chateauneuf     3.7   2.8  3.0    5.2      4.6
Salvetat        4.0   2.8  3.0    4.1      4.5
Perrier         4.4   2.2  4.0    4.9      3.9

Table: Two new synthetic variables

               PC1   PC2
St Yorre      1.85  1.19
Badoit       -0.49 -0.64
Vichy         2.77  0.24
Quézac       -1.72  0.11
Arvie         1.93 -0.48
Chateauneuf   0.09  0.00
Salvetat     -0.93 -1.39
Perrier      -3.49  0.97

(7)

Outline

Basic concepts

Analysis of the set of observations

Analysis of the set of variables

Interpretation of PCA results

PCA with metrics and GSVD

(8)

Basic concepts

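We consider a numerical data table where $n$ observations are described on $p$ variables.

$$
X = \begin{pmatrix}
x_{11} & \cdots & x_{1j} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nj} & \cdots & x_{np}
\end{pmatrix}
$$

Some notation:

- $X = (x_{ij})_{n \times p}$ is the numerical data matrix, where $x_{ij} \in \mathbb{R}$ is the value of the $i$th observation on the $j$th variable.
- $x_i = (x_{i1}, \ldots, x_{ip})^T \in \mathbb{R}^p$ is the description of the $i$th observation (row of $X$).
- $x^j = (x_{1j}, \ldots, x_{nj})^T \in \mathbb{R}^n$ is the description of the $j$th variable (column of $X$).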

(9)

Example : 6 patients described on 3 variables (diastolic pressure, systolic pressure and cholesterol).

load("../data/chol.rda")
print(X)

## diast syst chol

## Brigitte 90 140 6.0

## Marie 60 85 5.9

## Vincent 75 135 6.1

## Alex 70 145 5.8

## Manue 85 130 5.4

## Fred 70 145 5.0

Identify $n$, $p$, $X$, $x_3$ and $x^2$.

Two sets of points.

(10)

The first set is the set of observations.

Example: the 6 patients define a set of $n = 6$ points in $\mathbb{R}^3$.

[Figure: "Set of observations", 3D scatter plot of the 6 patients (Brigitte, Marie, Vincent, Alex, Manue, Fred) on the axes diast, syst and chol.]

(11)

- Each observation $i$ is a point $x_i$ in $\mathbb{R}^p$ (a row of $X$).
- A weight $w_i$ is associated with each observation $i$. Usually:
  - $w_i = \frac{1}{n}$ for randomly drawn observations,
  - $w_i \neq \frac{1}{n}$ for adjusted samples, aggregated data...

A preprocessing step is often applied to the data, which may be:

- centered, so that the columns (variables) have mean zero,
- scaled, so that the columns (variables) have variance 1.

(12)

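Original data matrix $X$ (last row: column means):

$$
\begin{array}{c|ccccc}
 & 1 & \cdots & j & \cdots & p \\ \hline
1 & & & \vdots & & \\
i & \cdots & & x_{ij} & & \cdots \\
n & & & \vdots & & \\ \hline
\bar{x} & \cdots & & \bar{x}_j & & \cdots
\end{array}
$$

Centered data matrix $Y$ (last row: column means):

$$
\begin{array}{c|ccccc}
 & 1 & \cdots & j & \cdots & p \\ \hline
1 & & & \vdots & & \\
i & \cdots & & y_{ij} & & \cdots \\
n & & & \vdots & & \\ \hline
\bar{y} & \cdots & & 0 & & \cdots
\end{array}
$$

Here:

- $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ is the mean of the $j$th variable (column $j$ of $X$),
- $y_{ij} = x_{ij} - \bar{x}_j$ is the general term of the centered data matrix $Y$.

The columns of the centered data matrix $Y$ have zero mean:

$$\bar{y}_j = \frac{1}{n} \sum_{i=1}^{n} y_{ij} = 0.$$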

(13)

Example : the set of 6 patients.

Original data matrix $X$

##          diast syst chol
## Brigitte    90  140  6.0
## Marie       60   85  5.9
## Vincent     75  135  6.1
## Alex        70  145  5.8
## Manue       85  130  5.4
## Fred        70  145  5.0

Means of the columns of $X$

## diast  syst  chol
##  75.0 130.0   5.7

Centered data matrix $Y$

##          diast syst chol
## Brigitte    15   10  0.3
## Marie      -15  -45  0.2
## Vincent      0    5  0.4
## Alex        -5   15  0.1
## Manue       10    0 -0.3
## Fred        -5   15 -0.7

Means of the columns of $Y$

## diast  syst  chol
##     0     0     0
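The centering step can be reproduced with a few lines of base R. This is only a sketch: the matrix `X` is retyped by hand from the printed table (the file `chol.rda` is not assumed to be available), and the names `x_bar` and `Y` are illustrative choices.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))

# Column means x_bar_j
x_bar <- colMeans(X)

# Centered matrix: y_ij = x_ij - x_bar_j (subtract each column mean from its column)
Y <- sweep(X, 2, x_bar, "-")

print(x_bar)
print(round(Y, 1))
print(round(colMeans(Y), 10))  # column means of Y, zero up to rounding
```

`sweep(X, 2, x_bar, "-")` applies the subtraction column by column, which mirrors the definition $y_{ij} = x_{ij} - \bar{x}_j$.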

(14)

- Centering the data amounts to a translation of the set of observations in $\mathbb{R}^p$.

[Figure: "Centered set of 6 patients", 3D scatter plot of the 6 patients on the centered axes diast, syst and chol.]

(15)

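Original data matrix $X$ (last rows: column means and standard deviations):

$$
\begin{array}{c|ccccc}
 & 1 & \cdots & j & \cdots & p \\ \hline
1 & & & \vdots & & \\
i & \cdots & & x_{ij} & & \cdots \\
n & & & \vdots & & \\ \hline
\bar{x} & \cdots & & \bar{x}_j & & \cdots \\
s & \cdots & & s_j & & \cdots
\end{array}
$$

Standardized data matrix $Z$ (last rows: column means and standard deviations):

$$
\begin{array}{c|ccccc}
 & 1 & \cdots & j & \cdots & p \\ \hline
1 & & & \vdots & & \\
i & \cdots & & z_{ij} & & \cdots \\
n & & & \vdots & & \\ \hline
\bar{z} & \cdots & & 0 & & \cdots \\
s & \cdots & & 1 & & \cdots
\end{array}
$$

Here:

- $s_j^2 = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ is the variance of the $j$th variable (column $j$ of $X$),
- $z_{ij} = \dfrac{x_{ij} - \bar{x}_j}{s_j}$ is the general term of the standardized data matrix $Z$.

The columns of the standardized data matrix $Z$ have mean 0 and variance 1:

$$\bar{z}_j = \frac{1}{n} \sum_{i=1}^{n} z_{ij} = 0, \qquad \mathrm{var}(z^j) = \frac{1}{n} \sum_{i=1}^{n} (z_{ij} - \bar{z}_j)^2 = 1.$$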

(16)

Example : the set of 6 patients.

Original data matrix $X$

##          diast syst chol
## Brigitte    90  140  6.0
## Marie       60   85  5.9
## Vincent     75  135  6.1
## Alex        70  145  5.8
## Manue       85  130  5.4
## Fred        70  145  5.0

Means and sd of the columns of $X$

##      diast  syst  chol
## mean    75 130.0 5.700
## sd      10  20.8 0.383

Standardized data matrix $Z$

##          diast  syst  chol
## Brigitte   1.5  0.48  0.78
## Marie     -1.5 -2.16  0.52
## Vincent    0.0  0.24  1.04
## Alex      -0.5  0.72  0.26
## Manue      1.0  0.00 -0.78
## Fred      -0.5  0.72 -1.83

Means and sd of the columns of $Z$

## diast  syst  chol
##     0     0     0
## diast  syst  chol
##     1     1     1
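A possible base R sketch of the standardization is given below. It assumes the data are retyped by hand into `X`, and it computes the standard deviation with the $\frac{1}{n}$ denominator used in these slides. Note that R's built-in `sd()` and `scale()` divide by $n - 1$, so they would give slightly different values (for example about 10.95 instead of 10 for diast).

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)

# Center, then compute the standard deviations with the 1/n convention
x_bar <- colMeans(X)
Y <- sweep(X, 2, x_bar, "-")
s <- sqrt(colSums(Y^2) / n)

# Standardized matrix: z_ij = (x_ij - x_bar_j) / s_j
Z <- sweep(Y, 2, s, "/")

print(round(s, 3))
print(round(Z, 2))
print(round(colMeans(Z), 10))    # means of the columns of Z
print(round(colSums(Z^2) / n, 10))  # variances (1/n) of the columns of Z
```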

(17)

- Standardization (centering and scaling) amounts to a translation and a normalization of the set of observations in $\mathbb{R}^p$.

[Figure: "Standardized set of the 6 patients", 3D scatter plot of the 6 patients on the standardized axes diast, syst and chol.]

(18)

In summary, three datasets of the same observations.

[Figure: three 3D scatter plots of the 6 patients drawn on the same axes (diast, syst, chol): original data, centered data, standardized data.]

- Centering does not change the distances between the observations: $d^2(x_i, x_{i'}) = d^2(y_i, y_{i'})$.
- Standardization changes the distances between the observations: $d^2(x_i, x_{i'}) \neq d^2(z_i, z_{i'})$.

(19)

Proximity between two observations can be measured with the Euclidean distance.

- The squared Euclidean distance between two observations $i$ and $i'$ (two rows of $X$) is:

$$d^2(x_i, x_{i'}) = \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2.$$

- When the data are standardized, the squared Euclidean distance between two observations $i$ and $i'$ (two rows of $Z$) is:

$$d^2(z_i, z_{i'}) = \sum_{j=1}^{p} \frac{1}{s_j^2} (x_{ij} - x_{i'j})^2.$$

This means that:

- If the variables (columns of $X$) are measured on different scales, variables with larger variance weigh more than variables with smaller variance in the Euclidean distance.
- Standardizing the data gives the same importance to all the variables in the Euclidean distance.

(20)

Example: distance between Brigitte and Marie

Original data ($X$):

##          diast syst chol
## Brigitte    90  140  6.0
## Marie       60   85  5.9
## Vincent     75  135  6.1
## Alex        70  145  5.8
## Manue       85  130  5.4
## Fred        70  145  5.0

Mean and sd of the columns:

##      diast  syst  chol
## mean    75 130.0 5.700
## sd      10  20.8 0.383

Standardized data ($Z$):

##          diast  syst  chol
## Brigitte   1.5  0.48  0.78
## Marie     -1.5 -2.16  0.52
## Vincent    0.0  0.24  1.04
## Alex      -0.5  0.72  0.26
## Manue      1.0  0.00 -0.78
## Fred      -0.5  0.72 -1.83

Euclidean distance between the first two rows of $X$:

$$d(x_1, x_2) = \sqrt{(90 - 60)^2 + (140 - 85)^2 + (6 - 5.9)^2} = \sqrt{30^2 + 55^2 + 0.1^2}$$

Euclidean distance between the first two rows of $Z$:

$$d(z_1, z_2) = \sqrt{\frac{1}{10^2}(90 - 60)^2 + \frac{1}{20.8^2}(140 - 85)^2 + \frac{1}{0.383^2}(6 - 5.9)^2}$$

$$= \sqrt{(1.5 + 1.5)^2 + (0.48 + 2.16)^2 + (0.78 - 0.52)^2} = \sqrt{3^2 + 2.64^2 + 0.26^2}$$
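The two distances of this example can be checked numerically. The following base R sketch assumes the data are retyped by hand and uses the $\frac{1}{n}$ convention for the standard deviations; `d_x`, `d_y` and `d_z` are illustrative names. It also illustrates that centering leaves the distance unchanged while standardization does not.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")          # centered data
Z <- sweep(Y, 2, sqrt(colSums(Y^2) / n), "/")  # standardized data (1/n variance)

# Euclidean distance between two rows of a matrix
euclid <- function(M, i, k) sqrt(sum((M[i, ] - M[k, ])^2))

d_x <- euclid(X, "Brigitte", "Marie")  # original data
d_y <- euclid(Y, "Brigitte", "Marie")  # centered data: same distance as d_x
d_z <- euclid(Z, "Brigitte", "Marie")  # standardized data: different distance

print(c(original = d_x, centered = d_y, standardized = d_z))
```

On the original scale the distance is dominated by the two blood pressure variables; after standardization the three variables contribute on comparable scales.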

(21)

The dispersion of the set of observations in $\mathbb{R}^p$ is measured by the inertia.

- The inertia of the $n$ observations (the $n$ rows of $X$) is defined by:

$$I(X) = \frac{1}{n} \sum_{i=1}^{n} d^2(x_i, \bar{x}),$$

where $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_p)^T$ is the vector of column means.

- Inertia is a generalization of the variance to the case of multivariate data ($p$ variables).
- One can show that:

$$I(X) = \sum_{j=1}^{p} \mathrm{var}(x^j).$$

This means that:

- when the variables are centered, $I(Y) = \sum_{j=1}^{p} s_j^2$,
- when the variables are standardized, $I(Z) = p$.

(22)

Example: inertia of the set of 6 patients

Centered data ($Y$):

##          diast syst chol
## Brigitte    15   10  0.3
## Marie      -15  -45  0.2
## Vincent      0    5  0.4
## Alex        -5   15  0.1
## Manue       10    0 -0.3
## Fred        -5   15 -0.7

Variance of the columns:

##  diast   syst   chol
## 100.00 433.33   0.15

Standardized data ($Z$):

##          diast  syst  chol
## Brigitte   1.5  0.48  0.78
## Marie     -1.5 -2.16  0.52
## Vincent    0.0  0.24  1.04
## Alex      -0.5  0.72  0.26
## Manue      1.0  0.00 -0.78
## Fred      -0.5  0.72 -1.83

Variance of the columns:

## diast  syst  chol
##     1     1     1

- Inertia of the centered dataset:

$$I(Y) = 100 + 433.33 + 0.15$$

- Inertia of the standardized dataset:

$$I(Z) = 1 + 1 + 1 = 3$$
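As a numerical check, the inertia can be computed directly from its definition (mean squared distance to the mean point) and compared with the sum of the column variances. This is a sketch in base R with hand-typed data; the helper `inertia()` is a hypothetical name introduced here, and uniform weights $\frac{1}{n}$ are assumed.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")
s2 <- colSums(Y^2) / n            # variances with the 1/n convention
Z <- sweep(Y, 2, sqrt(s2), "/")

# Inertia from the definition: average squared distance to the mean point
inertia <- function(M) {
  g <- colMeans(M)                       # mean point of the rows of M
  mean(rowSums(sweep(M, 2, g, "-")^2))   # (1/n) * sum_i d^2(m_i, g)
}

print(inertia(Y))   # equals the sum of the column variances
print(sum(s2))
print(inertia(Z))   # equals p = 3 for standardized data
```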

(23)

The second set of points associated with a numerical data matrix is the set of variables.

Example: the variables diastolic pressure, systolic pressure and cholesterol define a set of $p = 3$ points in $\mathbb{R}^6$.

##       Brigitte Marie Vincent  Alex Manue Fred
## diast       90  60.0    75.0  70.0  85.0   70
## syst       140  85.0   135.0 145.0 130.0  145
## chol         6   5.9     6.1   5.8   5.4    5

This set of points cannot be visualized!

(24)

- Each variable $j$ is a point $x^j$ in $\mathbb{R}^n$ (a column of $X$).
- A weight $m_j$ is associated with each variable $j$. Usually:
  - $m_j = 1$ in PCA,
  - $m_j \neq 1$ in MCA (Multiple Correspondence Analysis).

When the data are centered:

- each variable $j$ is a point denoted $y^j$ in $\mathbb{R}^n$ (a column of $Y$),
- we talk about the set of centered variables.

When the data are standardized:

- each variable $j$ is a point denoted $z^j$ in $\mathbb{R}^n$ (a column of $Z$),
- we talk about the set of standardized variables.

(25)

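The link between two variables is measured by the covariance or the correlation.

To define covariance and correlation, a metric is associated with $\mathbb{R}^n$:

$$N = \mathrm{diag}\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right).$$

- The scalar product between $x$ and $y$ in $\mathbb{R}^n$ is defined by:

$$\langle x, y \rangle_N = x^T N y = \frac{1}{n} x^T y = \frac{1}{n} \sum_{i=1}^{n} x_i y_i.$$

- The norm of $x$ in $\mathbb{R}^n$ is then:

$$\|x\|_N = \sqrt{\langle x, x \rangle_N} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}.$$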

(26)

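With this metric, the variance can be written as a squared norm:

- $\mathrm{var}(x^j) = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 = \|y^j\|_N^2$,
- $\mathrm{var}(z^j) = \frac{1}{n} \sum_{i=1}^{n} (z_{ij} - \bar{z}_j)^2 = \|z^j\|_N^2$.

The $p$ standardized variables therefore lie on the unit sphere of $\mathbb{R}^n$, with $\|z^j\|_N = 1$.

Moreover, the covariance and the correlation can be written as scalar products:

- $c_{jj'} = \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ij'} - \bar{x}_{j'}) = \langle y^j, y^{j'} \rangle_N$,
- $r_{jj'} = \frac{1}{n} \sum_{i=1}^{n} \left(\dfrac{x_{ij} - \bar{x}_j}{s_j}\right)\left(\dfrac{x_{ij'} - \bar{x}_{j'}}{s_{j'}}\right) = \langle z^j, z^{j'} \rangle_N$.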

(27)

This leads to a simple expression of the covariance matrix, denoted $C$, and of the correlation matrix, denoted $R$:

- $C = Y^T N Y$,
- $R = Z^T N Z$.

Example:

Covariance matrix:

##        diast  syst  chol
## diast 100.00 112.5  0.25
## syst  112.50 433.3 -2.17
## chol    0.25  -2.2  0.15

Correlation matrix:

##       diast  syst   chol
## diast 1.000  0.54  0.065
## syst  0.540  1.00 -0.272
## chol  0.065 -0.27  1.000
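The matrix expressions $C = Y^T N Y$ and $R = Z^T N Z$ can be evaluated as written. The sketch below uses base R with hand-typed data and uniform weights $N = \frac{1}{n} I_n$ (an assumption matching the default case of the slides). As a sanity check, `R` is compared with R's built-in `cor()`, which is legitimate because the correlation does not depend on the choice of $\frac{1}{n}$ or $\frac{1}{n-1}$.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")
Z <- sweep(Y, 2, sqrt(colSums(Y^2) / n), "/")

# Metric on R^n: diagonal matrix of the observation weights (here all 1/n)
N <- diag(1 / n, n)

C <- t(Y) %*% N %*% Y   # covariance matrix (1/n convention)
R <- t(Z) %*% N %*% Z   # correlation matrix

print(round(C, 2))
print(round(R, 3))
print(max(abs(R - cor(X))))  # difference with the built-in correlation
```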

(28)

With this metric, the correlation can be written as a cosine:

- $r_{jj'} = \dfrac{\langle y^j, y^{j'} \rangle_N}{\|y^j\|_N \, \|y^{j'}\|_N} = \cos\theta_N(y^j, y^{j'})$,
- $r_{jj'} = \langle z^j, z^{j'} \rangle_N = \cos\theta_N(z^j, z^{j'})$.

This leads to a geometrical interpretation of the correlation between variables:

- an angle of 90 degrees between two standardized variables corresponds to a zero correlation (cosine equal to 0), and hence to the absence of a linear link,
- an angle of 0 degrees corresponds to a correlation of 1 (cosine equal to 1), and hence to a positive linear link,
- an angle of 180 degrees corresponds to a correlation of -1 (cosine equal to -1), and hence to a negative linear link.

(29)

PCA analyses:

- either the centered data matrix $Y$,
- or the standardized data matrix $Z$.

This leads to two different methods of PCA:

- non-normalized PCA (or PCA on the covariance matrix), which analyses $Y$,
- normalized PCA (or PCA on the correlation matrix), which analyses $Z$.

From now on, normalized PCA is considered.

(30)


(31)

Analysis of the set of observations

Find the subspace which gives the best representation of the observations.

- Best approximation of the data by projection.
- Best representation of the variability of the observations.

(32)

Example: the set of the 6 patients described on the 3 standardized variables.

[Figure, left: standardized set of the 6 patients in $\mathbb{R}^3$ (axes diast, syst, chol). Right: "Projection of the 6 patients" on the plane Dim 1 (52.69%) and Dim 2 (35.07%).]

The aim is to find the projection plane which preserves as well as possible the distances between the patients, i.e. their variability and hence their inertia.

(33)

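Projection of an observation (a point in $\mathbb{R}^p$) on an axis.

The coordinate of the orthogonal projection of a point $z_i \in \mathbb{R}^p$ on an axis $\Delta_\alpha$ with orientation vector $v_\alpha$ ($v_\alpha^T v_\alpha = 1$) is:

$$f_{i\alpha} = \langle z_i, v_\alpha \rangle = z_i^T v_\alpha.$$

The vector of coordinates of the projections of the $n$ observations is:

$$f_\alpha = \begin{pmatrix} f_{1\alpha} \\ \vdots \\ f_{n\alpha} \end{pmatrix} = Z v_\alpha = \sum_{j=1}^{p} v_{j\alpha} z^j.$$

- $f_\alpha$ is a linear combination of the columns of $Z$.
- $f_\alpha$ is centered if the columns of $Z$ are centered.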

(34)

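Example: the 6 patients are the rows of the following standardized data matrix

$$
Z = \begin{pmatrix}
1.50 & 0.48 & 0.78 \\
-1.50 & -2.16 & 0.52 \\
0.00 & 0.24 & 1.04 \\
-0.50 & 0.72 & 0.26 \\
1.00 & 0.00 & -0.78 \\
-0.50 & 0.72 & -1.83
\end{pmatrix}
$$

Let us project the 6 "standardized" patients on two orthogonal axes $\Delta_1$ and $\Delta_2$ with orientation vectors:

$$
v_1 = \begin{pmatrix} 0.641 \\ 0.72 \\ -0.265 \end{pmatrix}, \qquad
v_2 = \begin{pmatrix} 0.4433 \\ -0.0652 \\ 0.894 \end{pmatrix}.
$$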

(35)

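The vectors $f_1$ and $f_2$ of the coordinates of the projections of the 6 patients on $\Delta_1$ and $\Delta_2$ are:

$$
f_1 = Z v_1 = 0.641 \begin{pmatrix} 1.5 \\ \vdots \\ -0.5 \end{pmatrix}
+ 0.72 \begin{pmatrix} 0.48 \\ \vdots \\ 0.72 \end{pmatrix}
- 0.265 \begin{pmatrix} 0.78 \\ \vdots \\ -1.83 \end{pmatrix}
= \begin{pmatrix} 1.09 \\ \vdots \\ 0.683 \end{pmatrix}
$$

$$
f_2 = Z v_2 = 0.4433 \begin{pmatrix} 1.5 \\ \vdots \\ -0.5 \end{pmatrix}
- 0.0652 \begin{pmatrix} 0.48 \\ \vdots \\ 0.72 \end{pmatrix}
+ 0.894 \begin{pmatrix} 0.78 \\ \vdots \\ -1.83 \end{pmatrix}
= \begin{pmatrix} 1.333 \\ \vdots \\ -1.9 \end{pmatrix}
$$

$f_1$ and $f_2$ are two new synthetic and centered variables.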
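The projection of the observations on two given axes is a plain matrix product. The sketch below, in base R with hand-typed data, uses the rounded orientation vectors printed in the example, so the results agree with the slide values only up to rounding; the names `f1` and `f2` are illustrative.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")
Z <- sweep(Y, 2, sqrt(colSums(Y^2) / n), "/")

# Orientation vectors as printed in the example (rounded values)
v1 <- c(0.641, 0.72, -0.265)
v2 <- c(0.4433, -0.0652, 0.894)

# Coordinates of the projections: f_alpha = Z v_alpha
f1 <- drop(Z %*% v1)
f2 <- drop(Z %*% v2)

print(round(cbind(f1, f2), 3))
print(c(sum(v1^2), sum(v2^2), sum(v1 * v2)))  # unit norms, near-zero scalar product
print(round(c(mean(f1), mean(f2)), 10))       # both projections are centered
```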

(36)

[Figure, left: standardized set of the 6 patients in $\mathbb{R}^3$ (axes diast, syst, chol). Right: the 6 patients plotted on the axes $f_1$ and $f_2$.]

In PCA, the orientation vectors $v_1$ and $v_2$ are defined to maximize the inertia of the set of projected observations, and thus to preserve the distances between the observations as well as possible.

(37)

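Axes of projection of the observations in PCA.

$\Delta_1$ is the axis with orientation vector $v_1 \in \mathbb{R}^p$ which maximizes the variance of the $n$ projected observations:

$$v_1 = \arg\max_{\|v\| = 1} \mathrm{var}(Zv) = \arg\max_{\|v\| = 1} v^T R v,$$

where $R = \frac{1}{n} Z^T Z$ is the $p \times p$ correlation matrix. The second equality holds because $Zv$ is centered, so that $\mathrm{var}(Zv) = \frac{1}{n} (Zv)^T (Zv) = v^T R v$.

One can show that:

- $v_1$ is the eigenvector associated with the largest eigenvalue $\lambda_1$ of $R$,
- the first principal component (PC) $f_1 = Z v_1$ is centered: $\bar{f}_1 = 0$,
- $\lambda_1$ is the variance of the first PC: $\mathrm{var}(f_1) = \lambda_1$.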

(38)

$\Delta_2$ is the axis with orientation vector $v_2 \perp v_1$ which maximizes the variance of the $n$ projected observations:

$$v_2 = \arg\max_{\|v\| = 1,\ v \perp v_1} \mathrm{var}(Zv).$$

One can show that:

- $v_2$ is the eigenvector associated with the second largest eigenvalue $\lambda_2$ of $R$,
- the second principal component (PC) $f_2 = Z v_2$ is centered: $\bar{f}_2 = 0$,
- $\lambda_2$ is the variance of the second PC: $\mathrm{var}(f_2) = \lambda_2$,
- the principal components $f_1$ and $f_2$ are not correlated.

In the same way, we can get $q \le r$ ($r$ is the rank of $Z$) orthogonal axes $\Delta_1, \ldots, \Delta_q$ on which the observations are projected.

(39)

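In summary:

1. The eigendecomposition of the correlation matrix $R$ is performed and $q \le r$ is chosen.
2. The $n \times q$ matrix $F = ZV$ of the $q$ principal components is obtained with the matrix $V$ of the first $q$ eigenvectors of $R$.

- The principal components $f_\alpha = Z v_\alpha$ (columns of $F$) are centered and of variance $\lambda_\alpha$.
- The elements $f_{i\alpha}$ are called the factor coordinates of the observations, or the scores of the observations on the principal components.

$$
F = \begin{array}{c|ccccc}
 & 1 & \cdots & \alpha & \cdots & q \\ \hline
1 & & & \vdots & & \\
i & \cdots & & f_{i\alpha} & & \cdots \\
n & & & \vdots & & \\ \hline
\text{mean} & \cdots & & 0 & & \cdots \\
\text{var} & \cdots & & \lambda_\alpha & & \cdots
\end{array}
$$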

(40)

Example of the 6 patients: matrix $F$ of the first $q = 2$ PCs

##             f1     f2
## Brigitte  1.10  1.334
## Marie    -2.66 -0.057
## Vincent  -0.10  0.918
## Alex      0.13 -0.035
## Manue     0.85 -0.257
## Fred      0.68 -1.903

[Figure: "Projection of the 6 patients by PCA" on the plane Dim 1 (52.69%) and Dim 2 (35.07%).]
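The two steps summarized above (eigendecomposition of $R$, then $F = ZV$) can be sketched in base R as follows. The data are retyped by hand and uniform weights $\frac{1}{n}$ are assumed; `eigen()` returns eigenvectors only up to sign, so the scores may come out with flipped signs compared with the printed matrix $F$. The name `scores` is used instead of `F`, because `F` is a shorthand for `FALSE` in R.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")
Z <- sweep(Y, 2, sqrt(colSums(Y^2) / n), "/")

# Step 1: eigendecomposition of the correlation matrix R = (1/n) Z'Z
R <- crossprod(Z) / n
e <- eigen(R, symmetric = TRUE)   # eigenvalues are returned in decreasing order
lambda <- e$values

# Step 2: scores on the first q principal components, F = Z V
q <- 2
V <- e$vectors[, 1:q]
scores <- Z %*% V
colnames(scores) <- paste0("f", 1:q)

print(round(100 * lambda / sum(lambda), 2))  # percentage of inertia per dimension
print(round(scores, 3))                      # signs may differ from the slide
print(round(colSums(scores^2) / n, 4))       # variances of the PCs (1/n convention)
```

The variances of the columns of `scores` should coincide with the first two eigenvalues, and the percentages of inertia with the axis labels of the projection plot.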

(41)


(42)

Analysis of the set of variables

Find the subspace which gives the best representation of the variables.

(43)

Example: the set of 3 standardized variables.

3 variables on the unit sphere of $\mathbb{R}^6$.

##       Brigitte Marie Vincent Alex Manue Fred
## diast      1.5  -1.5     0.0 -0.5   1.0 -0.5
## syst       0.5  -2.2     0.2  0.7   0.0  0.7
## chol       0.8   0.5     1.0  0.3  -0.8 -1.8

[Figure: "Projection of the 3 standardized variables" (diast, syst, chol) on the plane Dim 1 (52.69%) and Dim 2 (35.07%).]

The aim is to find the projection plane which best represents the variables, and thus preserves as well as possible the angles between the variables, i.e. their correlations.

(44)

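Projection of a variable (a point in $\mathbb{R}^n$) on an axis.

The coordinate of the $N$-orthogonal projection of a point $z^j \in \mathbb{R}^n$ on an axis $G_\alpha$ with orientation vector $u_\alpha$ ($u_\alpha^T N u_\alpha = 1$) is:

$$a_{j\alpha} = \langle z^j, u_\alpha \rangle_N = (z^j)^T N u_\alpha,$$

and the vector of coordinates of the projections of the $p$ variables is:

$$a_\alpha = \begin{pmatrix} a_{1\alpha} \\ \vdots \\ a_{p\alpha} \end{pmatrix} = Z^T N u_\alpha.$$

Warning: a metric $N$ on $\mathbb{R}^n$ is used.

- A metric on $\mathbb{R}^n$ is an $n \times n$ symmetric positive definite matrix.
- Here, in PCA, $N$ is the diagonal matrix of the weights of the observations: $N = \mathrm{diag}(w_1, \ldots, w_n)$.
- When all observations are weighted by $\frac{1}{n}$ (the usual default): $N = \frac{1}{n} I_n$.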

(45)

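Example: the three variables (diast, syst, chol) are the columns of the following standardized data matrix

$$
Z = \begin{pmatrix}
1.50 & 0.48 & 0.78 \\
-1.50 & -2.16 & 0.52 \\
0.00 & 0.24 & 1.04 \\
-0.50 & 0.72 & 0.26 \\
1.00 & 0.00 & -0.78 \\
-0.50 & 0.72 & -1.83
\end{pmatrix}
$$

Let us project the 3 standardized variables on two $N$-orthogonal axes $G_1$ and $G_2$ with orientation vectors (here $N = \frac{1}{6} I_6$):

$$
u_1 = \begin{pmatrix} 0.87 \\ -2.11 \\ -0.08 \\ 0.10 \\ 0.67 \\ 0.54 \end{pmatrix}, \qquad
u_2 = \begin{pmatrix} 1.30 \\ -0.06 \\ 0.90 \\ -0.03 \\ -0.25 \\ -1.8 \end{pmatrix}.
$$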

(46)

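The vectors $a_1$ and $a_2$ of the coordinates of the projections of the 3 variables on $G_1$ and $G_2$ are:

$$
a_1 = Z^T N u_1 = \frac{0.87}{6} \begin{pmatrix} 1.5 \\ 0.48 \\ 0.78 \end{pmatrix}
- \frac{2.11}{6} \begin{pmatrix} -1.5 \\ -2.16 \\ 0.52 \end{pmatrix}
+ \ldots
+ \frac{0.54}{6} \begin{pmatrix} -0.5 \\ 0.72 \\ -1.83 \end{pmatrix}
= \begin{pmatrix} 0.81 \\ 0.91 \\ -0.33 \end{pmatrix}
$$

$$
a_2 = Z^T N u_2 = \frac{1.30}{6} \begin{pmatrix} 1.5 \\ 0.48 \\ 0.78 \end{pmatrix}
- \frac{0.06}{6} \begin{pmatrix} -1.5 \\ -2.16 \\ 0.52 \end{pmatrix}
+ \ldots
- \frac{1.80}{6} \begin{pmatrix} -0.5 \\ 0.72 \\ -1.83 \end{pmatrix}
= \begin{pmatrix} 0.45 \\ -0.07 \\ 0.92 \end{pmatrix}
$$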
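The projection of the variables can be checked in the same way. This base R sketch uses hand-typed data, the metric $N = \frac{1}{6} I_6$ and the rounded orientation vectors $u_1$ and $u_2$ printed in the example, so the coordinates match the slide values only to about two decimals.

```r
# Data retyped from the table of the 6 patients (assumption: same values as chol.rda)
X <- matrix(c(90, 140, 6.0,
              60,  85, 5.9,
              75, 135, 6.1,
              70, 145, 5.8,
              85, 130, 5.4,
              70, 145, 5.0),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Brigitte", "Marie", "Vincent", "Alex", "Manue", "Fred"),
                            c("diast", "syst", "chol")))
n <- nrow(X)
Y <- sweep(X, 2, colMeans(X), "-")
Z <- sweep(Y, 2, sqrt(colSums(Y^2) / n), "/")

# Metric on R^n with uniform weights 1/n
N <- diag(1 / n, n)

# Orientation vectors as printed in the example (rounded values)
u1 <- c(0.87, -2.11, -0.08, 0.10, 0.67, 0.54)
u2 <- c(1.30, -0.06, 0.90, -0.03, -0.25, -1.80)

# Coordinates of the projections of the variables: a_alpha = Z' N u_alpha
a1 <- drop(t(Z) %*% N %*% u1)
a2 <- drop(t(Z) %*% N %*% u2)

print(round(cbind(a1, a2), 2))
print(c(sum(u1^2) / n, sum(u2^2) / n))  # squared N-norms, close to 1 up to rounding
```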
