• Aucun résultat trouvé

Multiway-SIR for Longitudinal Multi-table Data Integration

N/A
N/A
Protected

Academic year: 2022

Partager "Multiway-SIR for Longitudinal Multi-table Data Integration"

Copied!
56
0
0

Texte intégral

(1)

Multiway-SIR for Longitudinal Multi-table Data Integration

Nathalie Villa-Vialaneix, Valérie Sautron, Marie Chavent & Nathalie Viguerie

[email protected] http://www.nathalievilla.org

COMPSTAT - August 25th, 2016

Nathalie Villa-Vialaneix |SIR-STATIS 1/24

(2)

Sommaire

1 Background and motivation

2 Presentation of Multiway-SIR

3 Application

(3)

Sommaire

1 Background and motivation

2 Presentation of Multiway-SIR

3 Application

Nathalie Villa-Vialaneix |SIR-STATIS 3/24

(4)

A longitudinal study on weight loss induced by a low calorie diet

Data collected during EU project, 8 centers, 450 families Purpose: effect of glycemic index, protein content... on weight maintenance after a diet for obese people

Data description: at each Clinical Intervention Day (CID), measure of 15 clinical variables (weight, HDL, ...) on 135 obese women

Targeted problem: exploratory data analysis aimed at explaining the success/failure of the diet (in term of weight loss/regain)

(5)

A longitudinal study on weight loss induced by a low calorie diet

Data collected during EU project, 8 centers, 450 families Purpose: effect of glycemic index, protein content... on weight maintenance after a diet for obese people

Data description: at each Clinical Intervention Day (CID), measure of 15 clinical variables (weight, HDL, ...) on 135 obese women

Targeted problem: exploratory data analysis aimed at explaining the success/failure of the diet (in term of weight loss/regain)

Nathalie Villa-Vialaneix |SIR-STATIS 4/24

(6)

A longitudinal study on weight loss induced by a low calorie diet

Data collected during EU project, 8 centers, 450 families Purpose: effect of glycemic index, protein content... on weight maintenance after a diet for obese people

Data description: at each Clinical Intervention Day (CID), measure of 15 clinical variables (weight, HDL, ...) on 135 obese women

(7)

Data description and notations

X..t

n×p-matrix (n>p)

for t=1, . . . ,T

exploratory variables (clinical)

y

numeric vector of lengthn target variable (evolution of weight:

weight3weight1 weight1 )

Nathalie Villa-Vialaneix |SIR-STATIS 5/24

(8)

Method presented in this talk

Features:

1 longitudinal dataintegration (withT small; hereT =3)

2 multidimensional data analysis withsimple graphical representations (similar to PCA)

3 designed tohighlight differences in the numeric target variable

4 designed tohighlight differences/commonalities between variable structure (rather than individual structure) between the different time steps

(9)

Method presented in this talk

Features:

1 longitudinal dataintegration (withT small; hereT =3)

2 multidimensional data analysis withsimple graphical representations (similar to PCA)

3 designed tohighlight differences in the numeric target variable

4 designed tohighlight differences/commonalities between variable structure (rather than individual structure) between the different time steps

Nathalie Villa-Vialaneix |SIR-STATIS 6/24

(10)

Method presented in this talk

Features:

1 longitudinal dataintegration (withT small; hereT =3)

2 multidimensional data analysis withsimple graphical representations (similar to PCA)

3 designed tohighlight differences in the numeric target variable

4 designed tohighlight differences/commonalities between variable structure (rather than individual structure) between the different time steps

(11)

Method presented in this talk

Features:

1 longitudinal dataintegration (withT small; hereT =3)

2 multidimensional data analysis withsimple graphical representations (similar to PCA)

3 designed tohighlight differences in the numeric target variable

4 designed tohighlight differences/commonalities between variable structure (rather than individual structure) between the different time steps

Nathalie Villa-Vialaneix |SIR-STATIS 6/24

(12)

Sommaire

1 Background and motivation

2 Presentation of Multiway-SIR

3 Application

(13)

Factor analysis methods for multi-tables data integration

Standard methodsto analyze data such as(X..t)t=1,...,T:

Multiple Factor Analysis(MFA) [Escofier and Pagès, 2008]

: weights for tablesX..t according to a measure of their variability (using first eigenvector of the PCA);

STATISandDUAL STATIS[Lavit et al., 1994,Abdi et al., 2012]

: larger importance to the most consensual tables⇒common representation which is thebest overall compromise. STATIS: compromise according to individuals andDUAL STATIS: compromise according to

correlations between variables.

Nathalie Villa-Vialaneix |SIR-STATIS 8/24

(14)

Factor analysis methods for multi-tables data integration

Standard methodsto analyze data such as(X..t)t=1,...,T:

Multiple Factor Analysis(MFA) [Escofier and Pagès, 2008]: weights for tablesX..t according to a measure of their variability (using first eigenvector of the PCA);

STATISandDUAL STATIS[Lavit et al., 1994,Abdi et al., 2012]

: larger importance to the most consensual tables⇒common representation which is thebest overall compromise. STATIS: compromise according to individuals andDUAL STATIS: compromise according to

correlations between variables.

(15)

Factor analysis methods for multi-tables data integration

Standard methodsto analyze data such as(X..t)t=1,...,T:

Multiple Factor Analysis(MFA) [Escofier and Pagès, 2008]: weights for tablesX..t according to a measure of their variability (using first eigenvector of the PCA);

STATISandDUAL STATIS[Lavit et al., 1994,Abdi et al., 2012]: larger importance to the most consensual tables⇒common representation which is thebest overall compromise.

STATIS: compromise according to individuals andDUAL STATIS: compromise according to

correlations between variables.

Nathalie Villa-Vialaneix |SIR-STATIS 8/24

(16)

Factor analysis methods for multi-tables data integration

Standard methodsto analyze data such as(X..t)t=1,...,T:

Multiple Factor Analysis(MFA) [Escofier and Pagès, 2008]: weights for tablesX..t according to a measure of their variability (using first eigenvector of the PCA);

STATISandDUAL STATIS[Lavit et al., 1994,Abdi et al., 2012]: larger importance to the most consensual tables⇒common representation which is thebest overall compromise. STATIS: compromise according to individuals andDUAL STATIS: compromise according to

correlations between variables.

(17)

Overall description of DUAL STATIS

Data: X=











X..1

...

X..T











same number of variables, potentially different number of individuals data are supposed centered and

scaled for allt

inter-structure analysis analysis ofeC(T ×T similarity

matrix)

table weights (α1, . . . , αT)

compromise matrix Γc =PT

t=1αtΓt in whichΓt is the covariance matrix ofX..t

eig. dec. GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

Nathalie Villa-Vialaneix |SIR-STATIS 9/24

(18)

Overall description of DUAL STATIS

Data: X=











X..1

...

X..T











same number of variables, potentially different number of individuals data are supposed centered and

scaled for allt

inter-structure analysis analysis ofeC(T×T similarity

matrix)

table weights (α1, . . . , αT)

compromise matrix Γc =PT

t=1αtΓt in whichΓt is the covariance matrix ofX..t

eig. dec. GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

(19)

Overall description of DUAL STATIS

Data: X=











X..1

...

X..T











same number of variables, potentially different number of individuals data are supposed centered and

scaled for allt

inter-structure analysis analysis ofeC(T×T similarity

matrix)

table weights (α1, . . . , αT)

compromise matrix Γc =PT

t=1αtΓt in whichΓt is the covariance matrix ofX..t

eig. dec. GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

Nathalie Villa-Vialaneix |SIR-STATIS 9/24

(20)

Overall description of DUAL STATIS

Data: X=











X..1

...

X..T











same number of variables, potentially different number of individuals data are supposed centered and

scaled for allt

inter-structure analysis analysis ofeC(T×T similarity

matrix)

table weights (α1, . . . , αT)

compromise matrix Γc =PT

t=1αtΓt in whichΓt is the covariance matrix ofX..t

eig. dec.

GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

(21)

Inter-structure analysis

Notations: ∀t =1, . . . , T,Γt = 1nX>..tX..t andeΓt = Γt

tkF

Similarity between time steps:eC= (ctt0)t,t0=1,...,T with ctt0 =heΓt,eΓt0iF

(cosinus betweenΓt andΓt0). Inter-structure analysis: Solve

u1 = arg max

u=(u1,...,uT)>RT,kuk=1

XT

t=1

utt

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γc =PT t=1αtt

Nathalie Villa-Vialaneix |SIR-STATIS 10/24

(22)

Inter-structure analysis

Notations: ∀t =1, . . . , T,Γt = 1nX>..tX..t andeΓt = Γt

tkF

Similarity between time steps:eC= (ctt0)t,t0=1,...,T with ctt0 =heΓt,eΓt0iF

(cosinus betweenΓt andΓt0).

Inter-structure analysis: Solve

u1 = arg max

u=(u1,...,uT)>RT,kuk=1

XT

t=1

utt

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γc =PT t=1αtt

(23)

Inter-structure analysis

Notations: ∀t =1, . . . , T,Γt = 1nX>..tX..t andeΓt = Γt

tkF

Similarity between time steps:eC= (ctt0)t,t0=1,...,T with ctt0 =heΓt,eΓt0iF

(cosinus betweenΓt andΓt0).

Inter-structure analysis: Solve

u1 = arg max

u=(u1,...,uT)>RT,kuk=1

XT

t=1

utt

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γc =PT t=1αtt

Nathalie Villa-Vialaneix |SIR-STATIS 10/24

(24)

Inter-structure analysis

Notations: ∀t =1, . . . , T,Γt = 1nX>..tX..t andeΓt = Γt

tkF

Similarity between time steps:eC= (ctt0)t,t0=1,...,T with ctt0 =heΓt,eΓt0iF

(cosinus betweenΓt andΓt0).

Inter-structure analysis: Solve

u1 = arg max

u=(u1,...,uT)>RT,kuk=1

XT

t=1

utt

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

(25)

Compromise analysis

GPCAof(eX,Ip,D)witheX..t = X..t

tkF andD= 1nDiag(α1, . . . , αT)⊗In

⇔eigendecomposition ofΓc Solutionwrites:

eX=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

observations: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

Nathalie Villa-Vialaneix |SIR-STATIS 11/24

(26)

Compromise analysis

GPCAof(eX,Ip,D)witheX..t = X..t

tkF andD= 1nDiag(α1, . . . , αT)⊗In

⇔eigendecomposition ofΓc

Solutionwrites:

eX=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

observations: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

(27)

Compromise analysis

GPCAof(eX,Ip,D)witheX..t = X..t

tkF andD= 1nDiag(α1, . . . , αT)⊗In

⇔eigendecomposition ofΓc Solutionwrites:

eX=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

observations: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

Nathalie Villa-Vialaneix |SIR-STATIS 11/24

(28)

Compromise analysis

GPCAof(eX,Ip,D)witheX..t = X..t

tkF andD= 1nDiag(α1, . . . , αT)⊗In

⇔eigendecomposition ofΓc Solutionwrites:

eX=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

observations: time-specific positionson thek-th principal component:

k-th column ofF=PΛ =











F1

...

FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

(29)

Compromise analysis

GPCAof(eX,Ip,D)witheX..t = X..t

tkF andD= 1nDiag(α1, . . . , αT)⊗In

⇔eigendecomposition ofΓc Solutionwrites:

eX=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

observations: time-specific positionson thek-th principal component:

k-th column ofF=PΛ =











F1

...

FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

ˆσj Qjk (time-specific positionscan also be derived)

Nathalie Villa-Vialaneix |SIR-STATIS 11/24

(30)

SIR Regression framework

[Li, 1991]

Y =f(X>a1, . . . , X>ad, )

ford<pandf : Rd+1→R, an arbitrary (non linear) function.

SY|X =Span{a1, . . . , ad}is:Effective Dimension Reduction (EDR) space.

Equivalence between SIR and eigendecomposition

SY|Xis included in the space spanned by the firstdΓ-orthogonal eigenvectors of theΓe, withΓ =E

h(X−E(X))TXi and Γe =E

E(Z|Y)TE(Z|Y)

forZ = Γ−1/2(X−E(X)).

(31)

SIR Regression framework

[Li, 1991]

Y =f(X>a1, . . . , X>ad, )

ford<pandf : Rd+1→R, an arbitrary (non linear) function.

SY|X =Span{a1, . . . , ad}is:Effective Dimension Reduction (EDR) space.

Equivalence between SIR and eigendecomposition

SY|Xis included in the space spanned by the firstdΓ-orthogonal eigenvectors of theΓe, withΓ =E

h(X−E(X))TXi and Γe =E

E(Z|Y)TE(Z|Y)

forZ = Γ−1/2(X−E(X)).

Nathalie Villa-Vialaneix |SIR-STATIS 12/24

(32)

SIR in practice

Estimation(whenn>p) compute¯x= 1nPn

i=1xiandΓ = 1nXT(X−¯x)

split the range ofY intoHdifferent slices: τ1, ...τH and estimate

G=







 1 nh

X

i:yi∈τh

zi









h=1,...,H

withnh =|{i : yi ∈τh}|and

Γe =G>MG withM=Diagn

1

n, . . . ,nnH

generalized eigendecompositionofΓe with normΓ

(33)

SIR in practice

Estimation(whenn>p) compute¯x= 1nPn

i=1xiandΓ = 1nXT(X−¯x)

split the range ofY intoHdifferent slices: τ1, ...τH and estimate

G=







 1 nh

X

i:yi∈τh

zi









h=1,...,H

withnh =|{i : yi ∈τh}|and

Γe =G>MG withM=Diagn

1

n, . . . ,nnH

generalized eigendecompositionofΓe with normΓ

Nathalie Villa-Vialaneix |SIR-STATIS 13/24

(34)

SIR in practice

Estimation(whenn>p) compute¯x= 1nPn

i=1xiandΓ = 1nXT(X−¯x)

split the range ofY intoHdifferent slices: τ1, ...τH and estimate

G=







 1 nh

X

i:yi∈τh

zi









h=1,...,H

withnh =|{i : yi ∈τh}|and

Γe =G>MG withM=Diagn

1

n, . . . ,nnH

generalized eigendecompositionofΓe with normΓ

(35)

Back to our problem: multiway SIR

Basic ideas:

perform DUAL STATIS analysis on center of gravity of the slices (G..t) instead of the original variables

compromise analysis is similar to finding a compromise EDR space using slices make the method similar to FDA but other estimates ofΓet (not based on slices) could be used

Nathalie Villa-Vialaneix |SIR-STATIS 14/24

(36)

Overview of multiway SIR

Data:eX=











X..1

...

X..T











 and y= (y1, . . . , yn)>

same number of variables, potentially different number of individuals

inter-structure analysis analysis ofeC(T ×T similarity

matrixbetween the covariance matrix ofG..t,Γet)

table weights (α1, . . . , αT)

compromise matrix Γe,c=PT

t=1αtΓet eig. dec.

GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

(37)

Overview of multiway SIR

Data:eX=











X..1

...

X..T











 and y= (y1, . . . , yn)>

same number of variables, potentially different number of individuals

inter-structure analysis analysis ofeC(T×T similarity

matrixbetween the covariance matrix ofG..t,Γet)

table weights (α1, . . . , αT)

compromise matrix Γe,c=PT

t=1αtΓet eig. dec.

GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

Nathalie Villa-Vialaneix |SIR-STATIS 15/24

(38)

Overview of multiway SIR

Data:eX=











X..1

...

X..T











 and y= (y1, . . . , yn)>

same number of variables, potentially different number of individuals

inter-structure analysis analysis ofeC(T×T similarity

matrixbetween the covariance matrix ofG..t,Γet)

table weights (α1, . . . , αT)

compromise matrix Γe,c=PT

t=1αtΓet

eig. dec. GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

(39)

Overview of multiway SIR

Data:eX=











X..1

...

X..T











 and y= (y1, . . . , yn)>

same number of variables, potentially different number of individuals

inter-structure analysis analysis ofeC(T×T similarity

matrixbetween the covariance matrix ofG..t,Γet)

table weights (α1, . . . , αT)

compromise matrix Γe,c=PT

t=1αtΓet eig. dec.

GSVD

compromise analysis

representations of variables and of individuals compromise positions or time-dependant positions

Nathalie Villa-Vialaneix |SIR-STATIS 15/24

(40)

Inter-structure analysis

Notations:∀t =1, . . . , T,Γt = 1nX>..t(X..t1nx>t ),Z..t =

X..t1nx>t Γ−1/2t , G..t =1

nh

P

i:yi∈τhzi

h=1,...,H andΓet =G>..tM G..t andeΓet = Γ

e t

etkF.

Similarity between time steps:eC= (ctt0)t,t0=1,...,T with ctt0 =heΓet,eΓet0iF

(cosinus betweenΓet andΓet0). Inter-structure analysis: Solve

u1= arg max

u=(u1,...,uT)>RT,kuk=1

T

X

t=1

utet

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γe,c =PT t=1αtet

(41)

Inter-structure analysis

Notations:∀t =1, . . . , T,Γt = 1nX>..t(X..t1nx>t ),Z..t =

X..t1nx>t Γ−1/2t , G..t =1

nh

P

i:yi∈τhzi

h=1,...,H andΓet =G>..tM G..t andeΓet = Γ

e t

etkF. Similarity between time steps:eC= (ctt0)t,t0=1,...,T with

ctt0 =heΓet,eΓet0iF (cosinus betweenΓet andΓet0).

Inter-structure analysis: Solve

u1= arg max

u=(u1,...,uT)>RT,kuk=1

T

X

t=1

utet

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γe,c =PT t=1αtet

Nathalie Villa-Vialaneix |SIR-STATIS 16/24

(42)

Inter-structure analysis

Notations:∀t =1, . . . , T,Γt = 1nX>..t(X..t1nx>t ),Z..t =

X..t1nx>t Γ−1/2t , G..t =1

nh

P

i:yi∈τhzi

h=1,...,H andΓet =G>..tM G..t andeΓet = Γ

e t

etkF. Similarity between time steps:eC= (ctt0)t,t0=1,...,T with

ctt0 =heΓet,eΓet0iF (cosinus betweenΓet andΓet0).

Inter-structure analysis: Solve

u1= arg max

u=(u1,...,uT)>RT,kuk=1

T

X

t=1

utet

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γe,c =PT t=1αtet

(43)

Inter-structure analysis

Notations:∀t =1, . . . , T,Γt = 1nX>..t(X..t1nx>t ),Z..t =

X..t1nx>t Γ−1/2t , G..t =1

nh

P

i:yi∈τhzi

h=1,...,H andΓet =G>..tM G..t andeΓet = Γ

e t

etkF. Similarity between time steps:eC= (ctt0)t,t0=1,...,T with

ctt0 =heΓet,eΓet0iF (cosinus betweenΓet andΓet0).

Inter-structure analysis: Solve

u1= arg max

u=(u1,...,uT)>RT,kuk=1

T

X

t=1

utet

2

F

Solution: u1is the first eigenvector ofeC⇒compromise weights:

∀t =1, . . . ,T,αt = Pu1t

t0u1t0

Γe,c=PT

t=1αtet

Nathalie Villa-Vialaneix |SIR-STATIS 16/24

(44)

Compromise analysis

GPCAof(Ge,Ip,D)withGe..t = √G..t

etkF andD=Diag(α1, . . . , αT)⊗M

⇔eigendecomposition ofΓe,c Solutionwrites:

Ge=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

slices: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

(45)

Compromise analysis

GPCAof(Ge,Ip,D)withGe..t = √G..t

etkF andD=Diag(α1, . . . , αT)⊗M

⇔eigendecomposition ofΓe,c

Solutionwrites:

Ge=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

slices: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

Nathalie Villa-Vialaneix |SIR-STATIS 17/24

(46)

Compromise analysis

GPCAof(Ge,Ip,D)withGe..t = √G..t

etkF andD=Diag(α1, . . . , αT)⊗M

⇔eigendecomposition ofΓe,c Solutionwrites:

Ge=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

slices: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

... FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

(47)

Compromise analysis

GPCAof(Ge,Ip,D)withGe..t = √G..t

etkF andD=Diag(α1, . . . , αT)⊗M

⇔eigendecomposition ofΓe,c Solutionwrites:

Ge=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

slices: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

...

FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

variables:compromise positions on circle of correlationsfor variablej onk-th axis:

λk

σˆj Qjk (time-specific positionscan also be derived)

Nathalie Villa-Vialaneix |SIR-STATIS 17/24

(48)

Compromise analysis

GPCAof(Ge,Ip,D)withGe..t = √G..t

etkF andD=Diag(α1, . . . , αT)⊗M

⇔eigendecomposition ofΓe,c Solutionwrites:

Ge=PΛQ> withP>DP=Q>Q=Ir andΛ =Diag(p

λk)k=1,...,r

Representationsof:

slices: time-specific positionson thek-th principal component: k-th column ofF=PΛ =











F1

...

FT











compromise positionson thek-th principal component:k-th column ofFc =PT

t=1αtF..t

(49)

Sommaire

1 Background and motivation

2 Presentation of Multiway-SIR

3 Application

Nathalie Villa-Vialaneix |SIR-STATIS 18/24

(50)

Application of multiway SIR on Diogenes dataset I

H =5,interstructure analysis:

(51)

Application of multiway SIR on Diogenes dataset II

H =5,compromise analysis: number of components

2 components are enough but we will analyze 4 for a deeper understanding of the data.

Nathalie Villa-Vialaneix |SIR-STATIS 20/24

(52)

Application of multiway SIR on Diogenes dataset III

H =5,compromise analysis: analysis of the slices (compromise)

(53)

Application of multiway SIR on Diogenes dataset IV

H =5,compromise analysis: analysis of the variable (compromise)

Nathalie Villa-Vialaneix |SIR-STATIS 22/24

(54)

Application of multiway SIR on Diogenes dataset V

H =5,compromise analysis: longitudinal analysis

(55)

Conclusion

We have presented an exploratory analysis method based on DUAL STATIS

able to focus on a numeric variable of interest similarly to SIR

able to explain longitudinal evolutions when the number of time steps is small

Future work

an R package is under development

only valid forn≥p: regularization approach is under study to allows for the analysis ofn<pcases

Nathalie Villa-Vialaneix |SIR-STATIS 24/24

(56)

Abdi, H., Williams, L., Valentin, D., and Bennani-Dosse, M. (2012).

STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling.

Wiley Interdisciplinary Reviews: Computational Statistics, 4(2):124–167.

Escofier, B. and Pagès, J. (2008).

Analyses Factorielles Simples et Multiples.

Dunod.

Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994).

The ACT (STATIS method).

Computational Statistics and Data Analysis, 18(1):97–119.

Li, K. (1991).

Sliced inverse regression for dimension reduction.

Journal of the American Statistical Association, 86(414):316–342.

Références

Documents relatifs

outreg using output.doc, nolabel append 3aster reg growth tourism lyo prim infl sw, robust outreg using output.doc, nolabel append 3aster. 0

❼ Exactly coupled decompositions: additionally to exact coupling of one of the factors, if the data sets are not corrupted by noise, the likelihoods are also products of delta

(c) Each data matrix consisted of a 165 × 15 matrix of val- ues derived from fMRI responses of 10 subjects in response to 165 sounds.. Prior to MCCA the 6309 voxels were reduced to

The aim of group anomaly detection is to detect anomalous sets of points from datasets taking the form of (1), i.e., to detect unusual observations s i from (1). Each anomalous set

It is important to highlight that a Genomic Information System (GeIS) based on a conceptual model allows improving the adaptation of new requirements of the domain, and

Les personnages, contribuent à produire du sens c'est-à-dire à donner et consolider une interprétation du texte, ils se superposent pour donner la dimension réelle

Decision making problem is one of the most important field in multiple cri- teria decision (Triantaphyllou 2000 ). It exists many methods such as: Pareto analysis, decision

High-throughput sequencing techniques are increasingly used and allows to measure a large quantity of data at different levels of the living organism under study. As the cost of