• Aucun résultat trouvé

Auto-Associative models and generalized Principal Component Analysis

N/A
N/A
Protected

Academic year: 2022

Partager "Auto-Associative models and generalized Principal Component Analysis"

Copied!
29
0
0

Texte intégral

(1)

HAL Id: hal-00985337

https://hal.inria.fr/hal-00985337

Submitted on 29 Apr 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

To cite this version:

Stéphane Girard, Serge Iovleff. Auto-Associative models and generalized Principal Component Anal-

ysis. Workshop on principal manifolds for data cartography and dimension reduction, Aug 2006,

Leicester, United Kingdom. �hal-00985337�

(2)

St´ephane Girard

⋆ INRIA, Universit´e Grenoble 1

Joint work with Serge Iovleff, Universit´e Lille 1

(3)

Outline

1. Principal Component Analysis, 2 points of view, 2. Generalized PCA, theoretical aspects,

3. Implementation aspects,

4. Illustration on simulated datasets,

5. Illustration on real datasets.

(4)

1. Principal Component Analysis

• Background : Multidimensional data analysis ( n observations in a p − dimensional space)

• Goal : Dimension reduction.

◦ Data visualization (dimension less than 3),

◦ To find which variables are important,

◦ Compression.

• Method: Projection on low d− dimensional linear subspaces.

(5)

• Let X be a centered random vector in R p .

• Estimate the d − dimensional linear subspace d ∈ {0 , . . . , p } minimizing the mean distance to X .

• Minimize with respect to a 1 , . . . , a d (orthonormal):

E

X −

d

X

k=1

X, a k a k

2 

 .

Explicit solution

• a 1 , . . . , a d are the eigenvectors associated to the d largest eigenvalues of E [ X t X ], the covariance matrix of X .

• The a k ’s are called principal axes, the Y k =

X, a k

the principal variables.

• The associated residual is defined by

R d = X −

d

X

k =1

X, a k

a k ,

(6)

PCA: Projection Pursuit interpretation

Equivalent problem

• Estimate the d− dimensional linear subspace d ∈ {0, . . . , p} maximizing the projected variance.

• Maximize iteratively with respect to a 1 , . . . , a d (orthonormal):

Var

X, a 1

, . . . , Var

X, a d

.

(7)

Algorithm

• For j = 0, let R 0 = X .

• For j = 1 , . . . , d :

[A] Estimation of a projection axis.

Determine a j = arg max

x∈ R p E h

x, R j−1 2 i

such that a j

= 1 and

a j , a k

= 0, 1 ≤ k < j . [P] Projection.

Compute the principal variable Y j =

a j , R j−1 . [R] Linear regression.

Determine b j = arg min

x∈ R p E h

R j −1 − Y j x

2 i

such that

b j , a j

= 1 and

b j , a k

= 0, 1 ≤ k < j . The solution is b j = a j , and let the regression function be s j ( t ) = ta j . [U] Residual update.

Compute R j = R j−1 − s j ( Y j ).

(8)

Algorithm output. After d iterations, we have the following expansion:

X =

d

X

k=1

s k ( Y k ) + R d , (1)

with s k ( t ) = ta k and Y k =

a k , X

, or equivalently

X =

d

X

k=1

a k , X

a k + R d .

This equation can be rewritten as

F ( X ) = R d (2)

where we have defined

F ( x ) = x −

d

X

k=1

a k , x a k .

The equation F (x) = 0 defines a d− dimensional linear subspace, spanned by a 1 , . . . , a d .

Equation (2) defines a d− dimensional linear auto-associative model for X .

(9)

Goals of a generalized PCA

1. To keep an expansion similar to (2):

F ( X ) = R d ,

but with a non necessarily linear function F , such that the equation F ( x ) = 0 could model more general subspaces.

2. To keep an expansion “principal variables + residual” similar to (1):

X =

d

X

k=1

s k (Y k ) + R d ,

but with non necessarily linear functions s k .

3. To benefit from the “nice” theoretical properties of PCA.

4. To keep a simple iterative algorithm.

(10)

2. Generalized PCA, theoretical aspects

We adopt the Projection Pursuit point of view. The steps [A] and [R] are generalized:

[A] Estimation of a projection axis.

Introduction of an index I which measures the quality of the projection axis. For instance:

• Dispersion,

• Deviation from normality,

• Clusters detection,

• Outliers detection,...

[R] Regression.

Estimation of the regression function from R to R p in a given set:

• Linear functions,

• Splines, kernels,...

(11)

New algorithm.

• For j = 0, let R 0 = X .

• For j = 1 , . . . , d :

[A] Estimation of a projection axis.

Determine a j = arg max

x ∈ R p I (

x, R j−1

) such that a j

= 1 and

a j , a k

= 0, 1 ≤ k < j . [P] Projection.

Compute the principal variable Y j =

a j , R j−1 . [R] Regression.

Determine s j = arg min

s∈S( R , R p ) E h

R j−1 − s ( Y j )

2 i

such that P a j ◦ s j = Id R and P a k ◦ s j = 0, 1 ≤ k < j .

[U] Residual update

Compute R j = R j −1 − s j ( Y j ).

(12)

Remark: At the end of iteration j , the residual is given by R j = R j−1 − s j Y j

= R j −1 − s j

a j , R j−1

= R j −1 − s j ◦ P a j R j −1

= Id R p − s j ◦ P a j

R j−1

= Id R p − s j ◦ P a j

◦ Id R p − s j −1 ◦ P a j−1

R j −2

= . . .

= Id R p − s j ◦ P a j

◦ . . . ◦ Id R p − s 1 ◦ P a 1

R 0

= Id R p − s j ◦ P a j

◦ . . . ◦ Id R p − s 1 ◦ P a 1

( X ) .

Auto-associative composite model.

(13)

y=P s(y) s

a

a s(y)

• Important consequence: At the end of iteration j , the residual is given by R j = Id R p − s j ◦ P a j

R j−1

, and thus is projection on a j is P a j R j = P a j − P a j ◦ s j ◦ P a j

R j−1

= ( P a j − P a j ) R j−1

= 0 .

Thus, iteration (j + 1) can be performed on the linear subspace orthogonal to (a 1 , . . . , a j ),

(14)

Goal 1. After d iterations:

• One always has an auto-associative model

F ( X ) = R d ,

with

F = Id R p − s d ◦ P a d

◦ . . . ◦ Id R p − s 1 ◦ P a 1

=

1

a

k=d

Id R p − s k ◦ P a k

,

and P a j ( x ) =

a j , x .

• The equation F ( x ) = 0 defines a d − dimensional manifold.

(15)

Goal 2. After d iterations:

• One always as the expansion “principal variables + residual” similar to (1):

X =

d

X

k =1

s k ( Y k ) + R d ,

and the functions s k are non necessarily linear.

• For d = p , the expansion is exact: R p = 0.

• We can still define principal axes a k and principal variables Y k .

• The residuals are centered: E R k

= 0, k = 0 , . . . , d .

(16)

Goal 3. After d iterations, we have:

• Some orthogonality properties

a k , a j

= 0 , 1 ≤ k < j ≤ d, a k , R j

= 0 , 1 ≤ k ≤ j ≤ d, a k , s j ( Y j )

= 0 , 1 ≤ k < j ≤ d.

• Since the norm of the residuals is decreasing, we can define, similarly to the PCA case, the information ratio represented by the d − dimensional model as

Q d = 1 − E h

R d

2 i .

Var h

k X k 2 i .

One can show that Q 0 = 0, Q p = 1 and ( Q d ) is increasing.

Remark. Except in particular cases, the non-correlation of the principal variables is lost:

E

Y k Y j

6= 0 , 1 ≤ k < j ≤ d.

(17)

Goal 4.

• We still have an iterative algorithm. It converges at most in p steps.

• Its complexity depends on the two steps [A] et [R].

[A] Estimation of a projection axis.

Determine a j = arg max

x∈ R p I (

x, R j −1

) such that a j

= 1 and

a j , a k

= 0, 1 ≤ k < j . [R] Regression.

Determine s j = arg min

s∈S( R , R p ) E h

R j−1 − s ( Y j )

2 i

such that P a j ◦ s j = Id R and P a k ◦ s j = 0, 1 ≤ k < j .

• Note that the above theoretical properties do not depend on these steps.

(18)

3. Implementation aspects, step [A]

• Contiguity index. Measure of the neighborhood preservation. Points which are neighbor in R p should stay neighbor on the axis.

I (

x, R j−1 ) =

n

X

i =1

D

x, R i j−1

E 2 , n X

k=1 n

X

ℓ=1

m kℓ D

x, R j k −1 − R j−1 E 2

,

where M = ( m kℓ ) is the contiguity matrix defined by

m kℓ = 1 if R j−1 is the closest neighbor of R k j−1 , m kℓ = 0 otherwise.

• Optimization. Explicit solution.

[A] a j is the eigenvector associated to the largest eigenvalue of V j V j −1 , where

V j =

n

X

k =1

t R j−1 k R j k −1 , V j =

n

X

k =1 n

X

ℓ =1

m kℓ t ( R j k −1 − R j−1 )( R k j−1 − R j−1 )

are proportional to the covariance and local covariance matrices of R j−1 .

(19)

• Set of L 2 functions . The regression step reduces to estimating the conditional expectation:

[R] s j ( Y j ) = E

R j −1 | Y j .

• Estimation of the conditional expectation.

◦ Classical problem since the constraints P a j ◦ s j = Id and P a k ◦ s j = Id, 1 ≤ k < j are easily taken into account in the a k ’s basis. Step [R] reduces to ( p − j ) independent regressions from R to R .

◦ Numerous estimates are available: splines, local polynomials, kernel estimates, ...

◦ For instance, for the coordinate k ∈ { j + 1 , . . . , p }, the kernel estimate of s j ( u ) can be written as

˜

s j k ( u ) =

n

X

i=1

R ˜ j−1 i,k K h ( u − Y i j )

, n X

i=1

K h ( u − Y i j ) ,

where h is a smoothing parameter (the bandwidth).

(20)

4. First illustration on a simulated dataset

• n = 100 points in R 3 randomly chosen on the curve x → (x, sin x, cos x).

• One iteration h = 0 . 3 → Q 1 = 99 . 97%.

1

−0

−1 Z

−1 0 1

Y

9.4

0.0

−9.4

X

Theoretical curve

1

−0

−1 Z

−1 0 1

Y

9.0

−0.2

−9.4

X + +

+

+ +

+ +

+

+ +

+

+

+

+

+

+ +

+ +

+

+ + +

+

+

+

+ +

+ +

+

+ +

+ +

+

+

+ +

+

+

+

+

++

+

+ +

+ +

+ +

+ +

+ + +

++

+ +

+

+ +

+ +

+

+ +

+ +

+

+ +

+

+

+ +

+ +

+ +

+

+

+ +

+

+

+ + +

+ +

+

+

+

+

+ +

+

+ +

+

+ +

+ +

+

+ +

+

+

+

+

+

+ +

+ +

+

+ + +

+

+

+

+ +

+ +

+

+ +

+ +

+

+

+ +

+

+

+

+

++

+

+ +

+ +

+ +

+ +

+ + +

++

+ +

+

+ +

+ +

+

+ +

+ +

+

+ +

+

+

+ +

+ +

+ +

+

+

+ +

+

+

+ + +

+ +

+

+

+

+

+ +

+

Estimated 1− dimensional manifold

(21)

Second illustration on a simulated dataset

• n = 1000 points in R 3 randomly chosen on the surface ( x, y ) → ( x, y, cos( π p

x 2 + y 2 )(1 − exp{−64( x 2 + y 2 )})) .

• Two iterations: Q 1 = 84 . 1% et Q 2 = 97 . 6%.

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X

Theoretical surface

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X +

+

+ +

+ +

+ +

+

+ +

+ +

+

+

+

+ +

+ + +

+ +

+ +

+ + +

+ +

+

+ +

+ +

+ +

+ +

+

+ + +

+ +

+

+ +

+ + +

+

+ +

+

+ +

+

+ +

+

+ + +

+

+

+

+ +

+ +

+

+

+ +

+ + + +

+ +

+ +

+ +

+ +

+ + +

+

+ +

+

+

+ +

+

+ + +

+ + +

+

+ +

+ +

+

+ +

+ +

+ + +

+

+ +

+

+

+ + +

+ +

+ +

+ +

+ +

+

+ +

+ +

+ +

+ + +

+

+

+ +

+ +

+

+ +

+

+ +

+ +

+ +

+

+

+ + +

++

+ +

+ +

+ +

+

+ +

+

+ +

+ +

+ +

+ +

+

+

+

+ +

+

+ + +

+ +

+

+

+

+ + +

+ +

+ +

++

+ +

+ +

+

+ +

+

+

+ +

+ +

+

+ +

+

+ + +

+ + +

+ +

+

+ +

+

+ +

+ + +

+ +

+ +

+ + +

+ +

+ +

+

+

+

+ + +

+ +

+

+ + +

+ + + +

+ +

+ +

+ + +

+ +

+ +

+

+ +

+ + +

+ +

+

+ +

+

+ +

+

+

+ +

+ + + +

+ + +

+ +

+

+

+ + +

+ +

+ +

+ +

+

+ +

+ + +

+ +

+

+ +

++ +

+

+ +

+

++

+ +

+ +

+

+

+ + +

+ +

+ + +

+ +

+ +

+ +

+ +

+ +

+ +

+ ++

+ +

+ +

+

+ +

+

+ +

+ +

+

+ +

+ +

+ +

+ +

+

+ + +

+ +

+ +

+

+ +

+

+

+ + +

+

+

+ +

+ + +

+ +

+

+ + +

+

+ +

+ +

+ +

+ + +

+

+

+ + +

+ +

+

+

+

+ +

+ +

+

+ +

+

+ +

+ + +

+ +

+

+ +

+ + +

+ + + +

+

+

+ +

+ +

+

+

+ + +

+ + +

+

+ + +

+

+

+

+ +

+ + +

+

+

+ +

+ +

+ +

+

+ +

+ +

+

+

+ ++

+

+

+

+ + + +

+ +

+

+ +

+ +

+ + + +

+

+ +

+

+ + +

+ + +

+

+ +

+ +

+ +

+

+ +

+

+ +

+ +

+

+ + ++

+ + +

+ +

+ + +

+

+ +

+ +

+ +

+

+ +

+

+ +

+ +

+ +

+ +

+

+ +

+ +

+

+ +

+ +

+ +

+

+

+ +

+ +

+ +

+ +

+

+ +

+ +

+ +

+ + +

+ +

+ +

+ +

+ +

+ + +

+ + +

+

+ +

+

+ +

+ +

+

+ + + +

+ + +

+

+

+ +

+ +

+

+ +

+

+

+ +

+ +

+

+ +

+

+

+ +

+ +

+ +

+ +

+ +

+

+ +

++

+ + +

+ +

+

+ +

+ + + + +

+ +

+ +

+ + +

+ +

+

+ +

+ +

+ +

+ +

+

+

+ +

+ +

+

+ + +

+

+ +

+

+

+ +

+

+ +

+ +

+

+ +

+

+ + +

+

+ +

+

+ +

+ + +

+

+ +

+ +

+

+ +

+ + +

+ +

+ +

+ + +

+ +

+ +

+

+

+ +

+ + +

+ +

+

+

+ +

+

+ +

+

+ + + +

+ +

+ +

+

+ +

+ +

+

+

+

+ +

+

+ +

+

+

+ +

+

+ +

+

+ +

+

+

+ +

+

+

+ +

+

+

+ + +

+ + +

+ +

+

+ +

+

+

+

+ +

+ +

+ +

+ + + +

+ +

+ + +

+ + +

+

+ + +

+ +

+ +

+ +

+ +

+

+ +

+

+ +

+

+ +

+

+ + + +

+ + +

+ +

+ +

+ +

+ +

+ + +

+ +

+ + +

+ +

+ +

+ +

+ + + +

+ +

+

+ +

+ +

+ +

+ + +

+

+ +

+

+ +

+ +

+ +

+ +

+

+ +

+

+

+ +

+

+ +

+ +

+

+ +

+ +

+ +

+ +

+ +

+

+

+ +

+

+ +

+

+

+ +

+ + +

+

+ + +

+ +

+

+ ++

+

+

+

+ +

+ +

+ +

+ +

+

+ + +

+

Simulated points

1.20 0.35

−0.50 Z

−1

0

1 Y

0.5 0.0

−0.5

X . . . .. . . .. . .. .

. .. .. .....

......

.. . .. .

. . . . . . .

. . . .. . .. . . .. .

. .......

.....

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .. . . .. . . .. .

. .. . . ....

.....

... . .. .. .. . . . . .

. . . .. . .. . . .. .. ......

.....

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .........

.... . .. .

. . . . . . .

. . . .. . .. . . .. .. .....

........

. .. . . . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .........

... . .. .. .. . . . . .

. . . .. . .. . . .. .. .....

.......

. .. .. .. . . . . .

. . . .. . .. .. . . .. ........

.... . .. .. .. . . . . .

. . . .. . .. .. . . .. ....

......

.. . .. .. .. . . . . .

. . . .. . .. . . .. .. ......

......

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .. ....

......

.. . .. .. . .. . . . .. . . .. . . .. .. .

. .. ..... ........

. .. . . . . . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. . .. . . . .

. . . .. . .. .. . . .. .........

... . .. .. . .. . . . .. . . .. . . .. .. .

. .. .... ......

... . .. .

. . . . . . .

. . . .. . .. .. . . .. ......

......

. .. .. . .. . . . .

. . . .. . .. . . .. .. ......

.......

. .. . . . . . . . .. . . .. . . .. .

. .. . . ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. ......

......

. .. . . . .. . . . .

. . . .. . .. . . .. .

. .......

.....

. .. .. .. . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. .. . . .. .......

.....

. .. . . . .. . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .. . . .. . . .. .

. .. . . .......

.....

. .. .. .. . . . . .

. . . .. . .. .. . . .. .......

.....

. .. .. .. . . . . .

. . . .. . .. .. . . .. ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. .....

.....

... . .. .

. . . . . . .

. . . .. . .. . . .. .. ....

.......

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. .........

.... . .. .

. . . . . . .

. . . .. . .. .. . . .. ........

.... . .. .

. . .. . . . .. . . .. . . .. .

. .. . . .......

......

. .. . . . . . . . .. . . .. . . .. .. .

. .. .....

.....

... . .. .

. . . . . . .. . . .. . . .. .. .

. .. .........

... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .......

.....

. .. . . . .. . . . .. . . .. . . .. .

. .. .. ......

.......

. .. . . . . . . . .

. . . .. . .. . . .. .

. .... .....

... . .. .. .. . . . . .. . . .. . . .. .

. .. . . ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. .......

......

. .. .. . . . . . .

. . . .. . .. .. . . .. .....

......

.. . . .. . .. . . . .

. . . .. . .. . . .. .

. .........

.... . .. .. . . . . . .

. . . .. . .. . . .. .

. ........

.... . .. .. .. . . . . .

. . . .. . .. . . .. .

. ...... ......

. .. .. .. . . . . .

. . . .. . .. . . .. .. ......

.......

. .. . . . . . . . .

. . . .. . .. .. . . .. ....

......

... . .. .

. . .

Estimated 2− dimensional manifold

Références

Documents relatifs

Nonlinear principal component analysis (NLPCA) is applied on detrended monthly Sea Surface Temperature Anomaly (SSTA) data from the tropical Atlantic Ocean (30°W-20°E, 26°S-22°..

We show how slight variants of the regularization term of [1] can be used successfully to yield a structured and sparse formulation of principal component analysis for which we

In Section II we define the generalized Hopfield model, the Bayesian in- ference framework and list our main results, that is, the expressions of the patterns without and

The first column vectors (POMs) of U, corresponding to the first non-zero elements of S, can represent the dynamic characteristics of the system. Each POM is a combination of

The following three steps process has been implemented: collecting PM2.5 (particles that are 2.5 microns or less in diameter, also called fine particles) with sequential fine

Its uses the concept of neighborhood graphs and compares a set of proximity measures for continuous data which can be more-or-less equivalent a

We argued that if human spatial memory of multifloored environments (here a cylindrical one) is preferentially exploited by floors regardless of the learning mode, then

the evaluation of Equations 2.3, 2.4, and 2.5 with the Jacobian matrix being calculated with only the major species alone does not guarantee that the most meaningful or