• Nonlinear PCA – auto-encoding

(1)

• Typology of dimensionality reduction

• baseline method: PCA

• "clustering of the dimensions"

• extensions:

• nonlinear PCA (NLPCA)

• multidimensional scaling (MDS)

• self-organizing maps (SOM)

• local linear embedding (LLE)

• ISOMAP

• principal curves

(2)

• Nonlinear PCA – auto-encoding

• network model of PCA

[Figure: auto-encoder network with input layer x₁, x₂, ..., x_d, a linear hidden layer with units 1, ..., k, and output layer x₁, x₂, ..., x_d; other labels: F₁, F₂, Γ(F₂)]
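The link between PCA and a linear auto-encoder can be sketched as follows. This is a minimal illustration, not the slides' own construction: it assumes the encoder and decoder share the top-k principal directions (obtained here by SVD rather than by training a network), and the toy data and the name `pca_autoencoder` are hypothetical.

```python
import numpy as np

def pca_autoencoder(X, k):
    """Build linear encode/decode maps from the top-k principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]                      # shape (k, d)

    def encode(points):
        # Linear hidden layer: d inputs -> k hidden units.
        return (points - mean) @ components.T

    def decode(codes):
        # Linear output layer: k hidden units -> d outputs.
        return codes @ components + mean

    return encode, decode

rng = np.random.default_rng(0)
# Toy data close to a 2-dimensional subspace of R^5 (hypothetical example).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

encode, decode = pca_autoencoder(X, k=2)
codes = encode(X)
X_hat = decode(codes)
reconstruction_error = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(codes.shape, reconstruction_error)
```

With k equal to the true subspace dimension, the reconstruction error should be on the order of the added noise.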

(3)

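• Nonlinear PCA – auto-encoding

• nonlinear extension

[Figure: auto-encoder network with input layer x₁, x₂, ..., x_d, hidden layers labeled "nonlinear" and "linear" (bottleneck units 1, ..., k), and output layer x₁, x₂, ..., x_d; other labels: F₁, F₂, Γ(F₂)]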
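A nonlinear auto-encoder of this kind might be trained as in the sketch below. Several choices are assumptions not stated on the slide: a five-layer architecture with a tanh layer on each side of a linear bottleneck of k units, plain full-batch gradient descent on the mean squared reconstruction error, and the hyperparameters. The function name `train_nlpca` and the parabola data are hypothetical.

```python
import numpy as np

def train_nlpca(X, k=1, hidden=8, lr=0.02, epochs=3000, seed=0):
    """Auto-encoder d -> hidden (tanh) -> k (linear) -> hidden (tanh) -> d (linear)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sizes = [d, hidden, k, hidden, d]
    weights = [rng.normal(scale=0.5, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    use_tanh = [True, False, True, False]
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping the activation of every layer.
        acts = [X]
        for W, b, nonlinear in zip(weights, biases, use_tanh):
            z = acts[-1] @ W + b
            acts.append(np.tanh(z) if nonlinear else z)
        err = acts[-1] - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass for the mean squared reconstruction error.
        delta = 2.0 * err / n
        for layer in reversed(range(len(weights))):
            if use_tanh[layer]:
                delta = delta * (1.0 - acts[layer + 1] ** 2)  # derivative of tanh
            grad_W = acts[layer].T @ delta
            grad_b = delta.sum(axis=0)
            delta = delta @ weights[layer].T  # propagate before updating this layer
            weights[layer] -= lr * grad_W
            biases[layer] -= lr * grad_b
    return weights, biases, losses

rng = np.random.default_rng(1)
# Noisy points along a parabola: a 1-dimensional curved structure in R^2.
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2]) + 0.02 * rng.normal(size=(200, 2))
weights, biases, losses = train_nlpca(X, k=1)
print(losses[0], losses[-1])
```

The activations of the k-unit bottleneck layer play the role of the nonlinear principal components.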

(4)

• Multidimensional scaling (MDS)

• reduced-dimension representation that preserves distances

[Figure: points x_i, x_j in the source space (axes x1, x2, x3) at distance δ_ij, mapped to points y_i, y_j in the target space (axes y1, y2) at distance d_ij]

(5)

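• Multidimensional scaling (MDS)

• error functions

• $J_{ee} = \dfrac{\sum_{i<j} (d_{ij} - \delta_{ij})^2}{\sum_{i<j} \delta_{ij}^2}$

• $J_{ff} = \sum_{i<j} \left( \dfrac{d_{ij} - \delta_{ij}}{\delta_{ij}} \right)^2$

• $J_{ef} = \dfrac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \dfrac{(d_{ij} - \delta_{ij})^2}{\delta_{ij}}$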

(6)

• Multidimensional scaling (MDS)

• minimization

• standard gradient descent

• initialization

• the $d'$ coordinates with the largest variances

• PCA with $d'$ components
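The MDS error functions and the gradient-descent minimization can be sketched as below, assuming $\delta_{ij}$ denotes distances in the source space and $d_{ij}$ distances in the target space. The function names, the toy data, and the default step size are assumptions. With the step `norm / (2n)` the update coincides with the Guttman transform used by SMACOF, which is known not to increase this stress.

```python
import numpy as np

def pairwise(P):
    """Full matrix of Euclidean distances between the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

def mds_errors(delta, d):
    """J_ee, J_ff, J_ef computed over the pairs i < j."""
    i, j = np.triu_indices(len(delta), k=1)
    dl, dd = delta[i, j], d[i, j]
    J_ee = np.sum((dd - dl) ** 2) / np.sum(dl ** 2)
    J_ff = np.sum(((dd - dl) / dl) ** 2)
    J_ef = np.sum((dd - dl) ** 2 / dl) / np.sum(dl)
    return J_ee, J_ff, J_ef

def mds(X, d_prime=2, init="pca", steps=300, lr=None):
    n = len(X)
    delta = pairwise(X)
    norm = np.sum(np.triu(delta, k=1) ** 2)
    Xc = X - X.mean(axis=0)
    if init == "pca":
        # Initialization: PCA with d' components.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Y = Xc @ Vt[:d_prime].T
    else:
        # Initialization: the d' coordinates with the largest variances.
        Y = Xc[:, np.argsort(Xc.var(axis=0))[::-1][:d_prime]]
    if lr is None:
        lr = norm / (2.0 * n)  # assumed step size, not from the slides
    history = []
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        history.append(mds_errors(delta, d)[0])
        # Gradient of J_ee with respect to each y_k.
        coef = (d - delta) / np.where(d > 0, d, 1.0)
        grad = (2.0 / norm) * np.sum(coef[:, :, None] * diff, axis=1)
        Y = Y - lr * grad
    return Y, history

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3)) * np.array([3.0, 2.0, 1.0])
Y, history = mds(X, d_prime=2)
print(history[0], history[-1])

# Hand-checkable case: a 3-4-5 triangle whose distances are all halved.
tri = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
errors_half = mds_errors(pairwise(tri), pairwise(0.5 * tri[:, :2]))
```

In the triangle example every $d_{ij}$ equals $\delta_{ij}/2$, so the three errors are 0.25, 0.75 and 0.25.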

(7)

• Multidimensional scaling (MDS)

[Figure: example mapping from a three-dimensional source space (axes x1, x2, x3) to a two-dimensional target space (axes y1, y2)]

(8)

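• Self-organizing maps (SOM)

• $x_i$ belongs to $V_\ell$ with weight $W_{i,\ell}$

• $W_{i,\ell}$ depends only on the distance between $v_\ell$ and $v_{(x_i)}$, the center nearest to $x_i$

• typical window function

[Figure: window function Λ centered at y*, plotted over the map coordinates y₁, y₂]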

(9)

• Self-organizing maps (SOM)

SOM($X_n$)
    $C^{(0)} \leftarrow \{ v_1^{(0)}, \ldots, v_k^{(0)} \}$
    $j \leftarrow 0$
    do
        recompute $W^{(j)}$
        for $\ell \leftarrow 1$ to $k$ do
            $v_\ell^{(j+1)} \leftarrow \frac{1}{n} \sum_{i=1}^{n} W_{i,\ell}^{(j)} x_i$
        $j \leftarrow j + 1$
    until change < threshold
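The batch SOM loop can be sketched in NumPy as follows. The assumptions are labeled in the comments: a one-dimensional chain of map positions, a Gaussian window of width `sigma` measured on the map, and normalization by the sum of the weights (rather than $1/n$) so that each center is a weighted mean. The function name and the data are hypothetical.

```python
import numpy as np

def batch_som(X, k=10, sigma=1.0, threshold=1e-4, max_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    # C^(0): initialize the centers with k distinct data points (assumption).
    V = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    grid = np.arange(k)  # map positions of the centers on a 1-D chain (assumption)
    for _ in range(max_iter):
        # Index of the nearest center v_(x_i) for every point x_i.
        dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=-1)
        nearest = np.argmin(dist, axis=1)
        # W[i, l] depends only on the map distance between l and the winner
        # of x_i; a Gaussian window is assumed here.
        gap = grid[None, :] - grid[nearest][:, None]
        W = np.exp(-0.5 * (gap / sigma) ** 2)
        # Update every center as a weighted mean of the data.
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        change = np.max(np.linalg.norm(V_new - V, axis=1))
        V = V_new
        # Stop once the centers barely move.
        if change < threshold:
            break
    return V

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))   # 2-D data mapped onto a 1-D chain
V = batch_som(X, k=10)
print(V)
```

Because each center is a convex combination of data points, the centers always stay inside the convex hull of the data.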

(10)

• Self-organizing maps (SOM)

• 2 dimensions → 1 dimension

[Figure: panels labeled 0, 20, 100, 1000, 10,000, 25,000, 50,000, 75,000, 100,000, 150,000]

(11)

• Self-organizing maps (SOM)

• 2 dimensions → 2 dimensions

[Figure: panels labeled 100, 1000, 10,000, 25,000, 50,000, 75,000, 100,000, 150,000, 200,000, 300,000]

(12)

• Self-organizing maps (SOM)

• problem: local minimum

[Figure: panels labeled 0, 1000, 25,000, 400,000]

(13)

• Self-organizing maps (SOM)

• density estimation

[Figure: panels labeled 0, 1000, 400,000, 800,000]

(14)

• Self-organizing maps (SOM) – communication theory

• Source coding – vector quantization:

• error function: $J_s = \sum_{i=1}^{n} \| x_i - v_{(x_i)} \|^2$

(15)

• Channel coding – error correction:

• bit error probability: $p$

• Hamming distance between codewords: $d_{i,j} = d_H\big( c(v_i), c(v_j) \big)$

• codeword error probability: $p_{i,j} = p^{d_{i,j}} (1 - p)^{d - d_{i,j}}$

• error function: $J_c = \sum_{i=1}^{n} \sum_{j=1}^{c} \| v_{(x_i)} - v_j \|^2 \, p_{(x_i),j}$

• Joint source-channel coding

• error function: $J_{s+c} = \sum_{i=1}^{n} \sum_{j=1}^{c} \| x_i - v_j \|^2 \, p_{(x_i),j}$
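The three error functions can be evaluated on a small example as sketched below. It assumes that $v_{(x_i)}$ is the code vector nearest to $x_i$, that codewords are binary strings of length $d$, and that the second index of the sums runs over all code vectors. The function name and the data are hypothetical.

```python
import numpy as np

def coding_errors(X, V, codes, p):
    """codes[j] is the binary codeword c(v_j), a 0/1 array of length d."""
    d = codes.shape[1]
    # Hamming distances d_{i,j} between all pairs of codewords.
    hamming = np.sum(codes[:, None, :] != codes[None, :, :], axis=-1)
    # P[i, j] = p^{d_ij} (1 - p)^{d - d_ij}: codeword i is received as codeword j.
    P = p ** hamming * (1.0 - p) ** (d - hamming)
    sq = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=-1)   # ||x_i - v_j||^2
    nearest = np.argmin(sq, axis=1)                              # index of v_(x_i)
    vv = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)   # ||v_i - v_j||^2
    J_s = np.sum(sq[np.arange(len(X)), nearest])   # source coding error
    J_c = np.sum(vv[nearest] * P[nearest])         # channel coding error
    J_sc = np.sum(sq * P[nearest])                 # joint source-channel error
    return J_s, J_c, J_sc

# Four scalar points, two code vectors, one-bit codewords, 10% bit error rate.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
V = np.array([[0.5], [10.5]])
codes = np.array([[0], [1]])
J_s, J_c, J_sc = coding_errors(X, V, codes, p=0.1)
print(J_s, J_c, J_sc)

# Without channel noise the joint error reduces to the source coding error.
J_s0, J_c0, J_sc0 = coding_errors(X, V, codes, p=0.0)
```

In this example the code vectors are the centroids of their cells, and the joint error (41) equals the sum of the source error (1) and the channel error (40).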

(16)

• General problem: complicated surfaces → local minima

• Solution 1: ISOMAP

• geodesic distance: shortest paths in the similarity graph

• standard MDS on the geodesic distances
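The two ISOMAP steps can be sketched as follows. The sketch assumes a symmetrized k-nearest-neighbor graph as the similarity graph, Floyd-Warshall for the shortest paths, and classical (eigendecomposition-based) MDS for the final step. The function name and the spiral data are hypothetical.

```python
import numpy as np

def isomap(X, n_neighbors=6, d_prime=2):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Similarity graph: edges to the n_neighbors nearest points, symmetrized.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Geodesic distances: all-pairs shortest paths (Floyd-Warshall, O(n^3)).
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    # Classical MDS on the geodesic distances (assumes a connected graph).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:d_prime]
    Y = eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))
    return Y, G

# A spiral: a 1-D manifold in the plane that straight-line distances shortcut.
t = np.linspace(0.0, 3.0 * np.pi, 80)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
Y, G = isomap(X, n_neighbors=4, d_prime=1)

# Arc length along the spiral, for comparison with the 1-D embedding.
steps = np.linalg.norm(np.diff(X, axis=0), axis=1)
arc = np.concatenate([[0.0], np.cumsum(steps)])
correlation = abs(np.corrcoef(Y[:, 0], arc)[0, 1])
print(correlation)
```

The one-dimensional embedding should be nearly a linear function of the arc length, which is what unrolling the spiral means.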

(17)

(18)

(19)

(20)

(21)

(22)

• Solution 2: Local linear embedding (LLE)

• Step 1: find the set of neighbors $V_{x_i}$

• Step 2: approximate each point by a linear combination of its nearest neighbors:

$\min_W \sum_{i=1}^{n} \Big\| x_i - \sum_{x_j \in V_{x_i}} w_{i,j} \, x_j \Big\|^2$

• Step 3: reconstruct the points in the projection space using the same weights:

$\min_Y \sum_{i=1}^{n} \Big\| y_i - \sum_{x_j \in V_{x_i}} w_{i,j} \, y_j \Big\|^2$
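The three LLE steps can be sketched as below. The constraints are assumptions taken from the usual formulation of LLE, not from the slides: the weights of each point sum to one, a small regularization term stabilizes the local linear systems, and the embedding is given by the bottom eigenvectors of $(I - W)^t (I - W)$ after discarding the constant one. The function name and the data are hypothetical.

```python
import numpy as np

def lle(X, n_neighbors=8, d_prime=2, reg=1e-3):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: neighbor sets V_{x_i} (the point itself is excluded).
    neighbors = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    # Step 2: reconstruction weights, constrained to sum to one for each point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularization (assumption)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()
    # Step 3: coordinates y_i that the same weights reconstruct best.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigval, eigvec = np.linalg.eigh(M)
    # The first eigenvector (eigenvalue close to 0) is constant and is skipped.
    return eigvec[:, 1:d_prime + 1], W

rng = np.random.default_rng(0)
# Points on a half-cylinder: a 2-D surface curved in R^3.
angle = rng.uniform(0.0, np.pi, size=150)
height = rng.uniform(0.0, 1.0, size=150)
X = np.column_stack([np.cos(angle), np.sin(angle), height])
Y, W = lle(X, n_neighbors=8, d_prime=2)
print(Y.shape)
```

Step 3 needs no iterative optimization in this formulation, since it reduces to a single eigenvalue problem.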

(23)


(24)


(25)

(26)

• drawbacks of ISOMAP:

• running time: $O(n^3)$

• projecting new points

• construct the projection function explicitly

• interpolation problem

• supervised learning problem (multidimensional regression)

(27)

• Problem: noise

[Figure: legend: data points, generating curve, polygonal principal curve, HS principal curve]

(28)

• Model bias

[Figure: model bias, with curves labeled f and f*]

(29)

• Estimation bias

[Figure: estimation bias, with curve labeled f]

(30)

• Solution: polygonal principal curves

• Measure the distance to the curve rather than to the vertices

[Figure: polygonal line around vertex v_i, with neighboring vertices v_{i-1}, v_{i+1}, segments s_{i-2}, s_{i-1}, s_i, s_{i+1}, and the associated regions labeled V and S]
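The difference between measuring the distance to the curve and to the vertices can be illustrated with a short sketch. The function names and the example polyline are hypothetical.

```python
import numpy as np

def distance_to_segment(x, a, b):
    """Euclidean distance from point x to the segment with endpoints a and b."""
    ab = b - a
    denom = ab @ ab
    # Parameter of the orthogonal projection, clipped to stay on the segment.
    t = 0.0 if denom == 0.0 else np.clip((x - a) @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(x - (a + t * ab))

def distance_to_polyline(x, vertices):
    """Distance to the curve: minimum over all consecutive segments."""
    pairs = zip(vertices[:-1], vertices[1:])
    return min(distance_to_segment(x, a, b) for a, b in pairs)

def distance_to_vertices(x, vertices):
    """Distance to the nearest vertex only."""
    return np.min(np.linalg.norm(vertices - x, axis=1))

vertices = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
x = np.array([1.0, 1.0])
d_curve = distance_to_polyline(x, vertices)
d_vertex = distance_to_vertices(x, vertices)
print(d_curve, d_vertex)
```

Here the point lies at distance 1 from both segments but at distance √2 from every vertex, so a vertex-based criterion overestimates how far the point is from the curve.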

(31)

• Polygonal principal curves

[Flowchart: START → Initialization → Projection → Vertex optimization → Convergence? (N: back to Projection; Y: continue) → k > c(n, Δ)? (Y: END; N: Add new vertex, then back to Projection)]

(32)

• Polygonal principal curves

[Figure: panels (a) to (f)]

(33)

• Polygonal principal curves

• reduced noise

[Figure: legend: data points, generating curve, polygonal principal curve, BR principal curve, HS principal curve]

(34)

• Polygonal principal curves

• many points

(35)

• drawbacks of principal curves:

• local minima

• the extension to surfaces is not straightforward

→ most applications are in image processing

(36)

• Skeletonization of characters

[Figure: panels (a) to (d); legend: character template, polygonal principal curve]

(37)

• Skeletonization of characters

[Figure: panels (a) to (d); legend: character template, skeleton graph]

(38)

• Unsupervised learning for classification: discriminant analysis

• goal: find the best projection that preserves the discriminative information

• Fisher discriminant

• $y = w^t x$

(39)

• Discriminant analysis

[Figure: two panels showing the same data in the (x₁, x₂) plane projected onto two different directions w]

(40)

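• $\mathbf{m}_i = \dfrac{1}{n_i} \sum_{x \in D_i} x$

• $\tilde{m}_i = \dfrac{1}{n_i} \sum_{y \in Y_i} y = \dfrac{1}{n_i} \sum_{x \in D_i} w^t x = w^t \mathbf{m}_i$

• find $w$ that maximizes $| \tilde{m}_1 - \tilde{m}_2 | = | w^t (\mathbf{m}_1 - \mathbf{m}_2) |$

• Idea 2: separate the projected means, normalized by the within-class variances

• $\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$

• $J(w) = \dfrac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$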

(41)

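• $S_i = \sum_{x \in D_i} (x - \mathbf{m}_i)(x - \mathbf{m}_i)^t$

• $S_W = S_1 + S_2$

• $\tilde{s}_i^2 = \sum_{x \in D_i} (w^t x - w^t \mathbf{m}_i)^2 = \sum_{x \in D_i} w^t (x - \mathbf{m}_i)(x - \mathbf{m}_i)^t w = w^t S_i w$

• $\tilde{s}_1^2 + \tilde{s}_2^2 = w^t S_W w$

• $S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^t$

• $(\tilde{m}_1 - \tilde{m}_2)^2 = (w^t \mathbf{m}_1 - w^t \mathbf{m}_2)^2 = w^t (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^t w = w^t S_B w$

• $J(w) = \dfrac{w^t S_B w}{w^t S_W w}$

• $w_{\max} = S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$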
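The closed-form Fisher direction can be checked numerically with a short sketch. The two Gaussian classes and the function names are hypothetical. The criterion is computed directly from the projected samples, so it serves as an independent check of the matrix formula.

```python
import numpy as np

def fisher_direction(X1, X2):
    """w_max = S_W^{-1} (m_1 - m_2) for two classes given as row-sample arrays."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)   # scatter matrix of class 1
    S2 = (X2 - m2).T @ (X2 - m2)   # scatter matrix of class 2
    S_W = S1 + S2                  # within-class scatter
    return np.linalg.solve(S_W, m1 - m2)

def fisher_criterion(w, X1, X2):
    """J(w): squared gap of the projected means over the summed projected scatters."""
    y1, y2 = X1 @ w, X2 @ w
    s1 = np.sum((y1 - y1.mean()) ** 2)
    s2 = np.sum((y2 - y2.mean()) ** 2)
    return (y1.mean() - y2.mean()) ** 2 / (s1 + s2)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X2 = rng.multivariate_normal([1.5, 0.0], cov, size=100)

w_max = fisher_direction(X1, X2)
J_max = fisher_criterion(w_max, X1, X2)
# Baseline: projecting onto the difference of the class means.
J_means = fisher_criterion(X1.mean(axis=0) - X2.mean(axis=0), X1, X2)
# A few random directions, for comparison.
J_random = [fisher_criterion(rng.normal(size=2), X1, X2) for _ in range(20)]
print(J_max, J_means, max(J_random))
```

Because the classes are correlated along the diagonal, the direction of the mean difference is a poorer choice than $w_{\max}$, which accounts for the within-class scatter.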