Unsupervised learning

• Typology of dimensionality reduction

• basic method: PCA

• "grouping (clustering) of the dimensions"

• extensions:

  • nonlinear PCA (NLPCA)

  • multidimensional scaling (MDS)

  • self-organizing maps (SOM)

  • locally linear embedding (LLE)

  • ISOMAP

  • principal curves

• Nonlinear PCA: autoencoding

  • network model of PCA

[Figure: autoencoder network. Labels: input x1, x2, ..., xd; linear hidden layer with units 1, ..., k; output x1, x2, ..., xd; mappings F1 and F2; Γ(F2).]
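The network model of PCA can be illustrated with a minimal NumPy sketch. It assumes that F1 and F2 in the figure are the encoding and decoding maps (an interpretation of the figure labels, not stated on the slide) and that the optimal linear encoder and decoder are given by the top k principal directions. The function name `pca_autoencode` and the test data are hypothetical.

```python
import numpy as np

def pca_autoencode(X, k):
    """Encode X (n x d) into k principal components and decode it back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                  # d x k
    codes = Xc @ V_k                # assumed role of F1: linear encoder, n x k
    recon = codes @ V_k.T + mean    # assumed role of F2: linear decoder, n x d
    return codes, recon

rng = np.random.default_rng(0)
# Hypothetical data: points close to a 2-D plane embedded in 5 dimensions.
latent = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 5))
X = latent @ A + 0.01 * rng.normal(size=(200, 5))
codes, recon = pca_autoencode(X, k=2)
err = np.mean((X - recon) ** 2)     # small, because the data are nearly planar
```

With k equal to the intrinsic dimension of the data, the reconstruction error is of the order of the added noise.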

• Nonlinear PCA: autoencoding

  • nonlinear extension

[Figure: autoencoder network with layers labeled input, nonlinear, linear (units 1, ..., k), nonlinear, output. Other labels: x1, x2, ..., xd on the input and output sides; mappings F1 and F2; Γ(F2).]

• Multidimensional scaling (MDS)

  • reduced-dimension representation that preserves distances

[Figure: points xi and xj in the source space (axes x1, x2, x3) are mapped to points yi and yj in the target space (axes y1, y2); δij is the distance between xi and xj, and dij is the distance between yi and yj.]

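• Multidimensional scaling (MDS)

  • error functions

    • $J_{ee} = \frac{\sum_{i<j} (d_{ij} - \delta_{ij})^2}{\sum_{i<j} \delta_{ij}^2}$

    • $J_{ff} = \sum_{i<j} \left( \frac{d_{ij} - \delta_{ij}}{\delta_{ij}} \right)^2$

    • $J_{ef} = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(d_{ij} - \delta_{ij})^2}{\delta_{ij}}$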

• Multidimensional scaling (MDS)

  • minimization

    • standard gradient descent

  • initialization

    • the $d'$ coordinates with the largest variances

    • PCA with $d'$ components
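The MDS error functions and their minimization can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distances in both spaces, minimizes the first error function (squared differences normalized by the sum of squared source distances) with plain gradient descent, and initializes with the highest-variance coordinates. The function names and the step size `norm / (2 n)` are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stresses(Y, delta):
    """Return (J_ee, J_ff, J_ef) for target points Y and source distances delta."""
    iu = np.triu_indices(len(Y), k=1)          # all pairs with i < j
    d, dl = pairwise_distances(Y)[iu], delta[iu]
    j_ee = ((d - dl) ** 2).sum() / (dl ** 2).sum()
    j_ff = (((d - dl) / dl) ** 2).sum()
    j_ef = ((d - dl) ** 2 / dl).sum() / dl.sum()
    return j_ee, j_ff, j_ef

def mds_gradient_descent(X, d_target, n_iter=300):
    """Minimize J_ee by gradient descent, starting from high-variance coordinates."""
    n = len(X)
    delta = pairwise_distances(X)
    norm = (delta[np.triu_indices(n, k=1)] ** 2).sum()
    lr = norm / (2 * n)                        # heuristic step size (assumption)
    # Initialization: the d_target coordinates with the largest variances.
    keep = np.argsort(X.var(axis=0))[::-1][:d_target]
    Y = X[:, keep].copy()
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(d, 1.0)               # avoid 0/0 on the diagonal
        coef = (d - delta) / d
        np.fill_diagonal(coef, 0.0)
        # dJ_ee/dy_i = (2 / norm) * sum_j (d_ij - delta_ij) (y_i - y_j) / d_ij
        grad = (2.0 / norm) * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3)) * np.array([3.0, 2.0, 0.5])   # hypothetical data
delta = pairwise_distances(X)
Y0 = X[:, :2]                  # the two highest-variance coordinates of this data
Y = mds_gradient_descent(X, d_target=2)
j_before = stresses(Y0, delta)
j_after = stresses(Y, delta)
```

Comparing `j_before[0]` with `j_after[0]` shows how much the descent improves on the initial projection for the criterion that is actually minimized.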

• Multidimensional scaling (MDS)

[Figure: MDS example mapping points from a source space (axes x1, x2, x3) to a target space (axes y1, y2).]

• Self-organizing maps (SOM)

  • $x_i$ belongs to $V_\ell$ with a weight $W_{i,\ell}$

  • $W_{i,\ell}$ depends only on the distance between $v_\ell$ and $v(x_i)$

  • typical window function

[Figure: window function Λ over the map coordinates (y1, y2), centered at y*.]

• Self-organizing maps (SOM)

SOM($X_n$)
    $C^{(0)} \leftarrow \{ v_1^{(0)}, \ldots, v_k^{(0)} \}$
    $j \leftarrow 0$
    do
        recompute $W^{(j)}$
        for $\ell \leftarrow 1$ to $k$ do
            $v_\ell^{(j+1)} \leftarrow \frac{1}{n} \sum_{i=1}^{n} W_{i,\ell}^{(j)} x_i$
        $j \leftarrow j + 1$
    until change < threshold
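The SOM iteration can be sketched in NumPy as a batch update. Several choices here are assumptions made for the example and are not taken from the slides: the map is a one-dimensional chain of units, the window is Gaussian with width `sigma`, the winner v(x_i) is the closest prototype, and each weighted average is normalized by the total weight of the unit so that the prototypes stay inside the data range. The function name `batch_som` is hypothetical.

```python
import numpy as np

def batch_som(X, k, sigma=1.0, threshold=1e-4, max_iter=200, seed=0):
    """Batch SOM-style update on a 1-D chain of k units with a Gaussian window."""
    rng = np.random.default_rng(seed)
    # Initial codebook C^(0): k distinct data points.
    V = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    grid = np.arange(k)                      # map coordinates of the units
    # Window: depends only on the distance between two units on the map.
    window = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * sigma ** 2))
    for _ in range(max_iter):
        # Winner v(x_i): index of the closest prototype for each point.
        dist = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
        winner = dist.argmin(axis=1)
        W = window[winner]                   # W[i, l], shape n x k
        # Weighted average of the data for each unit (normalized by total weight).
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        change = np.abs(V_new - V).max()
        V = V_new
        if change < threshold:               # stop once the prototypes settle
            break
    return V

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 2))               # hypothetical data in the unit square
V = batch_som(X, k=10)
```

Because every prototype is a convex combination of data points, the returned prototypes always lie inside the convex hull of the data.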

• Self-organizing maps (SOM)

  • 2 dimensions → 1 dimension

[Figure: snapshots of the map; panels labeled 0, 20, 100, 1000, 10,000, 25,000, 50,000, 75,000, 100,000, 150,000.]

  • 2 dimensions → 2 dimensions

[Figure: snapshots of the map; panels labeled 100, 1000, 10,000, 25,000, 50,000, 75,000, 100,000, 150,000, 200,000, 300,000.]

  • problem: local minimum

[Figure: snapshots of the map; panels labeled 0, 1000, 25,000, 400,000.]

  • density estimation

[Figure: snapshots of the map; panels labeled 0, 1000, 400,000, 800,000.]

• Self-organizing maps (SOM): communication theory

• Source coding: vector quantization

  • error function: $J_s = \sum_{i=1}^{n} \| x_i - v(x_i) \|^2$

• Channel coding: error correction

  • bit error probability: $p$

  • Hamming distance between codewords: $d_{i,j} = d_H\big( c(v_i), c(v_j) \big)$

  • code error probability: $p_{i,j} = p^{d_{i,j}} (1 - p)^{d - d_{i,j}}$

  • error function: $J_c = \sum_{i=1}^{n} \sum_{j=1}^{c} \| v(x_i) - v_j \|^2 \, p_{x_i,j}$

• Joint source-channel coding

  • error function: $J_{s+c} = \sum_{i=1}^{n} \sum_{j=1}^{c} \| x_i - v_j \|^2 \, p_{x_i,j}$
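The joint source-channel error function can be evaluated with a short NumPy sketch. It rests on assumptions that are not on the slides: the codeword of prototype j is the binary representation of its index, the code length is the smallest number of bits that can index all prototypes, and bit errors are independent. The function names are hypothetical.

```python
import numpy as np

def hamming_matrix(k, n_bits):
    """Hamming distances between the n_bits-bit binary codes of 0..k-1."""
    codes = (np.arange(k)[:, None] >> np.arange(n_bits)) & 1
    return (codes[:, None, :] != codes[None, :, :]).sum(axis=-1)

def joint_error(X, V, p):
    """Joint error: expected squared error when codeword indices cross a noisy channel."""
    k = len(V)
    n_bits = max(1, int(np.ceil(np.log2(k))))
    dh = hamming_matrix(k, n_bits)
    # P[i, j]: probability that codeword i is received as codeword j.
    P = p ** dh * (1 - p) ** (n_bits - dh)
    sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)   # n x k
    sent = sq.argmin(axis=1)                 # index of v(x_i), the closest prototype
    return (sq * P[sent]).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # hypothetical data
V = rng.normal(size=(4, 2))                  # hypothetical codebook with 4 prototypes
j_source = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1).min(axis=1).sum()
j_clean = joint_error(X, V, p=0.0)           # noiseless channel
j_noisy = joint_error(X, V, p=0.1)
```

With a noiseless channel the joint error reduces to the source-coding error, and channel noise can only increase it when the number of prototypes is a power of two, since each row of `P` then sums to one.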

• General problem: complicated surfaces → local minima

• Solution 1: ISOMAP

  • geodesic distance: shortest paths in the similarity graph

  • standard MDS on the geodesic distances
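The two ISOMAP steps can be sketched in NumPy as follows. The sketch makes choices that are not on the slides: the similarity graph is a symmetrized k-nearest-neighbor graph, shortest paths are computed with the Floyd-Warshall recursion, and the final step uses classical (eigendecomposition) MDS rather than a gradient-based MDS. It also assumes the graph is connected. The function name `isomap` is hypothetical.

```python
import numpy as np

def isomap(X, n_neighbors, d_target):
    """Geodesic distances on a k-NN graph followed by classical MDS."""
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Similarity graph: keep edges to the n_neighbors nearest points.
    G = np.full((n, n), np.inf)
    nn = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nn.ravel()] = dist[rows, nn.ravel()]
    G = np.minimum(G, G.T)                   # make the graph symmetric
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: all-pairs shortest paths, O(n^3) time.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Classical MDS on the geodesic distances (double centering).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d_target]
    Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    return Y, G

# Hypothetical data: points along three quarters of a circle.
t = np.linspace(0.0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, G = isomap(X, n_neighbors=2, d_target=1)
corr = abs(np.corrcoef(Y[:, 0], t)[0, 1])    # the 1-D embedding follows the arc
```

On this arc the one-dimensional embedding is almost perfectly correlated with the arc-length parameter, which is what preserving geodesic distances is meant to achieve.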


• Solution 2: Locally linear embedding (LLE)

  • Step 1: find the set of neighbors $V_{x_i}$

  • Step 2: approximate the points by a linear combination of their nearest neighbors:

    $\min_W \sum_{i=1}^{n} \Big\| x_i - \sum_{x_j \in V_{x_i}} w_{i,j} x_j \Big\|^2$

  • Step 3: reconstruct the points in the projection space using the same weights:

    $\min_Y \sum_{i=1}^{n} \Big\| y_i - \sum_{x_j \in V_{x_i}} w_{i,j} y_j \Big\|^2$
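The three LLE steps can be sketched in NumPy as follows. Some details are assumptions that go beyond the slide text: the weights of each point are constrained to sum to one (the usual LLE constraint), a small regularization term `reg` keeps the local systems solvable, and step 3 is solved through the bottom eigenvectors of (I - W)^T (I - W), which implicitly fixes the scale and centering of the embedding. The function name `lle` is hypothetical.

```python
import numpy as np

def lle(X, n_neighbors, d_target, reg=1e-3):
    """Locally linear embedding: neighbors, reconstruction weights, embedding."""
    n = len(X)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Step 1: nearest neighbors of each point (column 0 is the point itself).
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    # Step 2: weights that best reconstruct x_i from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]           # neighbors centered on x_i
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularization (assumption)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()     # enforce sum-to-one weights
    # Step 3: minimize sum_i ||y_i - sum_j w_ij y_j||^2 = trace(Y^T M Y).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    # Skip the first eigenvector, which is constant (eigenvalue close to 0).
    return vecs[:, 1:d_target + 1], W

rng = np.random.default_rng(0)
# Hypothetical data: a spiral sheet in three dimensions.
t = np.linspace(1.0, 3.0 * np.pi, 80)
X = np.column_stack([t * np.cos(t), t * np.sin(t), rng.uniform(0, 2, size=80)])
Y, W = lle(X, n_neighbors=6, d_target=2)
```

The same weight matrix `W` is used in both optimization problems, which is the point of step 3: only the coordinates change, not the local reconstruction rule.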


• disadvantages of ISOMAP:

  • running time: $O(n^3)$

  • projecting new points

    • construct the projection function explicitly

    • interpolation problem

    • supervised learning problem (multidimensional regression)

• Problem: noise

[Figure legend: data points, generating curve, polygonal principal curve, HS principal curve.]

• Model bias

[Figure: illustration of the model bias.]

• Estimation bias

[Figure: illustration of the estimation bias.]

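• Solution: polygonal principal curves

  • Measure the distance to the curve rather than to the vertices

[Figure: polygonal curve with vertices $v_{i-1}$, $v_i$, $v_{i+1}$ and segments $s_{i-2}$, $s_{i-1}$, $s_i$, $s_{i+1}$, together with the sets labeled $V_{i-1}$, $V_i$, $V_{i+1}$ and $S_{i-2}$, $S_{i-1}$, $S_i$, $S_{i+1}$.]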
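The difference between measuring the distance to the curve and measuring it to the vertices can be made concrete with a small NumPy sketch. The function names and the example polyline are hypothetical; the sketch only assumes Euclidean distance and a curve given by its ordered vertices.

```python
import numpy as np

def distance_to_segment(x, a, b):
    """Euclidean distance from point x to the segment [a, b]."""
    ab = b - a
    denom = ab @ ab
    # Position of the projection along the segment, clipped to its endpoints.
    t = 0.0 if denom == 0.0 else np.clip((x - a) @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(x - (a + t * ab))

def distance_to_polyline(x, vertices):
    """Distance from x to a polygonal curve given by its ordered vertices."""
    return min(distance_to_segment(x, vertices[i], vertices[i + 1])
               for i in range(len(vertices) - 1))

vertices = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]])
x = np.array([1.0, 1.0])
d_curve = distance_to_polyline(x, vertices)                # 1.0
d_vertex = min(np.linalg.norm(x - v) for v in vertices)    # sqrt(2)
```

In this example the point is at distance 1 from the curve but at distance sqrt(2) from the nearest vertex, so a vertex-based criterion would overestimate how far the point is from the curve.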

• Polygonal principal curves

[Flowchart boxes: START, Initialization, Projection, Vertex optimization, Convergence? (Y/N), k > c(n, Δ)? (Y/N), Add new vertex, END.]

[Figure: six panels, (a) to (f).]

  • reduced noise

[Figure legend: data points, generating curve, polygonal principal curve, BR principal curve, HS principal curve.]

  • many points

• disadvantages of principal curves:

  • local minima

  • the extension to surfaces is not obvious

  → most applications are in image processing

• Skeletonization of characters

[Figure: panels (a) to (d), each showing a character template and its polygonal principal curve.]

[Figure: panels (a) to (d), each showing a character template and its skeleton graph.]

• Unsupervised learning for classification: discriminant analysis

  • goal: find the best projection that preserves the discriminant information

  • Fisher discriminant

  • $y = w^t x$

• Discriminant analysis

[Figure: data in the (x1, x2) plane shown with two different projection directions w.]

• Idea 1: separate the projected means

  • $\mu_i = \frac{1}{n_i} \sum_{x \in D_i} x$

  • $\tilde{m}_i = \frac{1}{n_i} \sum_{y \in Y_i} y = \frac{1}{n_i} \sum_{x \in D_i} w^t x = w^t \mu_i$

  • find $w$ that maximizes $|\tilde{m}_1 - \tilde{m}_2| = |w^t (\mu_1 - \mu_2)|$

• Idea 2: separate the projected means, normalized by the per-class variances

  • $\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$

  • $J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

• Maximizing $J(w)$:

  • $S_i = \sum_{x \in D_i} (x - \mu_i)(x - \mu_i)^t$

  • $S_W = S_1 + S_2$

  • $\tilde{s}_i^2 = \sum_{x \in D_i} (w^t x - w^t \mu_i)^2 = \sum_{x \in D_i} w^t (x - \mu_i)(x - \mu_i)^t w = w^t S_i w$

  • $\tilde{s}_1^2 + \tilde{s}_2^2 = w^t S_W w$

  • $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t$

  • $(\tilde{m}_1 - \tilde{m}_2)^2 = (w^t \mu_1 - w^t \mu_2)^2 = w^t (\mu_1 - \mu_2)(\mu_1 - \mu_2)^t w = w^t S_B w$

  • $J(w) = \frac{w^t S_B w}{w^t S_W w}$

  • $w_{\max} = S_W^{-1} (\mu_1 - \mu_2)$
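The closed-form Fisher direction can be checked numerically with a short NumPy sketch. The function names and the two Gaussian classes are hypothetical; the sketch assumes two classes and an invertible within-class scatter matrix.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the Fisher direction and the scatter matrices S_W and S_B."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)      # class means
    S1 = (X1 - m1).T @ (X1 - m1)                   # scatter of class 1
    S2 = (X2 - m2).T @ (X2 - m2)                   # scatter of class 2
    S_W = S1 + S2                                  # within-class scatter
    S_B = np.outer(m1 - m2, m1 - m2)               # between-class scatter
    w = np.linalg.solve(S_W, m1 - m2)              # S_W^{-1} (m1 - m2)
    return w, S_W, S_B

def criterion(w, S_B, S_W):
    """Fisher criterion J(w) = (w^t S_B w) / (w^t S_W w)."""
    return (w @ S_B @ w) / (w @ S_W @ w)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 2)) + np.array([2.0, 0.0])   # hypothetical class 1
X2 = rng.normal(size=(60, 2))                          # hypothetical class 2
w, S_W, S_B = fisher_direction(X1, X2)
j_best = criterion(w, S_B, S_W)
```

Evaluating `criterion` on any other direction gives a value no larger than `j_best`, and rescaling `w` leaves the criterion unchanged, so only the direction matters.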
