Multiple Kernel Self-Organizing Maps

Madalina Olteanu (2), Nathalie Villa-Vialaneix (1,2), Christine Cierco-Ayrolles (1)

http://www.nathalievilla.org
{madalina.olteanu,nathalie.villa}@univ-paris1.fr, christine.cierco@toulouse.inra.fr

April 24th, 2013

MK-SOM (ESANN 2013), Nathalie Villa-Vialaneix, Bruges, April 24th, 2013


Outline

1 Introduction

2 MK-SOM

3 Applications


Introduction

Data, notations and objectives

Data: D datasets (x_i^d), i = 1, ..., n, d = 1, ..., D, all measured on the same individuals or on the same objects {1, ..., n}, each taking values in an arbitrary space G_d.

Examples:

• (x_i^d)_i are n observations of p numeric variables;

• (x_i^d)_i are the n nodes of a graph;

• (x_i^d)_i are n observations of p factors;

• (x_i^d)_i are n texts/labels...

Purpose: combine all datasets to obtain a map of the individuals/objects: self-organizing maps for clustering {1, ..., n} using all datasets.


Example [Villa-Vialaneix et al., 2013]

Data: a network with labeled nodes.

Examples: gender in a social network, functional group of a gene in a gene interaction network...

Purpose: find communities, i.e., groups of "close" nodes in the graph, "close" meaning:

• densely connected and sharing (comparatively) few links with the other groups ("communities");

• but also having similar labels.


MK-SOM

Outline

1 Introduction

2 MK-SOM

3 Applications


Kernels/Multiple kernel

What is a kernel?

For (x_i)_i ⊂ G, a kernel is a matrix (K(x_i, x_j))_ij such that K(x_i, x_j) = K(x_j, x_i) and, ∀ (α_i)_i, Σ_ij α_i α_j K(x_i, x_j) ≥ 0. In this case [Aronszajn, 1950], there exist a Hilbert space (H, ⟨·, ·⟩) and a map Φ : G → H such that K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩.

Examples:

• nodes of a graph: heat kernel K = e^{−βL} or K = L⁺, where L is the graph Laplacian [Kondor and Lafferty, 2002, Smola and Kondor, 2003, Fouss et al., 2007];

• numerical variables: Gaussian kernel K(x_i, x_j) = e^{−β‖x_i − x_j‖²};

• texts: [Watkins, 2000]...
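For instance, the Gaussian kernel on numerical data can be built and checked as follows (a minimal numpy sketch; the function name and data are ours, not from the slides):

```python
import numpy as np

def gaussian_kernel(X, beta=1.0):
    """Gaussian kernel K(x_i, x_j) = exp(-beta * ||x_i - x_j||^2) on the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-beta * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, beta=0.5)

# A valid kernel matrix is symmetric and positive semi-definite.
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)
```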


MK-SOM

Kernel SOM
[Mac Donald and Fyfe, 2000, Andras, 2002, Villa and Rossi, 2007]

(x_i)_i ⊂ G is described by a kernel (K(x_i, x_i'))_ii'. Prototypes are defined in H: p_j = Σ_i γ_ji Φ(x_i); the energy is computed in H:

1: Prototype initialization: randomly set p_j^0 = Σ_{i=1}^n γ_ji^0 Φ(x_i) s.t. γ_ji^0 ≥ 0 and Σ_i γ_ji^0 = 1
2: for t = 1 → T do
3:   Randomly choose i ∈ {1, ..., n}
4:   Assignment: f^t(x_i) ← arg min_{j=1,...,M} ‖Φ(x_i) − p_j^{t−1}‖_H, i.e.,
     f^t(x_i) ← arg min_{j=1,...,M} Σ_{l,l'} γ_jl^{t−1} γ_jl'^{t−1} K(x_l, x_l') − 2 Σ_l γ_jl^{t−1} K(x_l, x_i)
5:   for all j = 1 → M do (Representation)
6:     γ_j^t ← γ_j^{t−1} + μ^t h^t(δ(f^t(x_i), j)) (1_i − γ_j^{t−1})
7:   end for
8: end for

δ: distance between neurons; h: decreasing function ((h^t)_t decreases with t); μ^t ∼ 1/t; 1_i: indicator vector of observation i.
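With the kernel trick, the assignment and representation steps above only need the Gram matrix. A minimal numpy sketch (the names `assign`/`represent` and the dropping of the constant K(x_i, x_i) term are ours):

```python
import numpy as np

def assign(i, K, gamma):
    """Assignment step (kernel trick): for prototype p_j = sum_l gamma[j, l] Phi(x_l),
    minimize ||Phi(x_i) - p_j||^2 over j. Dropping the constant K(x_i, x_i), the
    criterion is sum_{l,l'} g_jl g_jl' K(x_l, x_l') - 2 sum_l g_jl K(x_l, x_i)."""
    crit = np.einsum('jl,jm,lm->j', gamma, gamma, K) - 2.0 * gamma @ K[:, i]
    return int(np.argmin(crit))

def represent(i, winner, gamma, mu, h, delta):
    """Representation step: gamma_j <- gamma_j + mu * h(delta(winner, j)) * (1_i - gamma_j),
    where 1_i is the indicator vector of observation i."""
    one_i = np.zeros(gamma.shape[1])
    one_i[i] = 1.0
    new = gamma.copy()
    for j in range(gamma.shape[0]):
        new[j] = gamma[j] + mu * h(delta(winner, j)) * (one_i - gamma[j])
    return new

# Tiny check: with K = I (a valid kernel) and prototypes on the data points
# (gamma = I), each observation is assigned to "its" prototype.
K = np.eye(3)
gamma = np.eye(3)
```

Note that the representation step is a convex update, so each row of gamma stays in the simplex (non-negative, summing to 1).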


MK-SOM

Combining kernels

Suppose each (x_i^d)_i is described by a kernel K^d. Kernels can be combined (e.g., [Rakotomamonjy et al., 2008] for SVM):

K = Σ_{d=1}^D α_d K^d,  with α_d ≥ 0 and Σ_d α_d = 1.

Remark: this is also useful to integrate different types of information coming from different kernels on the same dataset.
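A sketch of this convex combination (names are ours); since each K^d is symmetric PSD, any convex combination is again a valid kernel:

```python
import numpy as np

def combine_kernels(kernels, alpha):
    """K = sum_d alpha_d K^d with alpha_d >= 0 and sum_d alpha_d = 1.
    A convex combination of symmetric PSD matrices is symmetric PSD,
    so K is again a valid kernel."""
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha >= 0) and np.isclose(alpha.sum(), 1.0)
    return sum(a * K for a, K in zip(alpha, kernels))

K = combine_kernels([np.eye(2), np.ones((2, 2))], [0.25, 0.75])
```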


MK-SOM

Multiple kernel SOM

1: Prototype initialization: randomly set p_j^0 = Σ_{i=1}^n γ_ji^0 Φ(x_i) s.t. γ_ji^0 ≥ 0 and Σ_i γ_ji^0 = 1
2: Kernel initialization: set (α_d^0) s.t. α_d^0 ≥ 0 and Σ_d α_d^0 = 1 (e.g., α_d^0 = 1/D); K^0 ← Σ_d α_d^0 K^d
3: for t = 1 → T do
4:   Randomly choose i ∈ {1, ..., n}
5:   Assignment: f^t(x_i) ← arg min_{j=1,...,M} ‖Φ(x_i) − p_j^{t−1}‖²_{H(K^t)}, i.e.,
     f^t(x_i) ← arg min_{j=1,...,M} Σ_{l,l'} γ_jl^{t−1} γ_jl'^{t−1} K^t(x_l, x_l') − 2 Σ_l γ_jl^{t−1} K^t(x_l, x_i)
6:   for all j = 1 → M do (Representation)
7:     γ_j^t ← γ_j^{t−1} + μ^t h^t(δ(f^t(x_i), j)) (1_i − γ_j^{t−1})
8:   end for
9:   empty step ;-) (the kernel weights are kept fixed here)
10: end for


MK-SOM

Tuning (α_d)_d on-line

Purpose: minimize, over (γ_ji)_ji and (α_d)_d, the energy

E((γ_ji)_ji, (α_d)_d) = Σ_{i=1}^n Σ_{j=1}^M h(δ(f(x_i), j)) ‖Φ_α(x_i) − p_j(α, γ_j)‖²_α.

Kernel SOM picks an observation x_i whose contribution to the energy is

E|x_i = Σ_{j=1}^M h(δ(f(x_i), j)) ‖Φ_α(x_i) − p_j(α, γ_j)‖²_α.   (1)

Idea: add a gradient descent step based on the derivative of (1):

∂E|x_i / ∂α_d = Σ_{j=1}^M h(δ(f(x_i), j)) [ K^d(x_i^d, x_i^d) − 2 Σ_{l=1}^n γ_jl K^d(x_i^d, x_l^d) + Σ_{l,l'=1}^n γ_jl γ_jl' K^d(x_l^d, x_l'^d) ]
            = Σ_{j=1}^M h(δ(f(x_i), j)) ‖Φ^d(x_i^d) − p_j^d(γ_j)‖²_{H(K^d)}.
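The per-observation derivative above can be sketched numerically as follows (a minimal sketch: `kernels` is the list of the D Gram matrices K^d, `gamma` the M × n prototype coefficients, `h`/`delta` the neighborhood function and the distance between neurons; all names are ours):

```python
import numpy as np

def grad_alpha(i, winner, kernels, gamma, h, delta):
    """dE|x_i / dalpha_d = sum_j h(delta(f(x_i), j)) * ||Phi^d(x_i^d) - p_j^d(gamma_j)||^2,
    expanded with the kernel trick for each Gram matrix K^d in `kernels`."""
    g = np.zeros(len(kernels))
    for d, Kd in enumerate(kernels):
        for j in range(gamma.shape[0]):
            dist2 = Kd[i, i] - 2.0 * gamma[j] @ Kd[:, i] + gamma[j] @ Kd @ gamma[j]
            g[d] += h(delta(winner, j)) * dist2
    return g
```

Each term is a squared distance in H(K^d), so the derivative is non-negative: the gradient step reduces the weight of kernels in which x_i lies far from its prototypes.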


MK-SOM

Adaptive multiple kernel SOM

1: Prototype initialization: randomly set p_j^0 = Σ_{i=1}^n γ_ji^0 Φ(x_i) s.t. γ_ji^0 ≥ 0 and Σ_i γ_ji^0 = 1
2: Kernel initialization: set (α_d^0) s.t. α_d^0 ≥ 0 and Σ_d α_d^0 = 1 (e.g., α_d^0 = 1/D); K^0 ← Σ_d α_d^0 K^d
3: for t = 1 → T do
4:   Randomly choose i ∈ {1, ..., n}
5:   Assignment: f^t(x_i) ← arg min_{j=1,...,M} ‖Φ(x_i) − p_j^{t−1}‖²_{H(K^t)}
6:   for all j = 1 → M do (Representation)
7:     γ_j^t ← γ_j^{t−1} + μ^t h^t(δ(f^t(x_i), j)) (1_i − γ_j^{t−1})
8:   end for
9:   Kernel update: α_d^t ← α_d^{t−1} − μ^t ∂E|x_i/∂α_d and K^t ← Σ_d α_d^t K^d
10: end for
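Step 9 can be sketched as follows. Note the slides only state the constraints α_d ≥ 0 and Σ_d α_d = 1, so the clip-and-renormalize projection after the gradient step is our assumption, not the authors' stated method:

```python
import numpy as np

def kernel_update(alpha, grad, mu, kernels):
    """Step 9 of adaptive MK-SOM: gradient step on the kernel weights, then
    rebuild the combined kernel K^t = sum_d alpha_d^t K^d.
    ASSUMPTION: the clip + renormalize below is one way to keep
    alpha_d >= 0 and sum_d alpha_d = 1; the slides state only the constraints."""
    alpha = np.asarray(alpha, dtype=float) - mu * np.asarray(grad, dtype=float)
    alpha = np.clip(alpha, 0.0, None)
    alpha = alpha / alpha.sum()
    K = sum(a * Kd for a, Kd in zip(alpha, kernels))
    return alpha, K

alpha, K = kernel_update([0.5, 0.5], [1.0, -1.0], 0.1, [np.eye(2), np.ones((2, 2))])
```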


Outline

1 Introduction

2 MK-SOM

3 Applications


Applications

Example 1: simulated data

Graph with 200 nodes classified in 8 groups:

• graph: Erdős–Rényi models: groups 1 to 4 and groups 5 to 8, with intra-group edge probability 0.3 and inter-group edge probability 0.01;

• numerical data: nodes labelled with 2-dimensional Gaussian vectors: odd groups N((0, 0)ᵀ, 0.3 I₂) and even groups N((1, 1)ᵀ, 0.3 I₂);

• factor with two levels: groups 1, 2, 5 and 7 take the first level; the other groups take the second level.

Only the combined knowledge of the three datasets can discriminate all 8 groups.
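One possible way to generate such data (a sketch under our reading of the slide; the 25-nodes-per-group split and the two-block structure of the graph are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 25                          # 8 groups x 25 nodes = 200 nodes (assumed split)
groups = np.repeat(np.arange(8), n_per_group)

# Graph: two Erdos-Renyi blocks (slide groups 1-4 vs 5-8),
# intra-block edge probability 0.3, inter-block 0.01.
block = (groups >= 4).astype(int)
p = np.where(block[:, None] == block[None, :], 0.3, 0.01)
upper = np.triu(rng.random((200, 200)) < p, k=1)
adjacency = (upper | upper.T).astype(int)

# Numerical data: slide's odd groups ~ N((0,0), 0.3 I2), even groups ~ N((1,1), 0.3 I2).
# Groups are 0-based here, so 0-based 0, 2, 4, 6 correspond to the slide's odd groups.
means = np.where((groups % 2 == 0)[:, None], 0.0, 1.0)
X = means + rng.normal(scale=np.sqrt(0.3), size=(200, 2))

# Factor with two levels: slide groups 1, 2, 5 and 7 (0-based 0, 1, 4, 6) take level 0.
factor = np.isin(groups, [0, 1, 4, 6]).astype(int)
```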


Applications

Experiment

Kernels: graph: L⁺; numerical data: Gaussian kernel; factor: another Gaussian kernel on the disjunctive recoding.

Comparison: on 100 randomly generated datasets as previously described:

• multiple kernel SOM with all three datasets;

• kernel SOM with a single dataset;

• kernel SOM with two datasets or all three datasets combined in a single (Gaussian) kernel.


An example

[Figure: resulting maps for MK-SOM, all in one kernel, graph only, and numerical variables only]


Applications

Numerical comparison I

(over 100 simulations) with mutual information

Σ_ij (|C_i ∩ C̃_j| / 200) log( |C_i ∩ C̃_j| / (|C_i| × |C̃_j|) )

(adjusted version, equal to 1 if the two partitions are identical [Danon et al., 2005]).
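A direct implementation of mutual information between two partitions (our sketch; it includes the factor n inside the logarithm, as in the standard definition, and does not reproduce the adjusted, normalized variant of [Danon et al., 2005]):

```python
import numpy as np

def mutual_information(labels_a, labels_b):
    """Mutual information between two partitions of the same n objects:
    sum_ij (n_ij / n) * log(n * n_ij / (n_i * n_j))."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(labels_a)
    mi = 0.0
    for a in np.unique(labels_a):
        in_a = labels_a == a
        for b in np.unique(labels_b):
            n_ij = np.sum(in_a & (labels_b == b))
            if n_ij > 0:
                mi += (n_ij / n) * np.log(n * n_ij / (in_a.sum() * np.sum(labels_b == b)))
    return mi
```

Identical partitions of two equal-sized clusters give log 2; independent partitions give 0.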


Numerical comparison II

(over 100 simulations) with node purity (average percentage of nodes in a neuron that come from the majority (original) class of that neuron).
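Node purity as defined above can be computed as follows (our sketch, averaging the per-neuron percentages over neurons):

```python
import numpy as np

def node_purity(clusters, true_classes):
    """Average, over neurons, of the fraction of a neuron's nodes that belong
    to the neuron's majority (original) class."""
    clusters = np.asarray(clusters)
    true_classes = np.asarray(true_classes)
    purities = []
    for c in np.unique(clusters):
        members = true_classes[clusters == c]
        purities.append(np.bincount(members).max() / len(members))
    return float(np.mean(purities))
```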


Applications

Conclusion

Summary

• integrates multiple sources of information in a SOM (e.g., finding communities in a graph while taking node labels into account);

• uses a multiple kernel and automatically tunes the combination;

• the method gives communities that are relevant with respect to all sources of information, and a well-organized map;

• a similar approach has also been tested for relational SOM ([Massoni et al., 2013] for analyzing school-to-work transitions).

Possible developments

Main issue: computational time; a sparse version is currently under study.

Thank you for your attention...


References

Andras, P. (2002).

Kernel-Kohonen networks.

International Journal of Neural Systems, 12:117–135.

Aronszajn, N. (1950).

Theory of reproducing kernels.

Transactions of the American Mathematical Society, 68(3):337–404.

Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005).

Comparing community structure identification.

Journal of Statistical Mechanics, page P09008.

Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007).

Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation.

IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.

Kondor, R. and Lafferty, J. (2002).

Diffusion kernels on graphs and other discrete structures.

In Proceedings of the 19th International Conference on Machine Learning, pages 315–322.

Mac Donald, D. and Fyfe, C. (2000).

The kernel self organising map.

In Proceedings of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Applied Technologies, pages 317–320.

Massoni, S., Olteanu, M., and Villa-Vialaneix, N. (2013).

Which distance use when extracting typologies in sequence analysis? An application to school to work transitions.

In International Work-Conference on Artificial Neural Networks (IWANN 2013), Puerto de la Cruz, Tenerife.

Forthcoming.

Rakotomamonjy, A., Bach, F., Canu, S., and Grandvalet, Y. (2008).

SimpleMKL.


Journal of Machine Learning Research, 9:2491–2521.

Smola, A. and Kondor, R. (2003).

Kernels and regularization on graphs.

In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes in Computer Science, pages 144–158.

Villa, N. and Rossi, F. (2007).

A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph.

In 6th International Workshop on Self-Organizing Maps (WSOM), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.

Villa-Vialaneix, N., Olteanu, M., and Cierco-Ayrolles, C. (2013).

Carte auto-organisatrice pour graphes étiquetés.

In Actes des Ateliers FGG (Fouille de Grands Graphes), colloque EGC (Extraction et Gestion de Connaissances), Toulouse, France.

Watkins, C. (2000).

Dynamic alignment kernels.

In Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., editors, Advances in Large Margin Classifiers, pages 39–50, Cambridge, MA, USA. MIT Press.

