Multiple Kernel Self-Organizing Maps
Madalina Olteanu(2), Nathalie Villa-Vialaneix(1,2), Christine Cierco-Ayrolles(1)
http://www.nathalievilla.org
{madalina.olteanu,nathalie.villa}@univ-paris1.fr, christine.cierco@toulouse.inra.fr
April 24th, 2013
MK-SOM (ESANN 2013) Nathalie Villa-Vialaneix Bruges, April 24th, 2013 1 / 18
Outline
1 Introduction
2 MK-SOM
3 Applications
Introduction
Data, notations and objectives
Data: $D$ datasets $(x_i^d)_{i=1,\dots,n,\ d=1,\dots,D}$, all measured on the same individuals or objects $\{1, \dots, n\}$, each taking values in an arbitrary space $\mathcal{G}_d$.
Examples:
• $(x_i^d)_i$ are $n$ observations of $p$ numeric variables;
• $(x_i^d)_i$ are $n$ nodes of a graph;
• $(x_i^d)_i$ are $n$ observations of $p$ factors;
• $(x_i^d)_i$ are $n$ texts/labels...
Purpose: Combine all datasets to obtain a map of the individuals/objects: self-organizing maps for clustering $\{1, \dots, n\}$ using all datasets.
Example [Villa-Vialaneix et al., 2013]
Data: A network with labeled nodes.
Examples: gender in a social network, functional group of a gene in a gene interaction network...
Purpose: find communities, i.e., groups of “close” nodes in the graph, “close” meaning:
• densely connected and sharing (comparatively) few links with the other groups (“communities”);
• but also having similar labels.
MK-SOM
Outline
1 Introduction
2 MK-SOM
3 Applications
Kernels/Multiple kernel
What is a kernel?
For $(x_i)_i \subset \mathcal{G}$: a symmetric matrix $(K(x_i, x_j))_{ij}$ such that $K(x_i, x_j) = K(x_j, x_i)$ and $\forall (\alpha_i)_i$, $\sum_{ij} \alpha_i \alpha_j K(x_i, x_j) \geq 0$. In this case [Aronszajn, 1950], there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ and a map $\Phi: \mathcal{G} \to \mathcal{H}$ such that $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$.
Examples:
• nodes of a graph: heat kernel $K = e^{-\beta L}$ or $K = L^+$, where $L$ is the graph Laplacian [Kondor and Lafferty, 2002, Smola and Kondor, 2003, Fouss et al., 2007];
• numerical variables: Gaussian kernel $K(x_i, x_j) = e^{-\beta \|x_i - x_j\|^2}$;
• text: [Watkins, 2000]...
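These kernels are easy to build in practice. Below is a minimal Python sketch (not from the slides) of the Gaussian kernel and of the heat kernel of a small graph; the value of $\beta$ and the toy data are assumptions:

```python
import numpy as np

# Gaussian kernel on numeric data: K(x_i, x_j) = exp(-beta * ||x_i - x_j||^2)
def gaussian_kernel(X, beta=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-beta * sq)

# Heat kernel on a graph: K = exp(-beta * L), via eigendecomposition of the
# (symmetric) graph Laplacian L = D - A
def heat_kernel(A, beta=1.0):
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)
    return V @ np.diag(np.exp(-beta * w)) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K = gaussian_kernel(X, beta=0.5)
assert np.allclose(K, K.T)                     # symmetry
assert np.linalg.eigvalsh(K).min() > -1e-10    # positive semi-definiteness
```

Both matrices satisfy the symmetry and positivity conditions above, so they define valid kernels.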
MK-SOM
Kernel SOM
[Mac Donald and Fyfe, 2000, Andras, 2002, Villa and Rossi, 2007]
$(x_i)_i \subset \mathcal{G}$ is described by a kernel $(K(x_i, x_{i'}))_{ii'}$. Prototypes are defined in $\mathcal{H}$: $p_j = \sum_i \gamma_{ji} \Phi(x_i)$; the energy is computed in $\mathcal{H}$:
1: Prototype initialization: randomly set $p_j^0 = \sum_{i=1}^n \gamma_{ji}^0 \Phi(x_i)$ with $\gamma_{ji}^0 \geq 0$ and $\sum_i \gamma_{ji}^0 = 1$
2: for $t = 1 \to T$ do
3: Randomly choose $i \in \{1, \dots, n\}$
4: Assignment: $f^t(x_i) \leftarrow \arg\min_{j=1,\dots,M} \|\Phi(x_i) - p_j^{t-1}\|_{\mathcal{H}}^2 = \arg\min_{j} \sum_{l l'} \gamma_{jl}^{t-1} \gamma_{jl'}^{t-1} K(x_l, x_{l'}) - 2 \sum_l \gamma_{jl}^{t-1} K(x_l, x_i)$
5: for all $j = 1 \to M$ do (Representation)
6: $\gamma_j^t \leftarrow \gamma_j^{t-1} + \mu^t h^t(\delta(f^t(x_i), j)) \left(\mathbf{1}_i - \gamma_j^{t-1}\right)$
7: end for
8: end for
$\delta$: distance between neurons; $h$: a decreasing function ($(h^t)_t$ decreases with $t$); $\mu^t \sim 1/t$.
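As a companion to the pseudo-code, here is a runnable Python sketch of online kernel SOM on a 1-D map; the neighborhood function $h$ and the learning-rate schedules are illustrative assumptions, not the authors' exact choices:

```python
import numpy as np

def kernel_som(K, M=4, T=500, seed=0):
    """Online kernel SOM sketch on a 1-D map of M units.

    Prototypes live in feature space as convex combinations of the Phi(x_i):
    p_j = sum_i gamma[j, i] * Phi(x_i).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    gamma = rng.dirichlet(np.ones(n), size=M)   # rows sum to 1
    units = np.arange(M)
    for t in range(1, T + 1):
        i = rng.integers(n)
        # assignment: ||Phi(x_i) - p_j||^2 expanded in kernel terms
        # (the K(x_i, x_i) term is constant in j, so it is dropped)
        dist = np.einsum('jl,jm,lm->j', gamma, gamma, K) - 2 * gamma @ K[:, i]
        f = np.argmin(dist)
        # representation: move prototypes toward the indicator vector 1_i
        mu = 1.0 / t                                           # mu^t ~ 1/t
        h = np.exp(-np.abs(units - f) / max(1, T // (t + 1)))  # shrinking neighborhood
        one_i = np.zeros(n); one_i[i] = 1.0
        gamma += mu * h[:, None] * (one_i - gamma)
    return gamma

# toy usage on a Gaussian kernel of 1-D points
X = np.linspace(0, 1, 20)[:, None]
K = np.exp(-((X - X.T) ** 2))
gamma = kernel_som(K)
```

Because the representation step is a convex combination of $\gamma_j$ and the indicator vector $\mathbf{1}_i$, the prototype coefficients stay on the simplex throughout training.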
MK-SOM
Combining kernels
Suppose each $(x_i^d)_i$ is described by a kernel $K^d$. Kernels can be combined (e.g., [Rakotomamonjy et al., 2008] for SVM):
$$K = \sum_{d=1}^D \alpha_d K^d, \quad \text{with } \alpha_d \geq 0 \text{ and } \sum_d \alpha_d = 1.$$
Remark: also useful to integrate different types of information coming from different kernels on the same dataset.
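A quick numerical check (an illustration, not from the slides) that a convex combination of kernels is itself a kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))

# two kernels on the same points (both positive semi-definite)
K1 = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))   # Gaussian
K2 = X @ X.T                                                # linear

# convex combination with alpha_d >= 0 and sum_d alpha_d = 1
alpha = np.array([0.7, 0.3])
K = alpha[0] * K1 + alpha[1] * K2

# the combination is still symmetric and positive semi-definite
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-8
```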
MK-SOM
Multiple kernel SOM
1: Prototype initialization: randomly set $p_j^0 = \sum_{i=1}^n \gamma_{ji}^0 \Phi(x_i)$ with $\gamma_{ji}^0 \geq 0$ and $\sum_i \gamma_{ji}^0 = 1$
2: Kernel initialization: set $(\alpha_d^0)$ with $\alpha_d^0 \geq 0$ and $\sum_d \alpha_d^0 = 1$ (e.g., $\alpha_d^0 = 1/D$); $K^0 \leftarrow \sum_d \alpha_d^0 K^d$
3: for $t = 1 \to T$ do
4: Randomly choose $i \in \{1, \dots, n\}$
5: Assignment: $f^t(x_i) \leftarrow \arg\min_{j=1,\dots,M} \|\Phi(x_i) - p_j^{t-1}\|^2_{\mathcal{H}(K^t)} = \arg\min_j \sum_{l l'} \gamma_{jl}^{t-1} \gamma_{jl'}^{t-1} K^t(x_l, x_{l'}) - 2 \sum_l \gamma_{jl}^{t-1} K^t(x_l, x_i)$
6: for all $j = 1 \to M$ do (Representation)
7: $\gamma_j^t \leftarrow \gamma_j^{t-1} + \mu^t h^t(\delta(f^t(x_i), j)) \left(\mathbf{1}_i - \gamma_j^{t-1}\right)$
8: end for
9: end for
MK-SOM
Tuning $(\alpha_d)_d$ on-line
Purpose: minimize over $(\gamma_{ji})_{ji}$ and $(\alpha_d)_d$ the energy
$$E\left((\gamma_{ji})_{ji}, (\alpha_d)_d\right) = \sum_{i=1}^n \sum_{j=1}^M h(\delta(f(x_i), j)) \left\|\phi_\alpha(x_i) - p_j^\alpha(\gamma_j)\right\|^2_\alpha,$$
KSOM picks an observation $x_i$ whose contribution to the energy is
$$E_{|x_i} = \sum_{j=1}^M h(\delta(f(x_i), j)) \left\|\phi_\alpha(x_i) - p_j^\alpha(\gamma_j)\right\|^2_\alpha \quad (1)$$
Idea: add a stochastic gradient descent step based on the derivative of (1):
$$\frac{\partial E_{|x_i}}{\partial \alpha_d} = \sum_{j=1}^M h(\delta(f(x_i), j)) \left[ K^d(x_i^d, x_i^d) - 2 \sum_{l=1}^n \gamma_{jl} K^d(x_i^d, x_l^d) + \sum_{l,l'=1}^n \gamma_{jl} \gamma_{jl'} K^d(x_l^d, x_{l'}^d) \right] = \sum_{j=1}^M h(\delta(f(x_i), j)) \left\|\phi^d(x_i^d) - p_j^d(\gamma_j)\right\|^2_{\mathcal{H}(K^d)}$$
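The right-hand side of the derivative only involves kernel evaluations, so it is cheap to compute. A small Python sketch (the toy kernels and neighborhood weights are assumptions):

```python
import numpy as np

def energy_gradient(Klist, gamma, h_weights, i):
    """dE|x_i / d(alpha_d): for each kernel K^d, the h-weighted sum over
    units j of ||phi^d(x_i) - p_j^d||^2, expanded in kernel terms."""
    grads = []
    for Kd in Klist:
        sq = (Kd[i, i]
              - 2 * gamma @ Kd[:, i]
              + np.einsum('jl,jm,lm->j', gamma, gamma, Kd))
        grads.append(h_weights @ sq)
    return np.array(grads)

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
K1 = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # Gaussian kernel
K2 = X @ X.T                                          # linear kernel
gamma = rng.dirichlet(np.ones(8), size=3)             # 3 prototypes
h_w = np.array([1.0, 0.5, 0.25])                      # neighborhood weights
g = energy_gradient([K1, K2], gamma, h_w, i=0)
assert np.all(g >= -1e-10)   # each term is an h-weighted squared norm
```

Each component is non-negative, as expected from the squared-norm form of the derivative.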
MK-SOM
Adaptive multiple kernel SOM
1: Prototype initialization: randomly set $p_j^0 = \sum_{i=1}^n \gamma_{ji}^0 \Phi(x_i)$ with $\gamma_{ji}^0 \geq 0$ and $\sum_i \gamma_{ji}^0 = 1$
2: Kernel initialization: set $(\alpha_d^0)$ with $\alpha_d^0 \geq 0$ and $\sum_d \alpha_d^0 = 1$ (e.g., $\alpha_d^0 = 1/D$); $K^0 \leftarrow \sum_d \alpha_d^0 K^d$
3: for $t = 1 \to T$ do
4: Randomly choose $i \in \{1, \dots, n\}$
5: Assignment: $f^t(x_i) \leftarrow \arg\min_{j=1,\dots,M} \|\Phi(x_i) - p_j^{t-1}\|^2_{\mathcal{H}(K^t)}$
6: for all $j = 1 \to M$ do (Representation)
7: $\gamma_j^t \leftarrow \gamma_j^{t-1} + \mu^t h^t(\delta(f^t(x_i), j)) \left(\mathbf{1}_i - \gamma_j^{t-1}\right)$
8: end for
9: Kernel update: $\alpha_d^t \leftarrow \alpha_d^{t-1} - \nu^t \frac{\partial E_{|x_i}}{\partial \alpha_d}$ and $K^t \leftarrow \sum_d \alpha_d^t K^d$
10: end for
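The full adaptive algorithm can be sketched in Python as follows; the neighborhood shape, the step sizes, and the projection of $\alpha$ back onto the simplex after the gradient step are illustrative assumptions (the slides only state the update):

```python
import numpy as np

def mk_som(Klist, M=4, T=300, seed=0):
    """Adaptive multiple kernel SOM, a sketch of the algorithm above."""
    rng = np.random.default_rng(seed)
    D, n = len(Klist), Klist[0].shape[0]
    gamma = rng.dirichlet(np.ones(n), size=M)   # prototype coefficients
    alpha = np.full(D, 1.0 / D)                 # uniform kernel weights
    units = np.arange(M)
    for t in range(1, T + 1):
        K = sum(a * Kd for a, Kd in zip(alpha, Klist))  # combined kernel K^t
        i = rng.integers(n)
        # assignment with the current combined kernel
        dist = np.einsum('jl,jm,lm->j', gamma, gamma, K) - 2 * gamma @ K[:, i]
        f = np.argmin(dist)
        mu, nu = 1.0 / t, 0.1 / t
        h = np.exp(-np.abs(units - f))          # neighborhood (assumed shape)
        one_i = np.zeros(n); one_i[i] = 1.0
        gamma += mu * h[:, None] * (one_i - gamma)
        # kernel update: gradient of E|x_i with respect to each alpha_d
        grad = np.array([h @ (Kd[i, i] - 2 * gamma @ Kd[:, i]
                              + np.einsum('jl,jm,lm->j', gamma, gamma, Kd))
                         for Kd in Klist])
        alpha = np.clip(alpha - nu * grad, 1e-8, None)
        alpha /= alpha.sum()                    # assumed simplex projection
    return gamma, alpha

# toy usage with two kernels on the same points
rng = np.random.default_rng(3)
X = rng.normal(size=(12, 2))
K1 = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
K2 = X @ X.T
gamma, alpha = mk_som([K1, K2])
```

The renormalization keeps $\alpha$ on the simplex so that $K^t$ remains a convex combination of valid kernels.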
Outline
1 Introduction
2 MK-SOM
3 Applications
Applications
Example1: simulated data
Graph with 200 nodes classified into 8 groups:
• graph: Erdős–Rényi-type models for groups 1 to 4 and groups 5 to 8, with intra-group edge probability 0.3 and inter-group edge probability 0.01;
• numerical data: nodes labelled with 2-dimensional Gaussian vectors: odd groups $\mathcal{N}\!\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}0.3 & 0\\ 0 & 0.3\end{pmatrix}\right)$ and even groups $\mathcal{N}\!\left(\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}0.3 & 0\\ 0 & 0.3\end{pmatrix}\right)$;
• factor with two levels: groups 1, 2, 5 and 7 take the first level; the other groups take the second level.
Only the knowledge of all three datasets can discriminate all 8 groups.
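The simulation design above can be reproduced with a short Python sketch; the equal group sizes (25 nodes each) and the exact sampling details are an interpretation of the slide:

```python
import numpy as np

def simulate(seed=0):
    """Simulated data of Example 1: 200 nodes in 8 groups; edges drawn
    independently with probability 0.3 within groups and 0.01 between
    groups; 2-D Gaussian labels with mean (0,0) for odd groups and (1,1)
    for even groups, covariance 0.3*I; a two-level factor separating
    groups {1, 2, 5, 7} from the rest."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(1, 9), 25)
    same = groups[:, None] == groups[None, :]
    P = np.where(same, 0.3, 0.01)
    A = np.triu(rng.random((200, 200)) < P, k=1)
    A = (A | A.T).astype(int)                   # symmetric adjacency, no loops
    mean = np.where(groups % 2 == 1, 0.0, 1.0)
    Y = rng.normal(size=(200, 2)) * np.sqrt(0.3) + mean[:, None]
    factor = np.isin(groups, [1, 2, 5, 7]).astype(int)
    return A, Y, factor, groups

A, Y, factor, groups = simulate()
```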
Applications
Experiment
Kernels: graph: $L^+$; numerical data: Gaussian kernel; factor: another Gaussian kernel on the disjunctive coding.
Comparison: on 100 randomly generated datasets, as previously described:
• multiple kernel SOM with all three datasets;
• kernel SOM with a single dataset;
• kernel SOM with two datasets or all three datasets combined in a single (Gaussian) kernel.
An example
[Figure: maps obtained with MK-SOM, with all datasets in one kernel, with the graph only, and with the numerical variables only]
Applications
Numerical comparison I
(over 100 simulations) with the mutual information
$$\sum_{ij} \frac{|C_i \cap \tilde{C}_j|}{200} \log \frac{|C_i \cap \tilde{C}_j|}{|C_i| \times |\tilde{C}_j|}$$
(adjusted version, equal to 1 if the partitions are identical [Danon et al., 2005]).
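For reference, a compact Python implementation of the normalized mutual information in the spirit of [Danon et al., 2005]; treating the "adjusted version" as the usual entropy-normalized variant is an assumption:

```python
import numpy as np
from collections import Counter

def normalized_mutual_information(a, b):
    """Entropy-normalized mutual information between two partitions
    (label vectors a and b); equals 1 for identical partitions."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # mutual information between the two partitions
    mi = sum(c / n * np.log(c * n / (pa[i] * pb[j]))
             for (i, j), c in pab.items())
    # entropies of each partition
    ha = -sum(c / n * np.log(c / n) for c in pa.values())
    hb = -sum(c / n * np.log(c / n) for c in pb.values())
    return 2 * mi / (ha + hb)

labels = [0, 0, 1, 1, 2, 2]
assert abs(normalized_mutual_information(labels, labels) - 1.0) < 1e-12
```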
Numerical comparison II
(over 100 simulations) with node purity (the average percentage of nodes in a neuron that belong to the majority (original) class of that neuron).
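Node purity, as defined above, can be computed directly; this sketch assumes integer class labels:

```python
import numpy as np

def node_purity(clusters, classes):
    """Average, over neurons, of the share of nodes belonging to the
    neuron's majority original class (a direct reading of the definition)."""
    clusters, classes = np.asarray(clusters), np.asarray(classes)
    purities = []
    for c in np.unique(clusters):
        members = classes[clusters == c]
        purities.append(np.bincount(members).max() / len(members))
    return float(np.mean(purities))

assert node_purity([0, 0, 1, 1], [0, 0, 1, 1]) == 1.0   # perfect recovery
assert node_purity([0, 0, 0, 0], [0, 0, 1, 1]) == 0.5   # one mixed neuron
```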
Applications
Conclusion
Summary
• integrates multiple sources of information in a SOM (e.g., finding communities in a graph while taking labels into account);
• uses a multiple kernel and automatically tunes the combination;
• the method gives relevant communities according to all sources of information, and a well-organized map;
• a similar approach has also been tested for relational SOM ([Massoni et al., 2013], for analyzing school-to-work transitions).
Possible developments
• Main issue: computational time; a sparse version is currently under study.
Thank you for your attention...
References
Andras, P. (2002).
Kernel-Kohonen networks.
International Journal of Neural Systems, 12:117–135.
Aronszajn, N. (1950).
Theory of reproducing kernels.
Transactions of the American Mathematical Society, 68(3):337–404.
Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005).
Comparing community structure identification.
Journal of Statistical Mechanics, page P09008.
Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007).
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation.
IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.
Kondor, R. and Lafferty, J. (2002).
Diffusion kernels on graphs and other discrete structures.
In Proceedings of the 19th International Conference on Machine Learning, pages 315–322.
Mac Donald, D. and Fyfe, C. (2000).
The kernel self-organising map.
In Proceedings of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Applied Technologies, pages 317–320.
Massoni, S., Olteanu, M., and Villa-Vialaneix, N. (2013).
Which distance use when extracting typologies in sequence analysis? An application to school to work transitions.
In International Work-Conference on Artificial Neural Networks (IWANN 2013), Puerto de la Cruz, Tenerife.
Forthcoming.
Rakotomamonjy, A., Bach, F., Canu, S., and Grandvalet, Y. (2008).
SimpleMKL.
Journal of Machine Learning Research, 9:2491–2521.
Smola, A. and Kondor, R. (2003).
Kernels and regularization on graphs.
In Warmuth, M. and Schölkopf, B., editors, Proceedings of the Conference on Learning Theory (COLT) and Kernel Workshop, Lecture Notes in Computer Science, pages 144–158.
Villa, N. and Rossi, F. (2007).
A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph.
In 6th International Workshop on Self-Organizing Maps (WSOM), Bielefeld, Germany. Neuroinformatics Group, Bielefeld University.
Villa-Vialaneix, N., Olteanu, M., and Cierco-Ayrolles, C. (2013).
Carte auto-organisatrice pour graphes étiquetés.
In Actes des Ateliers FGG (Fouille de Grands Graphes), colloque EGC (Extraction et Gestion de Connaissances), Toulouse, France.
Watkins, C. (2000).
Dynamic alignment kernels.
In Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., editors, Advances in Large Margin Classifiers, pages 39–50, Cambridge, MA, USA. MIT Press.