TUTORIAL
Unsupervised Collaborative Learning
Nistor Grozavu
Computer Science Laboratory of Paris 13 University
Paris 13 University - Institut Galilée - LIPN, UMR 7030 du CNRS 99 Avenue J-B. Clément - 93430 Villetaneuse - France
Plan
n Introduction
n The problem
¨ Consenus Clustering & Collaborative Clustering
n Collaborative clustering
¨ Horizontal collaboration
¨ Vertical collaboration
n Topological Collaborative Clustering
n Diversity Analysis
¨ The problem
¨ Proposed solutions
n Real applications
n Conclusions
Introduction
Classification Clustering
The labels and the number of classes are known The labels and the number of classes are unknown
Difficulties
-
The objects are not labeled, …
- We need to use a similarity measure (for which variables?) - Do we need to know a priori the number of classes?
- How to characterize clusters?
1 2 3 … n
…
? ? ? … ?
…
Introduction : Clustering
n Grouping together of “similar” objects
n
Hard Clustering -- Each object belongs to a single cluster
n
Soft Clustering -- Each object is probabilistically assigned to clusters
In general, the formalization of the Clustering problem is determined by the following components:
Data representation (categorical, binary, graph…) The affinity measures (similarity, distance,..)
The objective function
The optimization procedure
Data distribution …
Introduction - Fusion vs Collaboration
1 1 1 1
i i i i
c c c c
!
2 2 2 2
i i i i
c c c c
!
Algo fusion
f i
f i
f i
f i
c c c c
!
DB DB
1 1 1 1
i i i i
c c c c
!
c i
c i
c i
c i
c c c c
!
DB1 DB2
C
1C
2C
fusC
1C
colThe principle of the Fusion The principle of the Collaboration
• Collaborate the datasets of different size;
• Use the same clustering method + a collaboration step;
• Use this schema for different datasets or for the
multi-views datasets;
Collaboration : principle
Collaboration
BD Infos
Modèles BD
The problem
n
The collaborative clustering is an emerging problem
n
Some works (fusion & collaboration) :
¨ Pedrycz & Rai 2008 (Collaboration);
¨ Costa da Silva & Klusch, 2006 (Collaboration);
¨ Wemmert & al., 2007 (Collaborative and Fusion);
¨ Cleuziou et al., 2009 (Horizontal Collaboration);
¨ Forestier et al., 2009 (Fusion/Collaboration);
¨ Grozavu et al., 2009 (Fusion, Collaboration);
¨ Strehl & Ghosh, 2002 (Fusion).
n
Collaborative Topological Learning uses the principle of the
Collaborative Fuzzy c-means (Pedrycz & Rai, 2002)
Strehl & Ghosh, 2002 (Fusion)
n
Compute the normalized mutual information (NMI) for each dataset ;
n
Compute the mean of the NMI for a set of r classes (labels) - ANMI;
Costa da Silva & Klusch, 2006 (Collaboration)
n
Distributed Data Clustering (DDC) :
¨ KDEC – Density estimation based Distributed Clustering;
¨ Compute the densities for each local DB:
¨ Send these densities to a « helper site » which will build the global clustering and send these information to other local sites.
Bennani et al., 2009 (Fusion)
n Fusion of several classifications using Relational Analysis
approach (AR)
Costa da Silva & Klusch, 2006 (Collaboration)
n
Distributed Data Clustering (DDC) :
¨ KDEC – Density estimation based Distributed Clustering;
¨ Compute the densities for each local DB:
¨ Send these densities to a « helper site » which will build the global clustering and send these information to other local sites.
Pedrycz & Rai 2008 (Collaboration)
n Fuzzy C-Means Clustering (FCM) :
n
For each dataset, build granular prototypes using the partitions matrix;
Collaborative Clustering
Three main types of collaboration : 1. Horizontal
All datasets are described by the same observations but in different spaces Of description (different variables).
2. Vertical
All the datasets have the same variables (same description space), but have different observations.
3. Hybrid
Combination between 1 & 2.
Nxd3 Nxd2
Nxd1
N
2xd N
4xd
N
1xd N
3xd
14
ID Att1 Att2 Att3 id1
id2 id3 id4 id5 id6 id7 id8 id9
ID Att4 Att5 id1
id2 id3 id4 id5 id6 id7 id8 id9
ID Att6 Att7 id1
id2 id3 id4 id5 id6 id7 id8 id9 …
Dataset [1] Dataset [2] Dataset [P]
Same samples
Horizontal collaboration
ID Att 1 Att 2 Att 3 Att 4 Id1
Id2 Id3 Id4
15
ID Att 1 Att 2 Att 3 Att 4
Id5 Id6 Id7
ID Att 1 Att 2 Att 3 Att 4
Id8 Id9
…
Dataset [1]
Dataset [2]
Dataset [P]
Same variables
Vertical Collaboration
The problem
Horizontal collaboration vs Vertical collaboration
Nxd
3Nxd
1Nxd
2The problem
n
How to improve the local clustering derived out of a set of distant clustering results without sharing the initial data ?
Nxd
3N
2xd
N
4xd Nxd
2Nxd
1N
1xd N
3xd
Horizontal Collaboration
Hybrid Collaboration
Vertical Collaboration
Collaborative FCM (Pedrycz, 2002)
Collaborative FCM (Pedrycz, 2002)
The distance function between the ith prototype and the kth pattern in the same
subset is denoted by d
ik2[ii], i = 1,2,...,c, k = 1,2,...,N and ii = 1, 2, . . . , P
Collaborative FCM (Pedrycz, 2002)
Each entry of the collaborative matrix describes the intensity of the interaction. In general, α[ii,kk] assumes nonnegative values.
Collaboration in the clustering scheme represented by the matrix of collaboration levels between the
subsets; the partition matrices generated for each data set are shown.
General collaborative clustering scheme (Pedrycz, 2002)
Quantification of the Collaborative Phenomenon of Clustering
(Pedrycz, 2002)The intensity of collaboration :
Uref[ii] to denote the partition matrix produced independently of other sources Consistency measure :
indicates the structural differences between the partition matrices defined over two data sets (ii and jj, respectively)
Topological Collaborative Clustering
Prototype based Clustering (SOM)
€
w
*= argmin
w
K
δ j,χ(xi )
( )
x
i− w
j 2j=1 w
∑
i=1 N
& ∑
' ( ) (
* + ( , (
€
χ ( ) x
i= argmin
j
x
i− w
j 2( )
€
K
δ( )i,j= exp − δ
2(i, j) T
2$
% & ' ( )
N : number of observations x, |W| : number of prototypes w Neighborhood function
Assignment function
Horizontal Collaboration
€
w
*= argmin
w
R
SOM[ii]( χ , w) + R
Col[ii]_ H( χ , w)
{ }
€
R
Col_H[ii]
( χ , w) = α
[[ii]jj]jj=1, jj≠ii P
∑ ( K
δ[ii]( j,χ(xi ))− K
δ[(jjj,]χ(xi )))
2x
[ii]i− w
[iij ] 2j=1 w
∑
i=1 N
∑
€
RSOM[ii] (
χ
,w)= Kδ j,χ(xi)
( )
[ii] x
i
[ii]−w
j
[ii] 2 j=1
w
∑
i=1 N
∑
N[ii] : the number of observations on the datasets [ii], P : the number of datasets
Collaboration term
Collaboration coefficient
Learning Algorithm
Vertical Collaboration
€
RSOM[ii] (
χ
,w)= Kδ j,χ(xi)
( )
[ii] x
i
[ii]−w
j
[ii] 2 j=1
w
∑
i=1 N
∑
N[ii] : the number of observations on dataset [ii], P : the number of datasets sites
Collaboration term
€
w
*= argmin
w
R
SOM[ii]( χ , w) + R
Col[ii]_V( χ , w)
{ }
€
R
Col_V[ii]
( χ ,w ) = α
[ii[jj]]jj=1, jj≠ii P
∑ K
δ (j,χ (xi))[ii]
− K
δ j,χ (xi)
( )
[jj]
( )
2w
[iij ]− w
[iij ] 2j=1 w
∑
i=1 N[ii]
∑
Collaboration coefficient
Learning algorithm
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -1.5
-1 -0.5 0 0.5 1
Ilustrative example
n Waveform dataset
n
5000 samples
n
40 variables where 19 variables are Gaussian noisy
n
3 classes
Horizontal Collaboration (waveform)
I II III IV I
II III
IV
The prototypes of the 1st dataset before the collaboration : SOM1
75.71%
The prototypes of the 2nd dataset before the collaboration : SOM2
79.61%
The prototypes of the 1st dataset after the collaboration with the SOM2 map : SOM12
76.21%
Horizontal Collaboration
(waveform)Horizontal Collaboration
(waveform)The prototypes of the 1st map obtained from the 1st dataset before the collaboration : SOM1
75.71%
The prototypes of the map from the 3rd dataset before the collaboration : SOM3
47.19%
The prototypes of the map obtained from the 1st dataset after the collaboration with SOM3 : SOM13
62.47%
The prototypes of the map obtained from the 3rd dataset after the collaboration with SOM1 : SOM31
54.63%
Validation de l ’ approche collaboratif sur différents bases de données
Collaboration horizontale Collaboration verticale
Probabilistic Collaborative Clustering
Probabilistic Clustering
Generative Topographic Mapping [Bishop 95]
E & M steps
Topological Collaborative Clustering
n Prototype based Clustering
n Probabilistic Clustering
Collaborative Clustering : local step + collaboration step
Collaborative Generative Topographic Mapping
Some experimental results
Collaborative Clustering
Diversity analysis
Diversity : why?
Correct Wrong
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity : why?
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
11/15 = 0.773
Fusion Majority vote rule
Correct Wrong
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity : why?
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
11/15 = 0.773
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667 Algo1
Algo2 Algo3
Fusion Majority vote rule
Correct Wrong
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity : why?
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
11/15 = 0.773
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
10/15 = 0.667
Algo1 Algo2 Algo3
Fusion
Fusion
Majority vote rule
Majority vote rule
Correct Wrong
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity : why?
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
11/15 = 0.773
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
10/15 = 0.667
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667 Algo1
Algo2 Algo3
Algo1 Algo2 Algo3
Fusion
Fusion
Majority vote rule
Majority vote rule
Correct Wrong
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity : why?
Studied in Consensus clustering
Dataset X containing 15 samples
Algo1 Algo2 Algo3
accuracy
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
11/15 = 0.773
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
10/15 = 0.667
10/15 = 0.667 10/15 = 0.667 10/15 = 0.667
8/15 = 0.533
Algo1 Algo2 Algo3
Algo1 Algo2 Algo3
Fusion
Fusion
Fusion
Majority vote rule
Majority vote rule
Majority vote rule
Correct Wrong
Algo1 Algo3
Nxd1
Γ
Algo2
Diversity (2)
Correct Wrong
Collaborative clustering
Dataset X1 containing 15 samples Dataset X2 containing 15 samples Dataset X3 containing 15 samples
X1 X2
accuracy
8/15 = 0.533 12/15 = 0.8
10/15 = 0.6 11/15 = 0.733 GTM1<-2
GTM2<-1 X3
GTM3<-2
X1 X2 X3
11/15 = 0.733
12/15 = 0.8
Diversity measures
index formula
Rand index
Adjusted Rand index Jaccard index
Wallace’s coefficient Adjusted Wallace index
Normalized Mutual Information
Variation of Information
Diversity measures on waveform datasets
Diversity (2)
Correct Wrong
Collaborative clustering
Dataset X1 containing 15 samples Dataset X2 containing 15 samples Dataset X3 containing 15 samples
X1 X2
accuracy
8/15 = 0.533 12/15 = 0.8
10/15 = 0.6 11/15 = 0.733 GTM1<-2
GTM2<-1 X3
GTM3<-2
X1 X2 X3
11/15 = 0.733
12/15 = 0.8
diversity
X1-X2 =0.956 X2-X3 = 0.678
Need to study the local quality.
Results : 10 waveform sub-sets
The plot of diversity and the accuracy difference after collaboration
Results : 1-1.000 waveform sub-sets
Waveform datasets: Collaboration results between a fixed subset and 1000 randomly subsets (axe X represents the Diversity and axe Y - the Accuracy gain)
Collaboration results (1)
Collaboration results between a fixed subset and 1000 randomly subsets
axe X represents the Diversity and axe Y - the Accuracy gain
Collaboration results (2)
axe X represents the Diversity and axe Y - the Accuracy gain
Collaboration results between a fixed subset and 1000 randomly subsets
Collaborative Clustering and Consensus Clustering
real applications
Images : Strasbourg satellite image (1) Projet COCLICO
The authors would like to thank CESBIO (Danielle Ducrot, ClaireMarais-Sicre, Olivier Hagolle, Mireille Huc and Jordi Inglada) for providing the land-cover maps and the geometrically and radiometrically corrected Formosat-2 images.
Images : Strasbourg satellite image (2) Projet COCLICO
Before collaboration After collaboration
System for searching visual
information
Multimodal information : prediction
Application context:
• A Wikipedia dataset containing a set of 20.000 images from wikipedia and ttransformed into numerical values by Xerox research center.
Resulats:
n Reduce the dimensionality of this dataset of size 20.000 x 12.800 into 20.000 x 10
n Patent (THALES, Paris 13 University) N°: 08 06947
Inventors: BENHADDA H., BENNANI Y., LEBBAH M., GROZAVU N.
Recall : topological learning
( ) ∑∑
( )= =
− Κ
= ℜ
N i
W j
j i x j
SOM
W
ix w
1 1
2
,
,χχ
j
nj j j
w w w w
!!
!!
!
"
#
$$
$$
$
%
&
!
2 1
Conventional search engine
Research time: 0,11s ;
Browsing the collection of images by user
: 15 days( 13.700.000 images / 21 images/pages = 652.380 pages * 2 s = 1.304.760s (362h or 15 days) )
Example of image search using a traditional search engine
(France flag)
Map of images
Wikipedia (19.000 x 6400) x 2
Hierarchical SOM (3D view)
( M
d) = ( x
i− w
j 2)
χ
Intelligent system based on the Topological Learning
colours
texture
words Feature extraction from images
6.400
6.400
2.485
Principle
Relational Analysis
(Marcotorchino al. 1980)0 2
3
3 2 2 0 1 0
2 3 0 0 1
2 1 3 0 0
0 0 0 3 2 1
1 0 0 2 0
0 1 0 0 0
A B C D E F
3 A
B C D E F
R
A
01
1
1 1 1 0 0 0
1 1 0 0 0
1 1 1 0 0
0 0 0 1 1 0
0 0 0 1 0
0 0 0 0 0
A B C D E F
1 A
B C D E F 1
1 1 2 2 3
A B
C
D E F
0
1 0 0
1 0
1 0 0
0 1 0
0 1 0
0 0 1
A B
C
D E F
0 1
1
1 1 1 0 0 0
1 1 0 0 0
1 1 1 0 0
0 0 0 1 1 0
0 0 0 1 0
0 0 0 0 0
A B C D E F
1 A
B
C
D E F
v1 v2 v3
Linear coding Complete Disjonctive Coding Relational Coding v
We denote F(R, X) - the linear criterion measuring the adequacy between the data R and the solution X, the mathematical formulation of the problem is:
max F(R, X)
X
Under the linear constraints generated by the properties of X.
1- Pairwise comparison principle
2- (0,1) Linear programming modelling
Coding et Fusion
Dimensionality reduction
New image assignment
( ) x
i= arg min
j( xi − w
j 2 )
χ
Demonstration video
BREVET (THALES, Université Paris 13 ) N°: 08 06947
BENHADDA H., BENNANI Y., LEBBAH M., GROZAVU N. «SYSTEME DE RECHERCHE D'INFORMATION VISUELLE», BREVET 08 06947.
LEBBAH M., BENNANI Y., BENHADDA H., GROZAVU N., (2009), «Relational Analysis for Clustering Consensus», Invited Book Chapter, Machine Learning, ISBN 978-953-7619-X-X, IN-TECH Publisher.
Conclusions
n
The collaborative clustering allows:
¨ An interaction between different datasets
¨ Reveal underlying structures and patterns within data sets.
n
During the collaboration step, where is no need of data, the algorithm requires only the clustering results of other datasets.
¨ obtain a new classification that is as close as possible to that which would have obtained if we had centralized datasets and then make a partition.
n
The quality of the local clustering algorithm is very important for the collaboration’s quality improvement regarding the diversity index
¨ Overall, the variability of the collaboration’s quality increase with the diversity
n
Create a «helper site» which will build the global clustering and send these information to other local sites
n