TUTORIAL

(1)

TUTORIAL

Unsupervised Collaborative Learning

Nistor Grozavu

Computer Science Laboratory of Paris 13 University

Paris 13 University - Institut Galilée - LIPN, UMR 7030 du CNRS 99 Avenue J-B. Clément - 93430 Villetaneuse - France

(2)

Plan

n  Introduction

n  The problem

¨  Consenus Clustering & Collaborative Clustering

n  Collaborative clustering

¨  Horizontal collaboration

¨  Vertical collaboration

n  Topological Collaborative Clustering

n  Diversity Analysis

¨  The problem

¨  Proposed solutions

n  Real applications

n  Conclusions

(3)

Introduction

Classification Clustering

The labels and the number of classes are known The labels and the number of classes are unknown

Difficulties

- 

The objects are not labeled, …

-  We need to use a similarity measure (for which variables?) -  Do we need to know a priori the number of classes?

-  How to characterize clusters?

1 2 3 … n

…

? ? ? … ?

…

(4)

Introduction : Clustering

n   Grouping together of “similar” objects

n 

Hard Clustering -- Each object belongs to a single cluster

n 

Soft Clustering -- Each object is probabilistically assigned to clusters

In general, the formalization of the Clustering problem is determined by the following components:

Data representation (categorical, binary, graph…) The affinity measures (similarity, distance,..)

The objective function

The optimization procedure

Data distribution …

(5)

Introduction - Fusion vs Collaboration

1 1 1 1

i i i i

c c c c

!

2 2 2 2

i i i i

c c c c

!

Algo fusion

f i

c c c c

!

DB DB

1 1 1 1

i i i i

c c c c

!

c i

c c c c

!

DB1 DB2

C

₁

C

₂

C

_fus

C

₁

C

_col

The principle of the Fusion The principle of the Collaboration

•  Collaborate the datasets of different size;

•  Use the same clustering method + a collaboration step;

•  Use this schema for different datasets or for the

multi-views datasets;

(6)

Collaboration : principle

Collaboration

BD Infos

Modèles BD

(7)

The problem

n 

The collaborative clustering is an emerging problem

n 

Some works (fusion & collaboration) :

¨  Pedrycz & Rai 2008 (Collaboration);

¨  Costa da Silva & Klusch, 2006 (Collaboration);

¨  Wemmert & al., 2007 (Collaborative and Fusion);

¨  Cleuziou et al., 2009 (Horizontal Collaboration);

¨  Forestier et al., 2009 (Fusion/Collaboration);

¨  Grozavu et al., 2009 (Fusion, Collaboration);

¨  Strehl & Ghosh, 2002 (Fusion).

n 

Collaborative Topological Learning uses the principle of the

Collaborative Fuzzy c-means (Pedrycz & Rai, 2002)

(8)

Strehl & Ghosh, 2002 (Fusion)

n 

Compute the normalized mutual information (NMI) for each dataset ;

n 

Compute the mean of the NMI for a set of r classes (labels) - ANMI;

(9)

Costa da Silva & Klusch, 2006 (Collaboration)

n 

Distributed Data Clustering (DDC) :

¨  KDEC – Density estimation based Distributed Clustering;

¨  Compute the densities for each local DB:

¨  Send these densities to a « helper site » which will build the global clustering and send these information to other local sites.

(10)

Bennani et al., 2009 (Fusion)

n  Fusion of several classifications using Relational Analysis

approach (AR)

(11)

Costa da Silva & Klusch, 2006 (Collaboration)

n 

Distributed Data Clustering (DDC) :

¨  KDEC – Density estimation based Distributed Clustering;

¨  Compute the densities for each local DB:

¨  Send these densities to a « helper site » which will build the global clustering and send these information to other local sites.

(12)

Pedrycz & Rai 2008 (Collaboration)

n  Fuzzy C-Means Clustering (FCM) :

n 

For each dataset, build granular prototypes using the partitions matrix;

(13)

Collaborative Clustering

Three main types of collaboration : 1. Horizontal

All datasets are described by the same observations but in different spaces Of description (different variables).

2. Vertical

All the datasets have the same variables (same description space), but have different observations.

3. Hybrid

Combination between 1 & 2.

Nxd₃ Nxd₂

Nxd₁

N

₂

xd N

₄

xd

N

₁

xd N

₃

xd

(14)

14

ID Att1 Att2 Att3 id1

id2 id3 id4 id5 id6 id7 id8 id9

ID Att4 Att5 id1

id2 id3 id4 id5 id6 id7 id8 id9

ID Att6 Att7 id1

id2 id3 id4 id5 id6 id7 id8 id9 …

Dataset [1] Dataset [2] Dataset [P]

Same samples

Horizontal collaboration

(15)

ID Att 1 Att 2 Att 3 Att 4 Id1

Id2 Id3 Id4

15

ID Att 1 Att 2 Att 3 Att 4

Id5 Id6 Id7

ID Att 1 Att 2 Att 3 Att 4

Id8 Id9

…

Dataset [1]

Dataset [2]

Dataset [P]

Same variables

Vertical Collaboration

(16)

The problem

Horizontal collaboration vs Vertical collaboration

Nxd

₃

Nxd

₁

Nxd

₂

(17)

The problem

n 

How to improve the local clustering derived out of a set of distant clustering results without sharing the initial data ?

Nxd

₃

N

₂

xd

N

₄

xd Nxd

₂

Nxd

₁

N

₁

xd N

₃

xd

Horizontal Collaboration

Hybrid Collaboration

Vertical Collaboration

(18)

Collaborative FCM (Pedrycz, 2002)

(19)

Collaborative FCM (Pedrycz, 2002)

The distance function between the ith prototype and the kth pattern in the same

subset is denoted by d

_ik²

[ii], i = 1,2,...,c, k = 1,2,...,N and ii = 1, 2, . . . , P

(20)

Collaborative FCM (Pedrycz, 2002)

Each entry of the collaborative matrix describes the intensity of the interaction. In general, α[ii,kk] assumes nonnegative values.

Collaboration in the clustering scheme represented by the matrix of collaboration levels between the

subsets; the partition matrices generated for each data set are shown.

(21)

General collaborative clustering scheme (Pedrycz, 2002)

(22)

Quantification of the Collaborative Phenomenon of Clustering

(Pedrycz, 2002)

The intensity of collaboration :

Uref[ii] to denote the partition matrix produced independently of other sources Consistency measure :

indicates the structural differences between the partition matrices defined over two data sets (ii and jj, respectively)

(23)

Topological Collaborative Clustering

(24)

Prototype based Clustering (SOM)

€

w

^*

= argmin

w

K

_δ _j_,_χ_(x

i )

( )

x

_i

− w

_j ²

j=1 w

∑

i=1 N

& ∑

' ( ) (

* + ( , (

€

χ ( ) x

_i

⁼ ^argmin

j

x

_i

− w

_j ²

( )

€

K

_δ_{( )}_i,_j

= exp − δ

²

^(i, ^j) T

²

$

% & ' ( )

N : number of observations x, |W| : number of prototypes w Neighborhood function

Assignment function

(25)

Horizontal Collaboration

€

w

^*

= argmin

w

R

_SOM^[^ii]

( χ , w) + R

_Col^[ii^]_{_} _H

( χ , w)

{ }

€

R

Col_H

[ii]

( χ , w) = α

_[^[_ii]^jj^]

jj=1, jj≠ii P

∑ ( ^K

^δ^[^ii]⁽ ^j,^χ^(xⁱ ⁾⁾

⁻ ^K

^δ^[⁽^jj^j,^]^χ^(xⁱ ⁾⁾

)

²

^x

^[ii]ⁱ

⁻ ^w

^[ii^j ^] ²

j=1 w

∑

i=1 N

∑

€

R_SOM^[^ii] (

χ

^,^w)= K_δ _j,_χ₍_x

i)

( )

[ii] x

i

[ii]−w

j

[ii] 2 j=1

w

∑

i=1 N

∑

N^[ii] : the number of observations on the datasets [ii], P : the number of datasets

Collaboration term

Collaboration coefficient

(26)

Learning Algorithm

(27)

Vertical Collaboration

€

R_SOM^[^ii] (

χ

^,^w)= K_δ _j,_χ₍_x

i)

( )

[ii] x

i

[ii]−w

j

[ii] 2 j=1

w

∑

i=1 N

∑

N^[ii] : the number of observations on dataset [ii], P : the number of datasets sites

Collaboration term

€

w

^*

= argmin

w

R

_SOM^[ii^]

( χ , w) + R

_Col^[^ii]_{_}_V

( χ , w)

{ }

€

R

Col_V

[ii]

( χ ,w ) = α

_[ii^[^jj_]^]

jj=1, jj≠ii P

∑ ^K

^δ (^j,^χ ^(xi))

[ii]

− K

_δ _j,_χ _(x

i)

( )

[jj]

( )

²

^w

^[ii^j ^]

⁻ ^w

^[ii^j ^] ²

j=1 w

∑

i=1 N^[ii^]

∑

Collaboration coefficient

(28)

Learning algorithm

(29)

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -1.5

-1 -0.5 0 0.5 1

Ilustrative example

n  Waveform dataset

n 

5000 samples

n 

40 variables where 19 variables are Gaussian noisy

n 

3 classes

(30)

Horizontal Collaboration (waveform)

I II III IV I

II III

IV

(31)

The prototypes of the 1st dataset before the collaboration : SOM1

75.71%

The prototypes of the 2^nd dataset before the collaboration : SOM2

79.61%

The prototypes of the 1st dataset after the collaboration with the SOM2 map : SOM12

76.21%

Horizontal Collaboration

^(waveform)

(32)

Horizontal Collaboration

(waveform)

The prototypes of the 1st map obtained from the 1st dataset before the collaboration : SOM1

75.71%

The prototypes of the map from the 3rd dataset before the collaboration : SOM3

47.19%

The prototypes of the map obtained from the 1st dataset after the collaboration with SOM3 : SOM13

62.47%

The prototypes of the map obtained from the 3rd dataset after the collaboration with SOM1 : SOM31

54.63%

(33)

Validation de l ’ approche collaboratif sur différents bases de données

Collaboration horizontale Collaboration verticale

(34)

Probabilistic Collaborative Clustering

(35)

Probabilistic Clustering

Generative Topographic Mapping [Bishop 95]

(36)

E & M steps

(37)

Topological Collaborative Clustering

n  Prototype based Clustering

n  Probabilistic Clustering

Collaborative Clustering : local step + collaboration step

(38)

Collaborative Generative Topographic Mapping

(39)

Some experimental results

(40)

Collaborative Clustering

Diversity analysis

(41)

Diversity : why?

Correct Wrong

Studied in Consensus clustering

Dataset X containing 15 samples

Algo1 Algo2 Algo3

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

Algo1 Algo3

Nxd₁

Γ

Algo2

(42)

Diversity : why?

Studied in Consensus clustering

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

11/15 = 0.773

Fusion Majority vote rule

Correct Wrong

Algo1 Algo3

Nxd₁

Γ

Algo2

(43)

Diversity : why?

Studied in Consensus clustering

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

11/15 = 0.773

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667 Algo1

Algo2 Algo3

Fusion Majority vote rule

Correct Wrong

Algo1 Algo3

Nxd₁

Γ

Algo2

(44)

Diversity : why?

Studied in Consensus clustering

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

11/15 = 0.773

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

10/15 = 0.667

Fusion

Majority vote rule

Correct Wrong

Algo1 Algo3

Nxd₁

Γ

Algo2

(45)

Diversity : why?

Studied in Consensus clustering

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

11/15 = 0.773

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

10/15 = 0.667

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667 Algo1

Algo2 Algo3

Fusion

Majority vote rule

Correct Wrong

Algo1 Algo3

Nxd₁

Γ

Algo2

(46)

Diversity : why?

Studied in Consensus clustering

accuracy

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

11/15 = 0.773

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

10/15 = 0.667

10/15 = 0.667 10/15 = 0.667 10/15 = 0.667

8/15 = 0.533

Fusion

Majority vote rule

Correct Wrong

Algo1 Algo3

Nxd₁

Γ

Algo2

(47)

Diversity (2)

Correct Wrong

Collaborative clustering

Dataset X1 containing 15 samples Dataset X2 containing 15 samples Dataset X3 containing 15 samples

X1 X2

accuracy

8/15 = 0.533 12/15 = 0.8

10/15 = 0.6 11/15 = 0.733 GTM1<-2

GTM2<-1 X3

GTM3<-2

X1 X2 X3

11/15 = 0.733

12/15 = 0.8

(48)

Diversity measures

index formula

Rand index

Adjusted Rand index Jaccard index

Wallace’s coefficient Adjusted Wallace index

Normalized Mutual Information

Variation of Information

(49)

Diversity measures on waveform datasets

(50)

Diversity (2)

Correct Wrong

Collaborative clustering

Dataset X1 containing 15 samples Dataset X2 containing 15 samples Dataset X3 containing 15 samples

X1 X2

accuracy

8/15 = 0.533 12/15 = 0.8

10/15 = 0.6 11/15 = 0.733 GTM1<-2

GTM2<-1 X3

GTM3<-2

X1 X2 X3

11/15 = 0.733

12/15 = 0.8

diversity

X1-X2 =0.956 X2-X3 = 0.678

Need to study the local quality.

(51)

Results : 10 waveform sub-sets

The plot of diversity and the accuracy difference after collaboration

(52)

Results : 1-1.000 waveform sub-sets

Waveform datasets: Collaboration results between a fixed subset and 1000 randomly subsets (axe X represents the Diversity and axe Y - the Accuracy gain)

(53)

Collaboration results (1)

Collaboration results between a fixed subset and 1000 randomly subsets

axe X represents the Diversity and axe Y - the Accuracy gain

(54)

Collaboration results (2)

axe X represents the Diversity and axe Y - the Accuracy gain

Collaboration results between a fixed subset and 1000 randomly subsets

(55)

Collaborative Clustering and Consensus Clustering

real applications

(56)

Images : Strasbourg satellite image (1) Projet COCLICO

The authors would like to thank CESBIO (Danielle Ducrot, ClaireMarais-Sicre, Olivier Hagolle, Mireille Huc and Jordi Inglada) for providing the land-cover maps and the geometrically and radiometrically corrected Formosat-2 images.

(57)

Images : Strasbourg satellite image (2) Projet COCLICO

Before collaboration After collaboration

(58)

System for searching visual

information

(59)

Multimodal information : prediction

Application context:

•  A Wikipedia dataset containing a set of 20.000 images from wikipedia and ttransformed into numerical values by Xerox research center.

Resulats:

n  Reduce the dimensionality of this dataset of size 20.000 x 12.800 into 20.000 x 10

n  Patent (THALES, Paris 13 University) N°: 08 06947

Inventors: BENHADDA H., BENNANI Y., LEBBAH M., GROZAVU N.

(60)

Recall : topological learning

( ) ∑∑

_{( )}

= =

− Κ

= ℜ

N i

W j

j i x j

SOM

W

_i

x w

1 1

2

,

,_χ

χ

j

nj j j

w w w w

!!

!

"

#

$$

$

%

&

!

2 1

(61)

Conventional search engine

Research time: 0,11s ;

Browsing the collection of images by user

: 15 days

( 13.700.000 images / 21 images/pages = 652.380 pages * 2 s = 1.304.760s (362h or 15 days) )

Example of image search using a traditional search engine

(France flag)

(62)

Map of images

Wikipedia (19.000 x 6400) x 2

(63)

Hierarchical SOM (3D view)

( ^M

^d

) ⁼ ( ^x

ⁱ

⁻ ^w

^j ²

)

χ

(64)

Intelligent system based on the Topological Learning

colours

texture

words Feature extraction from images

6.400

2.485

(65)

Principle

(66)

Relational Analysis

(Marcotorchino al. 1980)

0 2

3

3 2 2 0 1 0

2 3 0 0 1

2 1 3 0 0

0 0 0 3 2 1

1 0 0 2 0

0 1 0 0 0

A B C D E F

3 A

B C D E F

R

A

⁰

1

1 1 1 0 0 0

1 1 0 0 0

1 1 1 0 0

0 0 0 1 1 0

0 0 0 1 0

0 0 0 0 0

A B C D E F

1 A

B C D E F 1

1 1 2 2 3

A B

C

D E F

0

1 0 0

1 0

1 0 0

0 1 0

0 0 1

A B

C

D E F

0 1

1

1 1 1 0 0 0

1 1 0 0 0

1 1 1 0 0

0 0 0 1 1 0

0 0 0 1 0

0 0 0 0 0

A B C D E F

1 A

B

C

D E F

v1 v2 v3

Linear coding Complete Disjonctive Coding Relational Coding v

We denote F(R, X) - the linear criterion measuring the adequacy between the data R and the solution X, the mathematical formulation of the problem is:

max F(R, X)

X

Under the linear constraints generated by the properties of X.

1- Pairwise comparison principle

2- (0,1) Linear programming modelling

(67)

Coding et Fusion

(68)

Dimensionality reduction

(69)

New image assignment

( ) ^x

ⁱ

⁼ ^arg ^min

_j

( ^x

ⁱ

⁻ ^w

^j ²

)

χ

(70)

Demonstration video

BREVET (THALES, Université Paris 13 ) N°: 08 06947

BENHADDA H., BENNANI Y., LEBBAH M., GROZAVU N. «SYSTEME DE RECHERCHE D'INFORMATION VISUELLE», BREVET 08 06947.

LEBBAH M., BENNANI Y., BENHADDA H., GROZAVU N., (2009), «Relational Analysis for Clustering Consensus», Invited Book Chapter, Machine Learning, ISBN 978-953-7619-X-X, IN-TECH Publisher.

(71)

Conclusions

n 

The collaborative clustering allows:

¨  An interaction between different datasets

¨  Reveal underlying structures and patterns within data sets.

n 

During the collaboration step, where is no need of data, the algorithm requires only the clustering results of other datasets.

¨  obtain a new classification that is as close as possible to that which would have obtained if we had centralized datasets and then make a partition.

n 

The quality of the local clustering algorithm is very important for the collaboration’s quality improvement regarding the diversity index

¨  Overall, the variability of the collaboration’s quality increase with the diversity

n 

Create a «helper site» which will build the global clustering and send these information to other local sites

n 

TUTORIAL

TUTORIAL

Unsupervised Collaborative Learning

Nistor Grozavu

Plan

n Introduction

n The problem

n Collaborative clustering

n Topological Collaborative Clustering

n Diversity Analysis

n Real applications

n Conclusions

Introduction

Classification Clustering

Difficulties

The objects are not labeled, …

- We need to use a similarity measure (for which variables?) - Do we need to know a priori the number of classes?

- How to characterize clusters?

1 2 3 … n

…

? ? ? … ?

…

Introduction : Clustering

n Grouping together of “similar” objects

Hard Clustering -- Each object belongs to a single cluster

Soft Clustering -- Each object is probabilistically assigned to clusters

In general, the formalization of the Clustering problem is determined by the following components:

Data representation (categorical, binary, graph…) The affinity measures (similarity, distance,..)

The objective function

The optimization procedure

Data distribution …

Introduction - Fusion vs Collaboration

Algo fusion

C

C

C

C

C

The principle of the Fusion The principle of the Collaboration

• Collaborate the datasets of different size;

• Use the same clustering method + a collaboration step;

• Use this schema for different datasets or for the

multi-views datasets;

Collaboration : principle

Collaboration

BD Infos

Modèles BD

The problem

The collaborative clustering is an emerging problem

Some works (fusion & collaboration) :

Collaborative Topological Learning uses the principle of the

Collaborative Fuzzy c-means (Pedrycz & Rai, 2002)

Strehl & Ghosh, 2002 (Fusion)

Compute the normalized mutual information (NMI) for each dataset ;

Compute the mean of the NMI for a set of r classes (labels) - ANMI;

Costa da Silva & Klusch, 2006 (Collaboration)

Distributed Data Clustering (DDC) :

Bennani et al., 2009 (Fusion)

n Fusion of several classifications using Relational Analysis

approach (AR)

Costa da Silva & Klusch, 2006 (Collaboration)

Distributed Data Clustering (DDC) :

Pedrycz & Rai 2008 (Collaboration)

n Fuzzy C-Means Clustering (FCM) :

For each dataset, build granular prototypes using the partitions matrix;

Collaborative Clustering

Three main types of collaboration : 1. Horizontal

2. Vertical

3. Hybrid

N

xd N

xd

N

xd N

xd

ID Att1 Att2 Att3 id1

id2 id3 id4 id5 id6 id7 id8 id9

ID Att4 Att5 id1

id2 id3 id4 id5 id6 id7 id8 id9

ID Att6 Att7 id1

n  Introduction

n  The problem

n  Collaborative clustering

n  Topological Collaborative Clustering

n  Diversity Analysis

n  Real applications

n  Conclusions

-  We need to use a similarity measure (for which variables?) -  Do we need to know a priori the number of classes?

-  How to characterize clusters?

n   Grouping together of “similar” objects

•  Collaborate the datasets of different size;

•  Use the same clustering method + a collaboration step;

•  Use this schema for different datasets or for the

n  Fusion of several classifications using Relational Analysis

n  Fuzzy C-Means Clustering (FCM) :

⁼ ^argmin

^(i, ^j) T