
Entropy Based Probabilistic Collaborative Clustering


HAL Id: hal-02480318

https://hal.archives-ouvertes.fr/hal-02480318

Submitted on 15 Feb 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Jeremie Sublime, Basarab Matei, Guénaël Cabanes, Nistor Grozavu, Younès Bennani, Antoine Cornuéjols

To cite this version:

Jeremie Sublime, Matei Basarab, Guénaël Cabanes, Nistor Grozavu, Younès Bennani, et al. Entropy Based Probabilistic Collaborative Clustering. Pattern Recognition, Elsevier, 2017, 72, pp.144-157. ⟨10.1016/j.patcog.2017.07.014⟩. ⟨hal-02480318⟩


Article in Pattern Recognition · December 2017

DOI: 10.1016/j.patcog.2017.07.014


Contents lists available at ScienceDirect

Pattern Recognition

Journal homepage: www.elsevier.com/locate/patcog

Entropy based probabilistic collaborative clustering

Jérémie Sublime (a, b, *), Basarab Matei (b), Guénaël Cabanes (b), Nistor Grozavu (b), Younès Bennani (b), Antoine Cornuéjols (c)

(a) LISITE Laboratory, RDI Team - ISEP, 10 rue de Vanves, 92130 Issy Les Moulineaux, France

(b) Université Paris 13, Sorbonne Paris Cité, LIPN - CNRS UMR 7030, 99 av. J-B Clément, 93430 Villetaneuse, France

(c) UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005 Paris, France

Article info

Article history:
Received 17 December 2016; Revised 24 April 2017; Accepted 8 July 2017; Available online xxx

Keywords:
Collaborative clustering; EM algorithms; Entropy based methods

Abstract

Unsupervised machine learning approaches involving several clustering algorithms working together to tackle difficult data sets are a recent area of research with a large number of applications, such as clustering of distributed data, multi-expert clustering, multi-scale clustering analysis or multi-view clustering. Most of these frameworks can be regrouped under the umbrella of collaborative clustering, the aim of which is to reveal the common underlying structures found by the different algorithms while analyzing the data.

Within this context, the purpose of this article is to propose a collaborative framework that lifts the limitations of many of the previously proposed methods: our proposed collaborative learning method makes it possible for a wide range of clustering algorithms from different families to work together based solely on their clustering solutions, thus lifting the previous limitation that required identical prototypes between the different collaborators. Our proposed framework uses a variational EM as its theoretical basis for the collaboration process and can be applied to any of the previously mentioned collaborative contexts.

In this article, we give the main ideas and theoretical foundations of our method, and we demonstrate its effectiveness in a series of experiments on real data sets as well as data sets from the literature.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Data clustering is a fundamental task in the process of knowledge extraction from databases that aims to discover the intrinsic structures in a set of objects by forming clusters that share similar features. This task is more difficult than supervised classification, as the number of clusters to be found is generally unknown and consequently it is difficult to rate the quality of a clustering partition.

Over the past two decades, this task has become even more challenging as the available data sets became more complex with the introduction of multi-view data sets, distributed data, and data sets having different scales of structures of interest (e.g. hierarchical clusters). This increased complexity in an already hard problem makes it difficult for lone clustering algorithms to give competitive results with a high degree of confidence. However, very much like in the real world, such problems can be tackled more easily by having several algorithms working together in order to increase both the quality of the results and their reliability.

Approaches based on this idea of several algorithms working together have been widely studied in the case of supervised learning [1–4], where they gave birth to the field of Ensemble Learning. Ensemble methods are easy to implement in supervised learning for two reasons. First, it is straightforward to define a combination of predictive functions to get an aggregated prediction function (for instance, a linear combination is used in boosting). Second, it is simple to measure both the performance of individual prediction functions and the diversity of the set of functions that are candidates for being part of the combined global decision function. Things are not so straightforward in unsupervised learning. Here, each individual solution is a soft or hard partition of the data set, and how to combine these partitions has no obvious answer.

In cooperative clustering, each clustering algorithm produces its result independently. The final clustering is computed in a post-processing step, and the only exchange of information is about when the individual processes are completed, so that post-processing can start. In this case, a set of clustering algorithms is used in parallel on a given data set. Once all local computations are completed, a master algorithm takes control and combines the local results to get a hopefully better overall clustering. Resolving the possible conflicts between the local solutions requires an algorithm that is able to compare results that may differ in their format (e.g. different numbers of clusters, different degrees of belief associated with the results, ...) and to find a consensus solution that minimizes the overall violation of the local results. The cooperative framework is closely related to the ensemble methods developed for supervised learning. In these approaches, a set of (diverse) classifiers is learned and the classification of new data points is obtained by taking a (weighted) vote of their predictions. Bayesian averaging can be considered as a precursor method. Numerous new ones have been developed, from error-correcting output coding to Bagging and Boosting, and their application in various domains has become routine, often with good results.

* Corresponding author.

http://dx.doi.org/10.1016/j.patcog.2017.07.014

0031-3203/© 2017 Elsevier Ltd. All rights reserved.

Please cite this article as: J. Sublime et al., Entropy based probabilistic collaborative clustering, Pattern Recognition (2017),

In collaborative clustering (the subject of the rest of this paper), the group jointly solves problems defined and imposed by the central controller, assigning an individual task to each learner. Interactions between team members are recurrent, responsibility is collective, and the action of each teammate is geared to the performance of the group and vice versa. In contrast to the cooperative clustering model, the collaborative model does not seek an overall, hopefully better, clustering of a given data set through the combination of individual solutions. In the collaborative framework, the goal is that each local computation, quite possibly applied to distinct data sets, benefits from the work done by the other collaborators. This can be done through the exchange of information about the local data, the current hypothesized local clustering, or the values of one algorithm's parameters. The validity of the approach rests on the assumption that useful information can be shared among the local tasks. This scheme leads naturally to distributed implementations of the computations but, unlike in the cooperative framework, it generally entails several iterations at each local node, because convergence of the consensus solution requires several passes of the algorithm. Indeed, in addition to the problem of what information to exchange between collaborators, one question is how to measure the evolution at each node and at a global level.

There are many applications in unsupervised learning for which collaborative clustering can prove useful:

Multi-scale analysis: In this case, several algorithms analyze the same objects, all looking at the same features, but each searches for a different number of clusters. This kind of analysis can be beneficial for data sets that have intrinsic multi-scale structures, such as satellite images, for which a lower-level analysis of global landscape areas (urban areas, water bodies, forests) often helps to improve a higher-level analysis of smaller details (trees, cars, houses, gardens, streets, etc.).

Multi-expert analysis: In this case, all algorithms work on the same objects and features of a difficult data set. Given the very high number of existing clustering algorithms, all more or less specialized and which may or may not give good results depending on the problem, trying several of them on a data set and having them exchange their information could be justified: merging the information on clusters found only by some clustering algorithms, refining the results based on clusters that are more or less well identified depending on the method, etc.

Multi-view clustering [5,6]: Different algorithms process different types of attributes for the same objects, for example one algorithm for geometric attributes, one for text attributes, one for colors, one for numerical attributes, etc. The goal of the collaboration in this case would be to have each attribute type processed by a specialized algorithm while giving these algorithms a more global picture of the data set by enabling some exchanges between them.

Clustering of distributed data [7]: The same objects have their attributes split across several databases that cannot exchange their data because of privacy issues. While the name is different, this is in fact very much equivalent to multi-view clustering.

Big Data Clustering [8]: Data sets that are too large or have too many attributes to be processed efficiently by a single algorithm may be easier to tackle once their attributes are split and processed by several algorithms. This type of clustering is useful in the area of Big Data analysis and would require a high degree of cooperation between the algorithms to get the global picture.

As one can see, all these applications have a lot of similarities: we have several algorithms working on the same data or subsets of the same data, which will or could at some point try to aggregate or mutually exploit their respective results. While some of these applications could be considered fields of their own, such as multi-view clustering or distributed clustering [5], all of them can be classified as horizontal collaborative clustering frameworks [9–12]: several algorithms working on the same data, possibly looking for a different number of clusters, and not necessarily having access to the same features.

We generally distinguish between two types of collaborative methods [9,11]. Vertical collaboration encompasses all cases where several algorithms are working on different data that have similar clusters or distributions. Horizontal collaboration deals with cases where several algorithms are collaborating on the same objects, possibly described from different views. In this article, we are mostly interested in horizontal collaboration.

Collaborative methods usually follow a two-step procedure [13]:

1. Local step: each algorithm individually processes the data it has access to and produces a local clustering partition.
2. Collaborative step: the algorithms share their results and try to confirm or improve their models with the goal of achieving better clustering results.

These two steps are sometimes followed by an aggregation step, which aims at reaching a consensus from the final results after collaboration. In this work, we will not address the aggregation step, because it is a field of its own and because, depending on the application, it may not always be advisable to aggregate, for instance when the different views, sites or scales have conflicting partitions [14]. We will instead focus on the collaborative step, where the algorithms exchange bits of information with a goal of mutual improvement.

From there, the main difference between what is traditionally referred to as “clustering ensemble learning” [15] and collaborative clustering is that clustering ensemble learning methods aim at finding a single consensus partition, while collaborative clustering does not have this final goal. In short, the field of collaborative clustering is concerned with finding algorithms and functions that allow algorithms to share information and to improve their results based on each other's similarities, while the field of ensemble learning is more concerned with finding algorithms and methods to merge the solutions or find a consensus between them. Collaborative clustering can therefore be a task of its own (e.g. multi-view clustering, where consensus is not always possible nor advisable), or a preliminary step to an ensemble learning task. The methods and techniques used by both fields therefore naturally overlap, and a good collaborative algorithm must respect properties that are very similar to those of a good ensemble learning method:

Robustness: The collaborative process must lead, on average, to partitions that are better than the local clustering results.

Consistency: The updated results must be somewhat similar to the original local results.

Novelty: Collaborative clustering must make it possible to find solutions that would otherwise have been unattainable locally.

Stability: The results must have a lower sensitivity to noise.

Within this context, in this article we introduce a new and original framework for collaborative clustering that can be applied to the various types of unsupervised collaborative learning tasks that we have previously discussed. Our proposed method lifts several limitations of previous ensemble learning and collaborative frameworks: the data need not be shared between the different algorithms, the number of clusters can differ between the algorithms, and very different types of algorithms can collaborate.

The theoretical basis of our work is close to the work of Bickel and Scheffer on the estimation of Mixture Models using Co-EM [16,17]. Our proposed method differs from theirs in the following points. In our case, we are treating a broader context than multi-view clustering. Our method makes it possible for algorithms from different families to work together, and once again we do not have the limitation that all algorithms should be searching for the same number of clusters. We propose a variational version of their work for multi-view clustering based on the optimization of a different objective function. The core of our proposed approach is a different discretization process based on a particular class of a posteriori distributions called “combination functions”, presented in Section 3.4.1.

The remainder of this article is organized as follows:

In Section 2, we present a state of the art in which we introduce some of the pioneering and earlier proposed methods and frameworks for collaborative learning, with their strengths and weaknesses.

In Section 3, we introduce our proposed method for horizontal collaborative clustering. As stated previously, the method that we propose aims at being more generic than the previously proposed frameworks. We begin by explaining the principle of our method and its theoretical basis. Then we study the stopping criterion and the parameter tuning of our algorithm. Finally, we demonstrate that our proposed method has good convergence properties, similar to those of an EM algorithm.

In Section 4, we show some experimental results. We are mostly interested in showing some potential applications of our proposed method to multi-scale clustering and multi-view clustering.

Finally, this work ends with a conclusion and perspectives on future works.

2. State of the art in collaborative clustering

One of the first collaborative clustering algorithms was introduced in 2002 by Pedrycz [13,18] under the name “Collaborative Fuzzy Clustering” (CoFC). This method was designed for the specific case of distributed data where the information cannot be shared between the different sites. It was based on a modified version of the Fuzzy C-Means algorithm [19].

The main limitation of this approach is that it only enables Fuzzy C-Means algorithms to collaborate together, and furthermore some methods even require that all of them be looking for the same number of clusters.

Similar approaches were used to develop several other collaborative-like methods: CoEM [17], CoFKM [20], and another collaborative EM-like algorithm [21] based on Markov Random Fields.

All these algorithms display similar limitations: the objective functions, and sometimes the number of clusters, must be identical for all exchanged information. This is due to the fact that they all try to optimize an objective function of the following form:

$$
(S_{opt}, \Theta_{opt}) = \operatorname*{arg\,max}_{(S,\Theta)} L_g(S,\Theta) = \operatorname*{arg\,max}_{(S,\Theta)} \sum_{i=1}^{J} \left[ L(X^i \mid S^i, \Theta^i) - \sum_{j \neq i} \tau_{j,i} \cdot \Delta(\Theta^i, \Theta^j) \right] \tag{1}
$$

where $J$ is the number of collaborators, $S$ contains all the algorithms' partitions and $\Theta$ their distribution parameters, $L_g(S,\Theta)$ is the global likelihood of the system, each $L(X^i \mid S^i, \Theta^i)$ is the local log-likelihood of a collaborating algorithm, each $\Delta(\Theta^i, \Theta^j)$, the “collaborative term”, is a custom pairwise penalty that measures the difference between the parameters or prototypes of two algorithms, and the weights $\tau_{j,i}$, which do not exist in all methods, are given to the collaborative penalties. The definition of the local term $L(X^i \mid S^i, \Theta^i)$, which depends on the algorithms that collaborate, is the main difference between all these methods, while the definition of the penalty $\Delta(\Theta^i, \Theta^j)$ only slightly differs depending on the collaborative method. This latter term is the limiting one, since comparing prototypes and parameters requires that the algorithms have the same types of prototypes and some kind of mapping between the clusters of the different algorithms.

The work of Pedrycz on the CoFC algorithm was also extended to Self-Organizing Maps (SOM) [11,22,23] and to Generative Topographic Maps (GTM) [24]. In [23], the classical SOM objective function is modified by adding a specific extra term for horizontal collaboration and a different one for vertical collaboration. For the collaborative version of the GTM algorithm [24], the principle is the same, with the M-step of the EM algorithm (which maps the neurons to the final clusters) being modified.
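To make the limitation of Eq. (1) concrete, the sketch below evaluates the objective for prototype-based collaborators. The penalty used here, $\Delta(\Theta^i, \Theta^j) = \sum_k \lVert \mu^i_k - \mu^j_k \rVert^2$ between prototypes with the same index, is a hypothetical choice for illustration and is not taken from any of the cited methods. The function names are also illustrative. Note that the penalty is simply undefined when two collaborators have different numbers of prototypes.

```python
from typing import List, Sequence

Prototype = Sequence[float]


def prototype_penalty(protos_i: List[Prototype], protos_j: List[Prototype]) -> float:
    """Hypothetical Delta(Theta^i, Theta^j): squared distances between prototypes
    with the same index. Assumes the k-th prototype of A_i matches the k-th of A_j."""
    if len(protos_i) != len(protos_j):
        # No one-to-one mapping between clusters: the pairwise penalty cannot be computed.
        raise ValueError("collaborators must have the same number of prototypes")
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(protos_i, protos_j)
    )


def global_objective(
    local_loglik: List[float],
    prototypes: List[List[Prototype]],
    tau: List[List[float]],
) -> float:
    """Eq. (1): sum_i [ L_i - sum_{j != i} tau[j][i] * Delta(Theta^i, Theta^j) ]."""
    J = len(local_loglik)
    total = 0.0
    for i in range(J):
        penalty = sum(
            tau[j][i] * prototype_penalty(prototypes[i], prototypes[j])
            for j in range(J)
            if j != i
        )
        total += local_loglik[i] - penalty
    return total


if __name__ == "__main__":
    # Two collaborators, each with two 1-D prototypes.
    protos = [[[0.0], [1.0]], [[0.0], [2.0]]]
    tau = [[0.0, 0.5], [0.5, 0.0]]
    print(global_objective([-10.0, -12.0], protos, tau))  # -23.0
```

Calling `prototype_penalty` with a two-prototype and a three-prototype model raises an error. The same restriction is discussed below for collaborative SOM and GTM, where the maps must have the same number of neurons.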

One problem with these two methods is that they do not really solve the main issue of collaboration between different types of algorithms, since their model is once again analogous to the one in Eq. (1). Furthermore, while the number of clusters does not matter in the case of the collaborative SOM and collaborative GTM, in both cases the maps must have the same number of neurons and be topologically similar to each other. This is actually even more restrictive than a requirement on the number of clusters.

The SAMARAH method [25,26] is another type of collaborative framework. Its strength is that it can deal with any kind of hard clustering algorithm and is not concerned with issues such as fitness functions, number of clusters, or prototypes. Unlike the previously introduced methods, SAMARAH only handles horizontal collaboration due to the lack of prototypes, and it was designed mostly for clustering applied to image data. Its goal is very simple: given $J$ clustering results for the same data, the idea is to modify these results in an iterative and collaborative way, with the aim of reducing their diversity in order to make finding a consensus solution easier.

Once the results have been generated during the local step, the SAMARAH method maps the clusters of the different algorithms using probabilistic confusion matrices (PCM). Let $S^i$ and $S^j$ be two clustering results from two algorithms $A_i$ and $A_j$ looking for $K_i$ and $K_j$ clusters respectively.

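Then, the probabilistic confusion matrix (PCM) $\Omega^{i,j}$ that maps the clusters from $A_i$ to $A_j$ is defined as shown below:

$$
\Omega^{i,j} =
\begin{pmatrix}
\omega^{i,j}_{1,1} & \cdots & \omega^{i,j}_{1,K_j} \\
\vdots & \ddots & \vdots \\
\omega^{i,j}_{K_i,1} & \cdots & \omega^{i,j}_{K_i,K_j}
\end{pmatrix},
\qquad
\omega^{i,j}_{a,b} = \frac{|S^i_a \cap S^j_b|}{|S^i_a|}
\tag{2}
$$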

In Eq. (2), $S^i_a$ denotes the $a$-th cluster of algorithm $A_i$, i.e. $S^i_a = \{x : x \in X^i,\ x \text{ assigned to cluster } a \text{ by } A_i\}$, $|S^i_a|$ is the number of data in this cluster, and $|S^i_a \cap S^j_b|$ is the number of data linked to both the $a$-th cluster of $A_i$ and the $b$-th cluster of $A_j$. The PCM $\Omega^{i,j}$ makes it possible to know whether or not the objects of two results have been grouped in a similar way, or if the two clustering results are dissimilar. The matrix has a key role in the comparison of two clustering results (such as detecting agreements and conflicts), and has the major advantage of being independent from the clustering algorithm used to generate the results.
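Eq. (2) only needs the two label vectors, and this is what makes the PCM independent of the clustering algorithms. The minimal Python sketch below assumes hard partitions encoded as 0-based integer labels; the function name is illustrative. It counts co-occurrences in one pass over the objects and then normalizes each row by the cluster size $|S^i_a|$.

```python
from typing import List, Sequence


def probabilistic_confusion_matrix(
    labels_i: Sequence[int], labels_j: Sequence[int], k_i: int, k_j: int
) -> List[List[float]]:
    """Omega^{i,j} of Eq. (2): entry [a][b] = |S^i_a ∩ S^j_b| / |S^i_a|."""
    if len(labels_i) != len(labels_j):
        raise ValueError("both partitions must cover the same N objects")
    counts = [[0] * k_j for _ in range(k_i)]
    for a, b in zip(labels_i, labels_j):  # one pass over the N objects
        counts[a][b] += 1
    pcm = []
    for row in counts:
        size_a = sum(row)  # |S^i_a|
        # Assumption: an empty cluster yields a row of zeros instead of a division by zero.
        pcm.append([c / size_a if size_a else 0.0 for c in row])
    return pcm


if __name__ == "__main__":
    s_i = [0, 0, 0, 1, 1]  # A_i found K_i = 2 clusters
    s_j = [0, 0, 1, 2, 2]  # A_j found K_j = 3 clusters
    for row in probabilistic_confusion_matrix(s_i, s_j, 2, 3):
        print([round(v, 3) for v in row])
    # [0.667, 0.333, 0.0]
    # [0.0, 0.0, 1.0]
```

Each non-empty row sums to 1, and the cost is $O(N + K_i K_j)$. The matrix is not symmetric: $\Omega^{i,j}$ is normalized by the cluster sizes of $A_i$, whereas $\Omega^{j,i}$ is normalized by those of $A_j$.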

The SAMARAH method uses this matrix to detect pairwise conflicts between the different partitions. It reduces these conflicts in order of perceived importance, based on a conflict metric criterion [25], by splitting, merging, or removing clusters. Once the solutions have all been refined, and are consequently quite similar to each other, it proceeds with aggregating them using a process similar to a majority vote [27]. It is therefore a very complete framework that covers all three steps of local learning, collaborative learning and result aggregation, and it does not rely on user parameters.
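The final aggregation is only described above as being similar to a majority vote [27]. The following rough illustration is explicitly not SAMARAH's actual procedure. It relabels every partition onto a reference partition by taking, for each cluster, the row-wise argmax of its PCM (the reference cluster with the largest overlap). Each object then takes the most frequent relabelled cluster. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def map_to_reference(labels: Sequence[int], reference: Sequence[int]) -> Dict[int, int]:
    """For each cluster a of `labels`, pick the reference cluster b maximizing
    |S_a ∩ S_ref_b|, i.e. the argmax of row a of the PCM (ties go to the label seen first)."""
    overlaps: Dict[int, Counter] = {}
    for a, b in zip(labels, reference):
        overlaps.setdefault(a, Counter())[b] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in overlaps.items()}


def majority_vote(partitions: List[Sequence[int]], reference_index: int = 0) -> List[int]:
    """Relabel every partition onto a reference one, then vote object by object."""
    reference = partitions[reference_index]
    relabelled = []
    for labels in partitions:
        mapping = map_to_reference(labels, reference)
        relabelled.append([mapping[a] for a in labels])
    # zip(*relabelled) yields, for each object, the J relabelled votes.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*relabelled)]


if __name__ == "__main__":
    partitions = [
        [0, 0, 1, 1, 2],
        [1, 1, 0, 0, 0],  # same grouping as the third one, different label names
        [0, 0, 1, 1, 1],
    ]
    print(majority_vote(partitions))  # [0, 0, 1, 1, 1]
```

The argmax mapping is many-to-one, so two clusters of one partition can collapse onto the same reference cluster. In SAMARAH, such conflicts are meant to be reduced during the collaborative step, before aggregation takes place.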

However, its conflict resolution system is certainly a weak point: it relies on a pairwise conflict criterion and solves the conflicts one by one in order of perceived importance, which can lead to sub-optimal results. Finally, while it is also a strong point of the method, the fact that the algorithms' parameters or prototypes do not play any role once the local step is over may constitute a weakness, in the sense that the local model is never rebuilt using the new partitions and does not play any active role in either the collaborative step or the consensus step.

3. Horizontal collaborative clustering guided by diversity

3.1. Formalism

In horizontal collaborative clustering, we consider a finite group of algorithms $\mathcal{A} = \{A_1, \dots, A_J\}$ that are working on the same data elements, albeit possibly with access to different features, and also possibly looking for a different number of clusters. No assumptions are made on the algorithms themselves. Let $X = \{x_1, \dots, x_N\}$, $x_n \in \mathbb{R}^d$, be a data set containing $N$ elements, each of them with $d$ real-valued features.

Each clustering algorithm $A_i$ has its own parameters $\Theta^i$ to describe either the clusters or its model, and produces its own clustering solution $S^i$ made of $K_i$ clusters, based on the features of the data set $X^i \subseteq X$ it has access to. In the case of hard clustering, $S^i$ can be translated into a solution vector of size $N$, and for fuzzy clustering into a matrix of size $N \times K_i$. We denote this latter matrix $S^i = (s^i_{n,c})$, where $1 \le n \le N$ and $1 \le c \le K_i$. The solutions $S^i$ output by the algorithms are therefore two-dimensional matrices of size $N \times K_i$, where each element $s^i_{n,c}$ expresses the responsibility (probability) given by algorithm $A_i$ to a cluster $c$ for the data element $x_n$.

Each algorithm $A_i$ computes the solution $S^i$ as usual, by introducing a latent discrete random vector $Z^i$ defined on some latent space with the range $[1, \dots, K_i]$, and hence computing the a posteriori distribution of the variable $Z^i$ conditionally on $X^i$ and $S^i$.

Finally, in order to quantify the degree of information coming from the collaboration for a given algorithm $A_i$, we will assume the existence of some weights $\tau_{j,i} \in (0,1)$, which measure the relative external information from the algorithm $j \neq i$ accepted by $A_i$. All weights $\tau_{j,i}$ are stored in a square matrix of size $J \times J$, which therefore contains the strength of all collaboration links. Most notations used in this article are summed up in Table 1 below.
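The representation described above can be illustrated with a short sketch. Hard labels are assumed to be 0-based integers. A solution vector of size $N$ is converted into the $N \times K_i$ responsibility matrix $S^i$, with one-hot rows as the limiting case of soft assignments. A hypothetical helper then builds a $J \times J$ matrix of weights $\tau_{j,i}$ with one uniform off-diagonal value. That uniform value is only an assumption for illustration, since the text does not prescribe how the weights are set.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def hard_to_responsibilities(labels: Sequence[int], k: int) -> Matrix:
    """Solution vector of size N -> N x K matrix with s_{n,c} = 1 iff x_n is in cluster c."""
    return [[1.0 if c == label else 0.0 for c in range(k)] for label in labels]


def is_valid_solution(S: Matrix, tol: float = 1e-9) -> bool:
    """Each row of S^i must be a probability distribution over the K_i clusters."""
    return all(
        all(v >= 0.0 for v in row) and abs(sum(row) - 1.0) <= tol for row in S
    )


def uniform_collaboration_matrix(J: int, weight: float) -> Matrix:
    """Hypothetical J x J matrix of tau_{j,i} (row j = source, column i = receiver)
    with the same weight on every link j != i. The unused diagonal is set to 0."""
    if not 0.0 < weight < 1.0:
        raise ValueError("tau_{j,i} is assumed to lie in (0, 1)")
    return [[0.0 if j == i else weight for i in range(J)] for j in range(J)]


if __name__ == "__main__":
    S_hard = hard_to_responsibilities([0, 2, 1], k=3)
    S_fuzzy = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.6, 0.1]]
    print(is_valid_solution(S_hard), is_valid_solution(S_fuzzy))  # True True
    print(uniform_collaboration_matrix(3, 0.5))
```

The tolerance in `is_valid_solution` is needed because soft responsibilities such as 0.7 + 0.2 + 0.1 do not sum to exactly 1.0 in floating point.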

3.2. Problem formulation

Within the context of horizontal collaboration that we have presented before, the method that we propose takes advantage of both prototype-based collaborative methods and the SAMARAH method, without their issues.

Our goal in this section is to find a way to modify Eq. (1) so that the collaborative term does not depend on the prototypes. Therefore, we propose the likelihood function of Eq. (3), which uses a global consensus term $C(S)$ based on the partitions. Eq. (3) differs from Eq. (1) in three ways: our model is based on partitions rather than prototypes, it is consensus-based instead of divergence-based, and it uses a global term instead of a pairwise one. We chose this global model because, unlike the pairwise version, it does not require assuming that the algorithms are independent from each other (which is of course not true). In this model, $\lambda \in [0,1]$ is a weight parameter to balance between the local and collaborative terms. The left term $\sum_{i=1}^{J} L(X^i \mid S^i, \Theta^i)$ is called the local term, and the right term $\lambda \cdot C(S)$ is the collaborative term. Note that $C(\cdot)$ here stands for “consensus”: we have a collaborative term based on a consensus function.

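$$
(S_{opt}, \Theta_{opt}) = \operatorname*{arg\,max}_{(S,\Theta)} L_g(S,\Theta) = \operatorname*{arg\,max}_{(S,\Theta)} \sum_{i=1}^{J} L(X^i \mid S^i, \Theta^i) + \lambda \cdot C(S) \tag{3}
$$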

With this model, and by using a collaborative term based on different a posteriori distributions instead of one based on distribution parameters, our proposed model lifts the limitation that only identical algorithms looking for the same number of clusters can work together. Furthermore, with our model even non-parametric algorithms (for which the distribution parameters $\Theta^i$ cannot be explicitly formulated) can be used in a collaborative setting, since our model is based on the partitions (solution matrices or vectors), which are explicit for any clustering algorithm. The penalty factor $\lambda > 0$ regularizes the collaboration part. Please note that in [28], the authors have demonstrated that there is a direct relation between reducing the divergences and maximizing the consensus under mild assumptions. Therefore, both strategies are equivalent.

Analogously to Eq. (3), our idea is to optimize a modified fitness of the log-likelihood function that considers both the local partitions and the information coming from the other algorithms' solutions. By considering only the partitions $S^i$ and not the parameters, very much like in the SAMARAH method [25,26], we ensure that our model is generic.

As we will demonstrate in the next subsection, this change from $\Theta^i$ to $S^i$ is made possible by an alternate maximization procedure: the partitions are computed from the prototypes, and then the prototypes are updated based on the partitions and the data. In short, the partitions can be seen as a discretization of the distributions described by the prototypes. This change makes the paradigm more generic for horizontal collaboration. However, removing the prototypes also makes vertical collaboration (algorithms collaborating on different data sets with similar clusters) impossible, whereas some of the earlier methods covered this case of knowledge transfer between similar data sets [11,13,24], albeit only between identical algorithms.

To optimize Eq. (3), we use the Expectation Maximization (EM) strategy. The workflow in Algorithm 1 highlights how our algorithm can indeed be considered as an EM algorithm. During the E-step, the partitions $S$ are updated using fixed values for the distribution parameters $\Theta$. Then, during the M-step, these parameters are updated based on the new partitions. The exact form of the functional $L_g$ is explained in the next section, while the stopping criterion is detailed in Section 3.5.
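The alternation of Algorithm 1 can be sketched as a generic loop, under stated assumptions. `e_step`, `m_step` and `global_entropy` are hypothetical callables: the first two stand for the argmax steps on $L_g$, and the third stands for the entropy $H$. Their actual forms are only given in the following sections. The loop stops as soon as $H$ no longer decreases and keeps the last state that lowered it. The `max_iter` cap is an added safeguard that is not part of Algorithm 1.

```python
from typing import Callable, Tuple, TypeVar

Partitions = TypeVar("Partitions")  # S = (S^1, ..., S^J)
Params = TypeVar("Params")  # Theta = (Theta^1, ..., Theta^J)


def collaborative_em(
    theta0: Params,
    e_step: Callable[[Params], Partitions],
    m_step: Callable[[Partitions], Params],
    global_entropy: Callable[[Partitions], float],
    max_iter: int = 100,
) -> Tuple[Partitions, Params]:
    """Alternate E- and M-steps while the global entropy H keeps decreasing."""
    theta = theta0  # Theta(0), assumed to come from the local step
    partitions = e_step(theta)
    entropy = global_entropy(partitions)
    for _ in range(max_iter):  # safety cap (assumption, not in Algorithm 1)
        new_theta = m_step(partitions)  # M-step: Theta(t+1) from S(t)
        new_partitions = e_step(new_theta)  # E-step: S(t+1) with Theta fixed
        new_entropy = global_entropy(new_partitions)
        if new_entropy >= entropy:  # H no longer decreases: stop
            break
        theta, partitions, entropy = new_theta, new_partitions, new_entropy
    return partitions, theta


if __name__ == "__main__":
    # Toy stand-ins: "partitions" and "parameters" are plain numbers here.
    S, theta = collaborative_em(
        4.0,
        e_step=lambda th: th,
        m_step=lambda s: s - 1.0,
        global_entropy=lambda s: (s - 1.0) ** 2,
    )
    print(S, theta)  # 1.0 1.0
```

In the toy run, $H$ goes 9, 4, 1, 0 and then would rise to 1. The loop therefore returns the state reached at $H = 0$.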


Table 1
Notations.

| Notation | Development | Comment |
|---|---|---|
| $X^i$ | $X^i = \{x^i_1, \dots, x^i_N\}$, $x^i_n \in \mathbb{R}^d$ | The subset of the data observed by algorithm $A_i$ |
| $X$ | $X = \{X^1, \dots, X^J\}$ | The full data with all views |
| $\Theta^i$ | | The parameters describing the distributions observed by algorithm $A_i$ |
| $\Theta$ | $\Theta = \{\Theta^1, \dots, \Theta^J\}$ | The set of distribution parameters for all algorithms |
| $A_i$ | $A_i = \{X^i, S^i, \Theta^i, K_i\}$ | An algorithm looking for $K_i$ clusters of distribution parameters $\Theta^i$ in the subset $X^i$ and finding a partition $S^i$ |
| $\tau_{j,i}$ | $\tau_{j,i} \in [0,1]$ | The weight of the collaboration from $A_j$ to $A_i$ |
| $s^i_{n,c}$ | $s^i_{n,c} \in (0,1)$, $\sum_{c=1}^{K_i} s^i_{n,c} = 1$ | The responsibility given by algorithm $A_i$ to the cluster $c \in [1..K_i]$ for the data $x^i_n$ |
| $S^i$ | $S^i = (s^i_{n,c})_{N \times K_i}$ | The partition found by algorithm $A_i$. For fuzzy clusters, $S^i$ is a matrix |
| $Z^i$ | $Z^i : [1..K_i]$ | The latent random vector linked to the solutions of algorithm $A_i$ |
| $P(Z^i \mid X^i, \Theta^i)$ | | The a posteriori distribution of $Z^i$ conditionally on $X^i$ and $\Theta^i$ |
| $H$ | See Eq. (16) | The global entropy of the collaborative system for all algorithms |
| $\omega^{i,j}_{a,b}$ | $\omega^{i,j}_{a,b} = P(Z^j_n = b \mid Z^i_n = a, S, X, \Theta)$ | The percentage of data associated with cluster $a$ by $A_i$ that belong to cluster $b$ of $A_j$ |
| $q$ | $q = \{q_1, \dots, q_J\}$, $\forall i,\ q_i \in [1..K_i]$ | A combination of clusters (see Section 3.4) |
| $g_i(q,c)$ | $g_i(q,c) \in (0,1)$, $c \in [1..K_i]$ | A consensus function assessing the likelihood of having $q_i = c$ knowing the rest of $q$ |

Algorithm 1: Collaborative “EM”.

    Initialize t = 0 and Θ(0) with the local step
    while the global entropy H decreases do
        E-Step: S(t) = argmax_S Lg(S, Θ(t))
        M-Step: Θ(t+1) = argmax_Θ Lg(S(t), Θ)
        t = t + 1
    end
    Return S(t)

3.3. Objective function

The fundamental question in the horizontal collaborative setting is to find the right functional to optimize, so that we can properly address the problem of having several algorithms working together by exchanging their information with a goal of mutual improvement.

To do so, we have the following constraints. We want a functional similar to Eq. (3), but based on the partitions instead of the distribution prototypes, where we attempt to bias each local solution $S^i_t$ so that $S^i_{t+1}$ takes into account the information from the other partitions without using any prototypes. The problem therefore consists in finding the right local and collaborative terms.

Defining the local term is relatively easy: it can be done using any kind of likelihood function for probabilistic algorithms, and an ad-hoc normalized quality criterion for other types of algorithms.

The literature also offers many potential divergence and consensus functions for the collaborative term that measure the divergence or consensus between two partitions (NMI, entropies, Rand Index, etc.). However, if we add the extra constraint that the partitions are mostly non-binary and that Eq. (3) should be optimized in a reasonable amount of time, we face the following problem: for vector partitions of size $N$, most of these operators have a complexity in $O(N^2)$. Therefore, updating all partitions for the $J$ algorithms, each looking on average for $\bar{K}$ clusters, would amount to calling these operators $J \times N \times \bar{K}$ times, hence a final complexity of $O(N^3)$ (for fixed $J$ and $\bar{K}$) just to optimize the collaborative term.
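To make the cost argument concrete, the following sketch implements the plain Rand index, one of the operators cited above, by inspecting every pair of points, and counts the pair comparisons implied by $J \times N \times \bar{K}$ calls to such an operator. The function names and the sizes used in the example ($N = 10^4$, $J = 3$, $\bar{K} = 5$) are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain Rand index between two hard partitions of the same N points.

    Every unordered pair (m, n) is inspected once: N(N-1)/2 comparisons, O(N^2).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must have the same length")
    agree = 0
    pairs = 0
    for m, n in combinations(range(len(labels_a)), 2):
        same_a = labels_a[m] == labels_a[n]
        same_b = labels_b[m] == labels_b[n]
        agree += int(same_a == same_b)  # both partitions agree on this pair
        pairs += 1
    return agree / pairs if pairs else 1.0


def pair_comparisons(n, j, k_bar):
    """Pair comparisons if an O(N^2) operator is called J * N * K_bar times."""
    return j * n * k_bar * (n * (n - 1) // 2)


print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 of the 6 pairs agree
print(pair_comparisons(10_000, 3, 5))
```

With these illustrative sizes, `pair_comparisons(10_000, 3, 5)` gives 7,499,250,000,000 (about $7.5 \times 10^{12}$) pair comparisons, which illustrates why a pairwise operator inside the collaborative term does not scale.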

Since such complexity obviously does not scale well, in the remainder of this section we explain how we re-designed a likelihood function from scratch using a solid probabilistic model. Then, in Section 3.4, we show how to optimize this new function with a low complexity of $O(N)$. Very much like in Eq. (3), we consider that the functional in the collaborative setting is decoupled into two different terms: the local term $L(S, \Theta)$, computed from all local log-likelihoods or quality indexes, and the collaborative term $C(S)$, in the form of a global consensus function between the partitions. More precisely, the global likelihood function writes:

$$L_g(S, \Theta) = L(S, \Theta) + \lambda \cdot C(S) \tag{4}$$

where $X$ is the observed variable, $\Theta$ the set of parameters and $S = (S_1, \dots, S_J)$ the set of all partitions. In the first term $L$ of Eq. (4), just as in Eq. (3), we express the log-likelihood of $S$ based only on the local information and model of each algorithm taken individually, and on the data $x_n$. We then evaluate the log-likelihood of the completed sample against the a posteriori distribution of $(Z^i \mid X_n^i, \Theta_i)$:

$$L(S, \Theta) = \sum_{i=1}^{J} \sum_{n=1}^{N} P(Z_n^i \mid X_n^i, \Theta_i) \cdot \log P(X_n^i, Z_n^i \mid \Theta_i) \tag{5}$$
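As an illustration of Eq. (5), the following sketch evaluates the local term for $J$ views whose local models are one-dimensional Gaussian mixtures. This model choice is an assumption made only for concreteness (the text allows any probabilistic local model), as is writing out the sum over the values $c$ of $Z_n^i$, which we take to be implicit in Eq. (5). The dictionary keys `x`, `s`, `pi`, `mu` and `sigma` are hypothetical names.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate normal N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def local_term(views):
    """Sketch of L(S, Theta) from Eq. (5) for 1-D Gaussian mixture local models.

    Each view i is a dict with 'x' (N floats), 's' (N x K responsibilities
    P(Z_n^i = c | x_n^i, Theta_i)) and mixture parameters 'pi', 'mu', 'sigma'.
    """
    total = 0.0
    for v in views:                                # sum over algorithms i = 1..J
        for x, s_row in zip(v["x"], v["s"]):       # sum over data n = 1..N
            for c, s in enumerate(s_row):          # assumed sum over clusters c
                if s > 0:                          # zero-weight terms contribute nothing
                    # log P(x_n^i, Z_n^i = c | Theta_i) = log pi_c + log N(x | mu_c, sigma_c)
                    log_joint = math.log(v["pi"][c]) + gaussian_logpdf(
                        x, v["mu"][c], v["sigma"][c])
                    total += s * log_joint
    return total


view = {"x": [0.0], "s": [[1.0]], "pi": [1.0], "mu": [0.0], "sigma": [1.0]}
print(local_term([view]))  # equals -0.5 * log(2 * pi) for this single point
```

Because each view contributes its own sum, the local term is additive over the $J$ algorithms, which is what lets each algorithm keep its own model in the collaborative setting.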

The second term of Eq. (4) is detailed in Eq. (6). It is computed from the likelihood that each element $x_n$ is linked to the right cluster based on the other algorithms' partitions and on the choice of cluster for the same data in the local view. The difference between the local likelihood and the likelihood based on the other algorithms gives us the collaborative term. This term $C(S)$ is therefore the likelihood of $S$ based on all the solutions.

$$C(S) = \sum_{i=1}^{J} \sum_{n=1}^{N} \Big[ P(Z_n^i \mid X_n \setminus X_n^i, S) - P(Z_n^i \mid X_n^i, \Theta_i) \Big] \cdot \log P(X_n^i, Z_n^i \mid \Theta_i) \tag{6}$$

Then, using Eqs. (5) and (6), we obtain the following a posteriori probability for the completed sample $(X_n^i, Z_n^i)$ corresponding to algorithm $A^i$:

$$P(Z_n^i = c \mid X_n^i, \Theta_i, S) = (1 - \lambda) \cdot P(Z_n^i = c \mid X_n^i, \Theta_i) + \lambda \cdot P(Z_n^i = c \mid X_n \setminus X_n^i, S) \tag{7}$$
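Substituting Eqs. (5) and (6) into Eq. (4) makes the origin of the weights $(1-\lambda)$ and $\lambda$ in Eq. (7) explicit. This short derivation assumes that the collaborative term weights each local log-likelihood by the posterior inferred from the other algorithms minus the local posterior, the sign convention under which Eq. (7) follows; $\ell_n^i$ is a shorthand introduced here.

```latex
\begin{aligned}
L_g(S,\Theta)
 &= \sum_{i=1}^{J}\sum_{n=1}^{N} P(Z_n^i \mid X_n^i,\Theta_i)\,\ell_n^i
  + \lambda \sum_{i=1}^{J}\sum_{n=1}^{N}
    \big[P(Z_n^i \mid X_n \setminus X_n^i, S) - P(Z_n^i \mid X_n^i,\Theta_i)\big]\,\ell_n^i \\
 &= \sum_{i=1}^{J}\sum_{n=1}^{N}
    \big[(1-\lambda)\,P(Z_n^i \mid X_n^i,\Theta_i)
       + \lambda\,P(Z_n^i \mid X_n \setminus X_n^i, S)\big]\,\ell_n^i,
 \qquad \ell_n^i = \log P(X_n^i, Z_n^i \mid \Theta_i).
\end{aligned}
```

The bracket in the last line has the form of the right-hand side of Eq. (7): the local posterior is weighted by $(1-\lambda)$ and the posterior inferred from the other partitions by $\lambda$.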

Note that, due to the lack of independence, $P(Z_n^i \mid X_n \setminus X_n^i, S)$ is not tractable. Nevertheless, in the next section we show tractable update rules for the responsibilities.

3.4. Update rules

In this section, we proceed with the practical description of the update rules for the responsibilities $s_{n,c}^i$, so that we can actually compute the partitions that are solutions of the functional from Eq. (7). For fuzzy clustering, we then infer that the update rule for the responsibility of every data point $x_n$ and every cluster $c$, from iteration $t$ to iteration $t+1$ during the E-step of Algorithm 1, is the following:

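$$s_{n,c}^i(t+1) = (1 - \lambda) \cdot s_{n,c}^i(t) + \lambda \cdot \sum_{q \in Q \,\mid\, q_i = c} P\big(q \mid X_n \setminus X_n^i, \Theta^t \setminus \Theta_i(t)\big) \cdot P(Z_n^i = q_i \mid q) \tag{8}$$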
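The update of Eq. (8) can be sketched for a single data point $x_n$ and a single view $i$. Two quantities are not specified in this section, so the sketch makes explicit assumptions: the probability of a combination $q$ given the other views is taken as the product of the other algorithms' current responsibilities (an independence simplification of our own), and $P(Z_n^i = c \mid q)$ is supplied by a caller-provided function `g(q, c)` standing in for the consensus function $g_i(q, c)$. The function name `collaborative_update`, its arguments and the example table are hypothetical.

```python
from itertools import product


def collaborative_update(s_old, others, g, lam):
    """Sketch of the fuzzy E-step rule of Eq. (8) for one point x_n in view i.

    s_old  -- responsibilities s^i_{n,c}(t), one per cluster c of A^i
    others -- responsibility vectors s^j_n(t) of the other views j != i
    g      -- hypothetical callable g(q, c) returning P(Z^i_n = c | q), where
              q holds one cluster index per other view (q_i = c is implicit)
    lam    -- collaboration weight lambda in [0, 1]
    """
    k_i = len(s_old)
    collab = [0.0] * k_i
    # Enumerate every combination q of clusters chosen by the other views.
    for q in product(*(range(len(s_j)) for s_j in others)):
        # Assumption: P(q | other views) is the product of their responsibilities.
        p_q = 1.0
        for s_j, q_j in zip(others, q):
            p_q *= s_j[q_j]
        # Fix q_i = c for each candidate cluster c of view i.
        for c in range(k_i):
            collab[c] += p_q * g(q, c)
    # Weighted combination of the local and collaborative parts, as in Eq. (8).
    return [(1 - lam) * s + lam * cc for s, cc in zip(s_old, collab)]


table = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical P(Z^i = c | q_j), rows indexed by q_j
print(collaborative_update([0.5, 0.5], [[0.7, 0.3]], lambda q, c: table[q[0]][c], 0.5))
# approximately [0.595, 0.405]
```

For a fixed number of views and clusters, the enumeration costs $\prod_{j \neq i} K_j$ operations per point, independent of $N$, so a sweep over all $N$ points stays linear in $N$; in this naive form, however, the cost grows exponentially with the number of views $J$.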
