Multiplex networks analysis
source: muxviz
Rushed Kanawati A3, LIPN, CNRS UMR 7030
SPC - University Paris 13
http://lipn.fr/˜kanawati rushed.kanawati@lipn.univ-paris13.fr
WI 2015, Singapore, 6 December 2015 http://lipn.fr/munm
A
CKNOWLEDGMENTSThanks..
Much of the work presented here is made with former and current students : Nesserine Benchettra, Zied Yakoubi, Manel Hmimida, Manisha Pujari, Issam Falih nd Parisa Rastin
O
UTLINES1 Introduction 45 min
Complex network analysis: a brief review
2 Multiplex networks 90 min
Definitions metrics & analysis tasks
3 Applications 60 min
Recommender systems, Cluster ensemble selection
4 Tools 15 min
5 Conclusions & discussion 15 min
C
OMPLEX NETWORKSDefinition
Graphs abstracting, direct or indirect, interactions in real-world systems
Basic topological features
I Low Density I Small Diameter I Scale-free I High Clustering
coefficient
S
CALE-
FREEC
LUSTERING COEFFICIENT DefinitionsI Global version :
3 ×4
∧
I Local on nodev: # of links between neighbors of v
# Potential links between neighbors of v
CC= 3×18
CCv=1= 16
N
OTATIONSA graphG=<V,E⊆V×V>
I Vis the set of nodes (a.k.a. vertices, actors, sites)
I Eis the set of edges (a.k.a. ties, links, bonds)
Notations
I AGAdjacency Matrix d :aij6=0 si les nœuds(vi,vj)∈E, 0 otherwise.
I n=|V|
I m=|E|. We have oftenm∼n
I Γ(v): neighbor’s of nodev.Γ(v) ={x∈V: (x,v)∈E}. I Node degree : d(v) =kΓ(v)k
C
OMPLEX NETWORKS: E
XEMPLEI
Direct interactions
Density : 2−3 Diameter : 28 Clustering coeff.: 0.47
Greatest connected component of Wikipedia
C
OMPLEX NETWORKS: E
XEMPLEII
Indirect interactions
Density : 10−4 Diameter : 24 Clustering coeff.: 0.67
Co-authorship network - DBLP 1980-1984 (for authors active for more than 10 years)
E
XAMPLEIII
Similarity graphs
Public sites accessibility in Bourget district, Paris
Density : 0.052 Diameter : 7 clustering Coef.: 0.87
S
IMILARITY GRAPHS-neighborhood graph: u,vare linked ifd(u,v)≤
k-nearest neighbor graph: each node is connected toknearest nodes.
Relative neighborhood graph:
u,vare linked ifd(u,v)≤maxx{d(v,x),d(u,x)},∀x6=u,v
S
IMILARITY GRAPHSFigure:-threshold graph
C
OMPLEX NETWORKS6= R
ANDOMG
RAPHSErd ¨os-R´enyi model
Rule : fornnodes, generate edge (i,j) independently at random with probabilityp
Low density Small diameter
Clustering coefficient∼p not scale-free
C
OMPLEX NETWORKS6= R
ANDOMG
RAPHS Small-world graphsLow density Small diameter
High clustering coefficient not scale-free
C
OMPLEX NETWORKS6= R
ANDOMG
RAPHSPreferential attachement
Rule : New nodes prefer to connect to high degree nodes
Low density Small diameter
Clustering coefficient→0 Scale-free
C
OMPLEX NETWORKS ANALYSISA lot of interesting results using asimple model: interacting nodes
Node related analysis tasks
Centralities, Roles, Similarities Community detection
Local, disjoint, overlapping, hierarchical Network evolution mining
Link prediction
C
ENTRALITIES Nodes centralitiesDegree centrality example
I Importance and/or influence of a node
I PageRank, HITS : web search results ranking
I FolkRank: Tag recommendation I Betweenness, Closeness: Importance of
actors in a social network I Viral maketing
R
OLES Node Rolesred nodes are star centers, blues are in-clique and others are prepherial
I Nodes having a similar structural behaviors
I Centers of stars, in cliques, peripheral nodes,. . .
I Role query, Role outliers, Role dynamics, Role transfer,. . .
I SeeHenderson et al. KDD 2012
N
ODES SIMILARITYI A node is similar to itself
I Neighborhood-based similarity: Two nodes are similar if they are connected to the same nodes.
I Distance-based similarity: Two nodes are similar if they are connected with short paths.
I Structural-similarity: Two nodes are similar if each is similar to the neighbors of the other.
C
OMMUNITY DETECTIONDefinitions
I A dense subgraph loosely coupled to other modules in the network
I A community is a set of nodes seen as one by nodes outside the community
I A subgraph where almost all nodes are linked to other nodes in the community.
I . . .
C
OMMUNITY DETECTIONApplications
I Finding similar items (documents, users, queries, etc.) I Collaboratif filtering
I Network visualisation I Computation distribution I . . .
L
OCAL COMMUNITYL
OCAL COMMUNITYL
OCAL COMMUNITY: G
ENERIC ALGORITHM1 C← ∅,B← {n0},S←Γ(n0)
2 Q←0 /* a communityquality function*/
3 WhileQcan be enhanced Do 1 n←argmaxn∈SQ
2 S←S− {n}
3 D←D+{n}
4 updateB,S,C 4 ReturnD
Q
UALITY FUNCTIONS: E
XEMPLESI
Local modularityR [Cla05]
R= B Bin
in+Bout
Local modularityM [LWP08]
M= DDin
out
Local modularityL [CZG09]
L= LLin
ex where : Lin=
P
i∈D
kΓ(i)∩Dk
kDk ,Lex=
P
i∈B
kΓ(i)∩Sk kBk
And many many others. . . [YL12]
G
LOBAL COMMUNITIES DETECTIONProblem
I Divid the set of nodes in a number of (overlapping) subsets such that induced subgraphs are dense and loosely coupled.
Recommended readings
I S. Fortunato.Community detection in graphs. Physics Reports, 2010, 486, 75-174
I L. Tang, H. Liu.Community Detection and Mining in Social Media, Morgan Claypool Publishers, 2010
C
OMMUNITIES DETECTION: M
ETHODSClassification
I Group-based
Nodes are grouped in function of shared topological features (ex. clique)
I Network-based
Clustering, Graph-cut, block-models,modularity optimization I Propagation-based
Unstable approaches, good when used with ensemble clustering I Seed-centred
from local community identification to global community detection
G
ROUP-
BASED APPROCHESPrinciple
Search for special (dense) subgraphs:
I k-clique I n-clique
I γ-dense clique
I K-core
G
ROUP-
BASED APPROCHESfrom Symeon Papadopoulos, Community Detection in Social Media, CERTH-ITI, 22 June 2011
N
ETWORK-
BASED APPROCHESClustering approaches
I Apply classical clustering approaches usinggraph-baseddiastase function
I Different types of Graph-based distances: neighborhood-based, path-based (Random-walk)
I Usually requires the number of clusters to discover
M
ODULARITY OPTIMIZATION APPROACHES Modularity: a partition quality criteriaQ(P) = 1 2m
X
c∈P
X
i,j∈c
(Aij−didj
2m) (1)
Figure: Example :Q=0.31
M
ODULARITY OPTIMIZATION APPROACHESI Applying classical optimization algorithms (ex. Genetic algorithms [Piz12]).
I Applying hierarchical clustering and select the level withQmax (ex. Walktrap [PL06])
I Divisive approach : Girvan-Newman algorithm [GN02]
I Greedy optimization : Louvain algorithm [BGL08]
I . . .
E
XAMPLE: L
OUVAIN APPROACHM
ODULARITY OPTIMIZATION LIMITATIONSHypothesis
The best partition of a graph is the one that maximize the modularity.
If a network has a community structure, then it is popsicle to find a precise partition with maximal modularity
If a network has a community structure, then partitions inducing high modularity values are structurally similar.
All three hypothesis do not hold [GdMC10, LF11].
M
ODULARITY RESOLUTION LIMITEM
ODULARITY DISTRIBUTIONP
ROPAGATION-
BASED APPROACHESAlgorithm 1Label propagation
Require: G=<V,E>a connected graph,
1: Initialize each node with unique labellv
2: whileLabels are not stabledo
3: forv∈Vdo
lv =l |Γl(v)|/* random tie-breaking */
4: end for
5: end while
6: return communities from labels
Γl(v): set of neighbors having labell
L
ABEL PROPAGATIONAdvantages
I Complexity :O(m) I Highly parallel
Disadvantages
I No convergence guarantee, oscillation phenomena
I Low robustness
Different runs yields very different community structure due to randomness
S
EED-
CENTRIC ALGORITHMS (KANAWATI, SCSM’2014)Algorithm 2General seed-centric community detection algorithm Require: G=<V,E>a connected graph,
1: C ← ∅
2: S←compute seeds(G)
3: fors∈Sdo
4: Cs← compute local com(s,G)
5: C ← C+Cs 6: end for
7: return compute community(C)
L
INK PREDICTIONLink predction I Structural
Find hidden/missing links in a network ex. Missing links in Wikipedia
I Temporal
Predicting new links to appear at time tpbased on the network state at instants t<tp
Readings
I M. Pujari et. al.,Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N.
Meghanthan (Editor), IGI publishing, 2016.
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relationnal networks
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relationnal networks K-partite networks
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relationnal networks K-partite networks
Dynamic networks
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks Attributed networks
A
LTERNATIVE NETWORK MODELSNetwork science is mature enough to a move towards more complex, expressive models
Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks Attributed networks
A powerful model :Multiplex Network
M
ULTIPLEXN
ETWORK: D
EFINITION G=<V,E,C>from [Mucha et. al., 2010]
I Vset of nodes
I E={E1, . . . ,Eα}:∀k∈[1, α]Ek ⊆V×V I CLayerCouplinglinks
Coupling
I Ordinal Coupling: Diagonal inter-layer links among consecutive layers.
I Categorical Coupling: Diagonal inter-layer links between all pairs of layers.
I Generalized coupling ? Ex. Decay functions
N
OTATIONSNotations
I A[k]Adjacency Matrix of slicek:a[k]ij 6=0 si les nœuds(vi,vj)∈Ek, 0 otherwise.
I m[k]=|Ek|. We have oftenm∼n
I Neighbor’s ofvin slicek:Γ(v)[k]={x∈V: (x,v)∈Ek}. I All neighbors ofv:Γ(v)tot=∪s∈{1,...,α}Γ(v)[s]
I Node degree in slicek:dkv=kΓ(v)[k]k I Total degree of nodev:dtotv =||Γtot(v)||
M
ULTIPLEX NETWORKS: R
ELATED TERMSRecommended readings
/ S. Mikko Kivel¨a et. al..Multilayer Networks. arXiv:1309.7233, March 2014
P
OWER OF MULTIPLEX MODELMulti-relational networks
European airports network
P
OWER OF MULTIPLEX MODEL Dynamic networksAcademic collaborations per year
P
OWER OF MULTIPLEX MODEL Attributed networksTeenage friendship network- Behavioral attributs : Sport practice level, Alcohol, Tobacco & Cannabis consumption
Proximity graphs can be defined overs nodes using attribute-similarity masures
P
OWER OF MULTIPLEX MODELHeterogeneous networks
DBLP author-centred multiplex network
M
ULTIPLEX NETWORKS: M
EASURESNeed of generalization ofusual measures: Degree
Neighbourhood Centralities
Paths and distances Clustering coefficient . . .
New layer-oriented questions to answer : Which layers determine the centrality of a user
Which layers are relevant to measure the similarity of two nodes
How one layer influence the evolution of another
. . .
A
PPROACHES1 Transformation into a monoplex centred problem I Layer aggregation approaches.
I Hypergraph transformation based approaches I Ensemble approaches
2 Generalization of monoplex oriented algorithms to multiplex networks.
Introduction Multiplex networks Applications Tools Conclusion
L
AYER AGGREGATIONAij=
(1 ∃1≤l≤α:A[l]ij 6=0 0 otherwise
Aij=k {d:A[d]ij 6=0} k
Aij= 1 α
α
X
k=1
wkA[k]ij Aij=sim(vi,vj)
L
AYER AGGREGATIONAggregation functions
Aij=
(1 ∃1≤l≤α:A[l]ij 6=0 0 otherwise
Aij=k {d:A[d]ij 6=0} k
Aij= 1 α
α
X
k=1
wkA[k]ij Aij=sim(vi,vj)
K-
UNIFORM HYPERGRAPH TRANSFORMATIONPrinciple
I A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactlyk
I Mapping a multiplex to a3-uniform hypergraphH= (V,E)such that :
V =V∪ {1, . . . , α}
(u,v,i)∈ E if∃l:A[l]uv6=0,u,v∈V,i∈ {1, . . . , α}
I Apply hypergraphs analysis approaches (Ex. tensor-based approaches)
M
ULTIPLEX: N
ODE NEIGHBORHOODSome options I Γmux(v) =∪α
k=1Γk(v) I Γmux(v) =∩αk=1Γk(v)
I Γmux(v) ={x∈Γ(v)tot :sim(x,v)≥δ}δ∈[0,1]
I Γmux(v) ={x∈Γ(v)tot : Γ(v)Γ(v)tottot∩Γ(x)∪Γ(x)tottot ≥δ}
I . . .
P
ATHS,
SHORTEST DISTANCESome options
I Path in an aggregated network I daverage=
Pm
α=1d(u,v)[α]
m ∀u,v∈Vand(u,v)∈/ Ei.
I path−length(u,v) =<r1,r2, . . . ,rα>whererinumber of links in layeri
I pathx(u,v)dominatespathy(u,v)∃j:rxj <ryj, ∀k6=j rxj ≤ryj
C
OMMUNITY?
What is a dense subgraph in a multiplex network ?
BerlingerioCG11
C
OMMUNITY DETECTION IN MULTIPLEX NETWORKSApproaches
1 Transformation into a monoplex community detection problem I Layer aggregation approaches.
I Multi-objective optimization approach.
I Ensemble clustering approaches
2 Generalization of monoplex oriented algorithms to multiplex networks.
I Generalized-modularity optimization I Generalized info-map
I Generalized walktrap I Seed-centric approaches
M
ULTI-
OBJECTIVE OPTIMIZATION APPROACH[AP14]
1 Rank the set ofαlayers according to someimportance criteria 2 C1←community(G[1])
3 fori∈[2, α]do:
Ci←optimize(community(G[i]),similarity(Ci−1)) 4 returnCα
Introduction Multiplex networks Applications Tools Conclusion
E
NSEMBLE CLUSTERING APPROACHESI CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm
I MCLA: Meta-Clustering Algorithm I . . .
E
NSEMBLE CLUSTERING APPROACHESEnsemble Clustering Strehl2003
I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm
I MCLA: Meta-Clustering Algorithm I . . .
E
NSEMBLE CLUSTERING:
APPROACHESCSPA: Cluster-based Similarity Partitioning Algorithm
I LetKbe the number of basic models,Ci(x)be the cluster in model ito whichxbelongs.
I Define a similarity graph on objects :sim(v,u) =
PK i=1
δ(Ci(v),Ci(u)) K
I Cluster the obtained graph :
Isolate connected components after prunning edges Apply community detection approach
I Complexity :O(n2kr):n# objects,k# of clusters,r# of clustering solutions
CSPA : E
XEMPLEfrom Seifi, M. Cœurs stables de communaut´es dans les graphes de terrain. Th`ese de l’universit´e Paris 6, 2012
E
NSEMBLE CLUSTERING:
ILLUSTRATIONI
E
NSEMBLE CLUSTERING:
ILLUSTRATIONII
E
NSEMBLE CLUSTERING:
ILLUSTRATIONIII
E
NSEMBLE CLUSTERING:
ILLUSTRATIONIV
E
NSEMBLE CLUSTERING:
ILLUSTRATIONV
E
NSEMBLE CLUSTERING:
ILLUSTRATIONVI
E
NSEMBLE CLUSTERING:
ILLUSTRATIONVII
M
ULTIPLEXM
ODULARITYGeneralized modularity mucha2010community
I
Qmultiplex(P) = 1 2µ
X
c∈P
X
i,j∈c k,l:1→α
A[k]ij −λkd[k]i d[k]j 2m[k]
δkl+δijCklij
I µ= P
j∈V k,l:1→α
m[k]+Cljk
I Cklij Inter slice coupling = 0∀i6=j
S
EED-
CENTRIC ALGORITHMS[K
AN14]
Algorithm 3General seed-centric community detection algorithm Require: G=<V,E>a connected graph,
1: C ← ∅
2: S←compute seeds(G)
3: fors∈Sdo
4: Cs← compute local com(s,G)
5: C ← C+Cs 6: end for
7: return compute community(C)
T
HEL
ICOD ALGORITHM[YK14]
1 Compute a set of seeds that are likely to be leaders in their communities
Heuristic : nodes having higherdegree centralitiesthan their neighbors 2 Each node in the graph ranks seeds in function of its own
preference
In function of increasingShortest path 3 Iterate till convergence: Each node modifies its preference vector
in function ofneighbor’spreferences
Applyingrank aggregationmethods.
M
UXL
ICODMultiplex degree centrality [BNL13]
dmultiplexi =−
α
X
k=1
d[k]i
d[tot]i log d[k]i d[tot]i
!
Multiplex shortest path
SP(u,v)multiplex=
α
P
k=1
SP(u,v)[k]
α
Multiplex neighborhood
Γmux(v) ={x∈Γ(v)tot: Γ(v)tot∩Γ(x)tot Γ(v)tot∪Γ(x)tot ≥δ}
R
ANK AGGREGATION[PK12, DKNS01]
O
THER ALGORITHMS1 Random walk based approach (Generalization of Walktrap [KM15]
2 Generalized infomap [DLAR15]
E
VALUATION CRITERIAI
1 Multiplex modularity 2 Redundancy [BCG11]
ρ(c) = X
(u,v)∈P¯¯c
k {k:∃A[k]uv6=0} k α× kPck
¯¯
Pthe set of couple(u,v)which are directly connected in at least two layers
3 Complementarity :γ(c) =Vc×εc× Hc
E
VALUATION CRITERIAII
I VarietyVc: the proportion of occurrence of the communityc across layers of the multiplex.
Vc =
α
X
s=1
k∃(i,j)∈c/A[s]ij 6=0k
α−1 (2)
I Exclusivityεc: number of pairs of nodes, in communityc, that are connected exclusively in one layer.
εc=
α
X
s=1
kPc,sk
kPck (3)
E
VALUATION CRITERIAIII
I HomogeneityHc: How uniform is the distribution of the number of edges, in the communityc, per layer.
Hc=
1 if σc=0 1−σσmaxc
c otherwise (4)
with
avgc=
α
X
s=1
kPc,sk α
σc= v u u t
α
X
s=1
(kPc,sk −avgc)2 α
σcmax=
r(max(kPc,d k)−min(kPc,dk))2 2
D
ATASETSBenchmark networks
Lazzega Lawyer network
#nodes 71
#layer 3
D
ATASETSDataset
Physicians collaboration network
#nodes 246
#layers 3
R
ESULTS: R
EDUNDANCYR
ESULTS: C
OMPLEMENTARITYR
ESULTS:
MULTIPLEX MODULARITYP
ARETO FRONTL
AZEGA DATASET: C
OMPARATIVE STUDYFigure: NMI (lower triangular part) , adjusted Rand (upper triangular part).95 / 149
A
PPLICATIONS1 Film recommandation 2 Tag recommendation
3 Collaboration recommendation 4 Ensemble clustering selection
F
ILM RECOMMENDATIONFilm rating matrix = bipartite graph
F
ILM RECOMMENDATIONF
ILM RECOMMENDATIONF
ILM RECOMMENDATION: M
ULTIPLEX NETWORKFigure: Movieslens 100k multiplex
(Projection by users) Figure: Movieslens 100k multiplex (Projection by movies)
F
ILM RECOMMENDATION:
RESULTSSimple approach
Recommend the statistical mode value of links linking clusters of target user to the cluster of target films
MAE RMSE Precision Recall F1-measure GTM 0.9441 1.2549 0.2185 0.2207 0.2195 T. co-clustering 0.9293 1.2562 0.25587 0.2094 0.2303 muxlicod 0.9635 1.2773 0.2274 0.2134 0.2202 LA louvain 0.8352 1.1509 0.3113 0.2521 0.2779 LA walktrap 0.8216 1.1155 0.2642 0.2233 0.2420 PA louvain 0.8713 1.1917 0.2532 0.2032 0.2245 PA walktrap 0.8801 1.2023 0.2705 0.2011 0.2283 Table: Result of the proposed recommendation system with each algorithm in MovieLens 100k dataset (PA : Partition Aggregation, LA : Layer Aggregation
A
PPLICATIONII: TAG
RECOMMENDER(TLTR)
Figure: TLTR model
F
OLKSONOMY GRAPHSFigure: Tripartite graph projection into three bipartite graphs
M
ULTIPLEX NETWORK CONSTRUCTIONFigure: Tag multiplex network: Steps of transformation
E
XPERIMENTS: B
IBSOMNOMY DATASET# Users # Tags # Resources # Edges
116 412 361 24297
Table: Bibsonomy dataset
Networks slices Nodes Edges Density
User User-Resource 116 901 0,135
User-Tag 116 985 0,147
Tag Tag-Resource 412 2496 0,0294
Tag-User 412 1956 0,0231
Resource Resource-Tag 361 2814 0,0433 Resource-User 361 1685 0,0259 Table: Multiplex networks of Bibsonomy
T
AG RECOMMANDATION:
RESULTSGraphs #Nodes #Edges #Users #Tags #Resources
G 889 24297 116 412 361
Gc(Mux-Licod) 434 1677 97 154 183
compression in % 51,18 93,1 16,37 62,62 49,30
Gc(GenLouvain) 16 79 4 6 6
compression in % 98,2 99,67 96,55 98,54 98,33
Gc(LA (Licod)) 91 46 13 40 38
compression in % 89,76 99,81 88,79 90,29 89,47
Gc(LA (Louvain)) 9 27 3 3 3
compression in % 98,98 99,88 97,41 99,27 99,16
Gc(EC (Licod)) 151 993 3 89 59
compression in % 83,08 95,91 97,41 78,39 83,65
Gc(EC (Louvain)) 25 187 8 11 6
compression in % 97,18 78,96 93,10 97,33 98,33 Table: Compression rate of the initial graph
107 / 149
T
AG RECOMMANDATION:
RESULTSFigure: Comparative study of different tags recommendation approaches in terms of precision withkt =1
T
AG RECOMMANDATION:
RESULTSFigure: Comparative study of different tags recommendation approaches in terms of precision withkt =2
T
AG RECOMMANDATION:
RESULTSFigure: Comparative study of different tags recommendation approaches in terms of precision withkt =3
T
AG RECOMMANDATION:
RESULTSFigure: Comparative study of different tags recommendation approaches in terms of precision withkt =4
D
ISCUSSIONI Multiplex approaches outperform layer aggregation and EC approaches on benchmark networks
I Layer aggregation approaches do well for film recommendation ! I EC approaches rank first for Tag recommendation !
I Problem what is the Validity of topological community quality indexes ?
A
PPLICATIONIII: S
CIENTIFICC
OLLABORATION RECOMMENDERL
INK PREDICTION:
SUPERVISED APPROACHE
XPERIMENTS: DBLP
Years Properties Co-Author Co-Venue Co-Citation
1970-1973 Nodes 91 91 91
Edges 116 1256 171
1972-1975 Nodes 221 221 221
Edges 319 5098 706
1974-1977 Nodes 323 323 323
Edges 451 9831 993
Table:Basic statistics about the 3-layer DBLP multiplex networks
Years # Positive # Negatives Train/Test Labeling
1970-1973 1974-1975 16 1810
1972-1975 1976-1977 49 12141
1974-1977 1978-1979 93 26223
Table: #examples extracted from co-authorship layer (number of unconnected nodes in
connected components) 115 / 149
L
INK PREDICTION: R
ESULTSLearning:1970-1973 Learning:1972-1975 Attributes Test:1972-1975 Test:1974-1977
F-measure AUC F-measure AUC
Setdirect 0.0357 0.5263 0.0168 0.4955
Setdirect+indirect 0.0256 0.5372 0.0150 0.5132
Setdirect+multiplex 0.0592 0.5374 0.0122 0.5108
Setall 0.0153 0.5361 0.0171 0.5555
Setmultiplex 0.0374 0.5181 0.0185 0.5485
Table: Comparative link prediction results applying decision tree algorithm using different types of attributs
A
PPLICATIONIV: E
NSEMBLEC
LUSTERINGS
ELECTIONMotivation
The quality of a consensus clustering depends on both thequalityand diversityof input base clusterings [FL08, AF09, NCC13, ADIA15].
Problem definition
I LetΠ ={π1, . . . , πn}be a set of base partitions I ES(Π) = Π∗ ⊂Π :Q(EC(Π∗))>Q(EC(Π)) I Q: Quality of the consensus clustering
D
IVERSITYClustering Similarity measures I Purity
I Rand/ARI
I NMI (Normlized mutual information) I IV (Information variation) [Mei03]
I . . .
Q
UALITYCluster internal quality indexes [AR14]
I Silhouette index,
I Calinski-Harabasz index I Davis-Bouldin index I Dunn index
I . . .
Network-oriented indexes I Modularity
I Average conductance
I Average local Modularities : L, M, R [Kan15]
I See also [YL12]
I . . .
E
NSEMBLE SELECTION APPROACHES:
LIMITATIONS I Existing approaches are defined for attribute/value datasets withmetric distances
I Use of one quality/diversity measure.
I Requires the number of clusters to select as input.
I . . .
Proposed approach: contributions
I Designed for both networks and attribute/value datasets I Use of anensembleof quality/diversity measures.
I The number of selected base clustering is automatically computed.
E
NSEMBLE SELECTION APPROACHThe idea
Cluster the set of base clusterings using an ensemble of similarity measures
Apply amultiplex community detectionalgorithm to a multiplex network whose nodes are the set of base clusterings and whose layers are defined by a set ofproximity graphs, each defined according a to a given similarity measure
From each cluster select the node (i.e clustering) that is ranked first according to an ensemble of quality measures.
Applyensemble rankingalgorithms
E
NSEMBLE SELECTION APPROACHAlgorithm 4Graph-based cluster ensemble selection algorithm Require: Π ={π1, . . . , πr}a set of base clusterings
Require: S={S1, . . . ,Sn}A set of partition similarity functions Require: Q={Q1, . . . ,Qm}A set of partition quality functions
1: Π∗ ← ∅
2: MUX←Multiplex(Π)
3: for allSi ∈ S do
4: MUX.add layer(proximity graph(Π,Si))
5: end for
6: C={c1, . . . ,ck} ←community detection(MUX)
7: for allc∈ C do
8: ˆπ←ensemble Ranking(c,Q)
9: Π∗←Π∗∪ {ˆπ}
10: end for
11: return Π∗
T
HE PROPOSED APPROACHT
HE PROPOSED APPROACHT
HE PROPOSED APPROACHE
NSEMBLE RANKINGProblem
I LetLbe a set of elements to rank bynrankers I Letσibe the rank provided by rankeri
I Goal: Compute a consensus rank ofL.
D´ej`a Vu: Social choice algorithms, but. . .
I Small number of voters and big number of candidates I Algorithmic efficiency is required
Algorithms I Borda
I Kemeny approaches (commuting Condorcet winner if it exists)
E
XPERIMENT ON SMALL NETWORKS WITH KNOWN GROUND TRUTH PARTITIONSI Generation of 20 base clusterings applying a standard Label propagation algorithm
I Proximity graphs : RNG
I S ={NMI, ARI, VI} Q={modularity, Local modularities L, M, R}
Table: Evaluation of the proposed graph-based ensemble selection
Dataset Approach NMI ARI
Zachary Ensemble clustering without selection 0.57 0.46 Ensemble clustering with selection 0.77 0.69 US Politics Ensemble clustering without selection 0.55 0.68 Ensemble clustering with selection 0.68 0.67 Dolphins Ensemble clustering without selection 0.55 0.39
E
XPERIMENTII : DBLP
CO-
AUTHORSHIP NETWORK I Co-authorship network 1970-1977 (GCC) :|V|=643,|m|=886 I Generation of 10, 100 base clusteringsI Proximity graphs : RNG
I S ={NMI, ARI, VI} Q={modularity, Local modularities L, M, R}
Table: Evaluation of the proposed graph-based ensemble selection
# base clusterings 10
Nodes Compression without selection 18,3%
Nodes Compression with selection 20,9%
Edge compression without selection 17,2%
Edge compression with selection 17,6%
Modularity without selection 0.3734 Modularity with selection 0.43756
E
XPERIMENTII : DBLP
CO-
AUTHORSHIP NETWORKTable: Evaluation of the proposed graph-based ensemble selection
# base clusterings 100
Nodes Compression without selection 35,1%
Nodes Compression with selection 40,3%
Edge compression without selection 36,2%
Edge compression with selection 38,3%
Modularity without selection 0.4031 Modularity with selection 0.4665
M
ULTIPLEX ANALYSIS TOOLSI muxviz:http://muxviz.net I Muna:http://lipn.fr/muna I Pymnet
http://people.maths.ox.ac.uk/kivela/mln_library/
M
UXVIZI Rpackage I Main features :
Visualization
Layer compression methods Basic metrics
Community detection : Modularity-based, infomap I Input : text file per layer + one file for the general structure.
M
UNAI Available forRandPython I Built on top ofigraph
I Extended set of multiplex network edition functions (similar to igraph)
I Basic metrics : degree, neighborhood
I Extended set of community detection approaches I Topological community evaluation indexes.
I Limitations :
I No visualisation support
I Simple categorical coupling only.
P
YMNETI Pure Python + integration with networkX package.
I Can handle general multilayer networks
I Rule based generation and lazy-evaluation of coupling edges I Various network analysis methods, transformations, reading and
writing networks, network models etc.
I Visualization support
P
YMNET: V
ISUALISATION EXEMPLEC
ONCLUSIONSI Multiplex networksprovide a rich representation of real-world interaction systems
I A lot of work to reformulate basic network concepts for multiplex settings
ex. Roles, RandomWalk, PageRank, etc.
I New tools for multiplex mining : Muna [FK15], muxviz[?],Pymnet I Community evaluation: still an open problem
I Uncovered topics : Layer selection and compression, Co-evolution models, Dynamics on multiplex networks
I Ideas under exploration:
Multiplex approach for attributed networks mining Multiplex of multiplexes
Interactive Multiplex network visualisation.
Benchmarking available tools
P
YMNET: V
ISUALISATION EXEMPLEO
UR GROUP RELATED BIBLIOGRAPHYI
I Rushed Kanawati, Multiplex Network mining: a brief survey IEEE Intelligent Informatics Bulletin, December 2015.
I Manisha Pujari, Rushed Kanawati, Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N. Meghanthan (Editor), IGI publishing.
I Issam Falih, Nistor Grozavu, Rushed Kanawati, Youn`es Bennani, A recommendation system based on unsupervised topological learning, 22nd International Conference on Neural Information Processing (ICONIP2015) 9-12, 2015 Istanbul, Turkey.
I Issam Falih, Rushed Kanawati, Muna: A Multiplex network analysis Library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 25-28 August 2015, Paris.
I Rushed Kanawati, Ensemble selection for enhancing graph coarsening quality. 5th international workshop on Social Network Analysis (ARS’15), 27-28 April 2015 Capri I Parisa Rastin, Rushed Kanawati, A multiplex-network based approach for clustering
ensemble selection, International workshop on Multiplex network mining (MANEM@ASONAM), 25-28 August 2015, Paris
O
UR GROUP RELATED BIBLIOGRAPHYII
I Manisha Pujari, Rushed Kanawati.Link prediction in multiplex networksAIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1) March 2015 (IF. 0.95)
I Manel Hmimida, Rushed Kanawati,Community detection in multiplex networks: A seed-centric approachAIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1) March 2015
I Rushed Kanawati.Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks.Neurocomputing, Volume 150, Part B, 20 February 2015, pp. 417-427
I Zied Yakoubi, Rushed Kanawati.Licod: A Leader-driven algorithm for community detection in complex networks.Vietnam Journal of Computer Science - Springer, Volume 1, Issue 4.
November 2014 pp. 241-256
I Rushed Kanawati,Ensemble selection for community detection in complex networks.7th international conference on Social Computing and Social Media, 2-7 August, 2015 Los Angeles
O
UR GROUP RELATED BIBLIOGRAPHYIII
I Issam Falih, Manel Hmimida, Rushed Kanawati,Une approche centr´ee graine pour la d´etection de communaut´es dans les r´eseaux multiplexes. Conf´erence EGC’15, 27-30 Jan. 2015 Luxembourg.
I Rushed Kanawati Multi-objective approach for local community computation. 26th IEEE International conference on tools with artificial Intelligence (ICTAI’2014) 10-12 November 2014 Limassol, Cyprus.
I Rushed Kanawati Community detection in social networks: the power of ensemble methods. ACM/IEEE International conference on data science advanced analytics (DSAA’2014) October 30 - November 1, 2014 Shanghai.
I Rushed Kanawati. YASCA: An ensemble based approach for community detection in complex networks, In proceedings of the 20th International Conference on computing combinatorics (COCOON), LNCS 8691 Atlanta 4-6 August 3014. pp. 657-666
I Rushed Kanawati. Seed-centric approaches for community detection in complex networks 6th international conference on Social Computing and Social Media, 22-27 June 2014, Crete, Greece.
O
UR GROUP RELATED BIBLIOGRAPHYIV
I Manisha Pujari, Rushed Kanawati. Link Prediction in Complex Networks by Supervised Rank Aggregation. ICTAI 2012 : 24th IEEE International Conference on Tools with Artificial Intelligence November 7-9, 2012, Athens, Greece.
I Manisha Pujari, Rushed Kanawati. Tag recommendation by link prediction based on supervised machine learning. Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012). 5-8 June 2012, Dublin.
I Manisha Pujari, Rushed Kanawati. Supervised machine learning link prediction approach for tag recommandation. 4th International Conference on Online Communities and Social Computing @ HCI International 2011, 9-14 July 2011, Hilton Orlando Bonnet Creek, Orlando, Florida, USA, LNCS Springer.
I Nesrine Benchettara, Rushed Kanawati, C´eline Rouveirol. A Supervised Machine Learning Link Prediction Approach for Academic Collaboration Recommendation. 4th International ACM Recommender System conference 26-30/9 2010 - Barcalona, spain. pp. 253-256
O
UR GROUP RELATED BIBLIOGRAPHYV
I Nesrine Benchettara, Rushed Kanawati, C´eline Rouveirol. Supervised machine learning applied to link prediction in bipartite social networks. The 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010). 9-11 August 2010.
Odense Denmark. pp.326-330
That’s all folks !
Questions ?
B
IBLIOGRAPHYI
Ebrahim Akbari, Halina Mohamed Dahlan, Roliana Ibrahim, and Hosein Alizadeh,Hierarchical cluster ensemble selection, Engineering Applications of Artificial Intelligence39(2015), 146–156.
Javad Azimi and Xiaoli Fern,Adaptive cluster ensemble selection, IJCAI (Craig Boutilier, ed.), 2009, pp. 992–997.
Alessia Amelio and Clara Pizzuti,Community detection in multidimensional networks, IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 352–359.
Charu C. Aggarwal and Chandan K. Reddy (eds.),Data clustering: Algorithms and applications, CRC Press, 2014.
Michele Berlingerio, Michele Coscia, and Fosca Giannotti,Finding and characterizing communities in multidimensional networks, ASONAM, IEEE Computer Society, 2011, pp. 490–494.
Vincent D Blondel, Jean-loup Guillaume, and Etienne Lefebvre,Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment2008(2008), P10008.
B
IBLIOGRAPHYII
Federico Battiston, Vincenzo Nicosia, and Vito Latora,Metrics for the analysis of multiplex networks, CoRRabs/1308.3182 (2013).
Aaron Clauset,Finding local community structure in networks, Physical Review E (2005).
Jiyang Chen, Osmar R. Za¨ıane, and Randy Goebel,Local community identification in social networks, ASONAM, 2009, pp. 237–242.
Cynthia Dwork, Ravi Kumar, Moni Naor, and D Sivakumar,Rank aggregation methods for the Web, WWW, 2001, pp. 613–622.
Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall,Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems, Phys. Rev5(2015), 011027.
Issam Falih and Rushed Kanawati,Muna: A multiplex network analysis library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (Paris), August 2015, pp. 757–760.
B
IBLIOGRAPHYIII
Xiaoli Z. Fern and Wei Lin,Cluster ensemble selection, Statistical Analysis and Data Mining1(2008), no. 3, 128–141.
B. H. Good, Y.-A. de Montjoye, and A. Clauset.,The performance of modularity maximization in practical contexts., Physical ReviewE(2010), no. 81, 046106.
M. Girvan and M. E. J. Newman,Community structure in social and biological networks, PNAS99(2002), no. 12, 7821–7826.
Rushed Kanawati,Seed-centric approaches for community detection in complex networks, 6th international conference on Social Computing and Social Media (Crete, Greece) (Gabriele Meiselwitz, ed.), vol. LNCS 8531, Springer, June 2014, pp. 197–208.
,Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks, Neurocomputing150, B(2015), 417–427.
Zhana Kuncheva and Giovanni Montana,Community detection in multiplex networks using locally adaptive random walks, MANEM 2workshop - Proceedings of ASONAM 2015 (Paris), August 2015.
B
IBLIOGRAPHYIV
Andrea Lancichinetti and Santo Fortunato,Limits of modularity maximization in community detection, CoRRabs/1107.1 (2011).
Feng Luo, James Zijun Wang, and Eric Promislow,Exploring local community structures in large networks, Web Intelligence and Agent Systems6(2008), no. 4, 387–400.
Marina Meila,Comparing clusterings by the variation of information, COLT (Bernhard Sch ¨olkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.
Murilo Coelho Naldi, Andr´e C. P. L. F. Carvalho, and Ricardo J. G. B. Campello,Cluster ensemble selection based on relative validity indexes, Data Min. Knowl. Discov.27(2013), no. 2, 259–289.
Clara Pizzuti,A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation16(2012), no. 3, 418–430.
B
IBLIOGRAPHYV
Manisha Pujari and Rushed Kanawati,Supervised rank aggregation approach for link prediction in complex networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1189–1196.
Pascal Pons and Matthieu Latapy,Computing communities in large networks using random walks, J. Graph Algorithms Appl.
10(2006), no. 2, 191–218.
Zied Yakoubi and Rushed Kanawati,Licod: Leader-driven approaches for community detection, Vietnam Journal of Computer Science1(2014), no. 4, 241–256.
Jaewon Yang and Jure Leskovec,Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.