Analysis of multiplex networks

(1)

Multiplex networks analysis

source: muxviz

Rushed Kanawati A³, LIPN, CNRS UMR 7030

SPC - University Paris 13

http://lipn.fr/~kanawati rushed.kanawati@lipn.univ-paris13.fr

WI 2015, Singapore, 6 December 2015 http://lipn.fr/munm

(2)

ACKNOWLEDGMENTS

Thanks.

Much of the work presented here was done with former and current students: Nesrine Benchettara, Zied Yakoubi, Manel Hmimida, Manisha Pujari, Issam Falih and Parisa Rastin

(3)

OUTLINE

1 Introduction (45 min)

Complex network analysis: a brief review

2 Multiplex networks (90 min)

Definitions, metrics & analysis tasks

3 Applications (60 min)

Recommender systems, cluster ensemble selection

4 Tools (15 min)

5 Conclusions & discussion (15 min)

(4)

COMPLEX NETWORKS

Definition

Graphs abstracting direct or indirect interactions in real-world systems

Basic topological features

- Low density
- Small diameter
- Scale-free
- High clustering coefficient

(5)

SCALE-FREE

(6)

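CLUSTERING COEFFICIENT

Definitions

- Global version: $CC = \frac{3 \times \#\text{triangles}}{\#\text{connected triples}}$
- Local version, on node $v$: $CC_v = \frac{\#\text{ links between neighbors of } v}{\#\text{ potential links between neighbors of } v}$

Example: $CC = \frac{3 \times 1}{8}$, $CC_{v_1} = \frac{1}{6}$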
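Both definitions can be checked on a small graph. The sketch below is a minimal pure-Python illustration. The graph (a hub `v1` with four neighbors, two of which are linked) is an assumption chosen so that the values agree with the example figures 3×1/8 and 1/6, and the function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Links between neighbors of v divided by potential links between them."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def global_clustering(adj):
    """3 x (number of triangles) / (number of connected triples)."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    # Each triangle is seen once from each of its three nodes,
    # so this count already equals 3 x (number of triangles).
    closed = sum(1 for v in adj
                 for a, b in combinations(adj[v], 2) if b in adj[a])
    return closed / triples if triples else 0.0

# Assumed toy graph: hub v1 with neighbors a, b, c, d and one link a-b.
edges = [("v1", "a"), ("v1", "b"), ("v1", "c"), ("v1", "d"), ("a", "b")]
adj = {}
for u, w in edges:
    adj.setdefault(u, set()).add(w)
    adj.setdefault(w, set()).add(u)

cc_global = global_clustering(adj)       # 3 * 1 / 8
cc_v1 = local_clustering(adj, "v1")      # 1 / 6
```

Counting closed triples per center avoids enumerating triangles explicitly, which keeps the sketch short at the cost of efficiency on large graphs.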

(7)

NOTATIONS

A graph $G = \langle V, E \subseteq V \times V \rangle$

- $V$ is the set of nodes (a.k.a. vertices, actors, sites)
- $E$ is the set of edges (a.k.a. ties, links, bonds)

Notations

- $A_G$: adjacency matrix, with $a_{ij} \neq 0$ if $(v_i, v_j) \in E$, 0 otherwise.
- $n = |V|$
- $m = |E|$. We often have $m \sim n$
- $\Gamma(v)$: neighbors of node $v$. $\Gamma(v) = \{x \in V : (x, v) \in E\}$.
- Node degree: $d(v) = \|\Gamma(v)\|$

(8)

COMPLEX NETWORKS: EXAMPLE I

Direct interactions

Density : 2⁻³ Diameter : 28 Clustering coeff.: 0.47

Greatest connected component of Wikipedia

(9)

COMPLEX NETWORKS: EXAMPLE II

Indirect interactions

Density : 10⁻⁴ Diameter : 24 Clustering coeff.: 0.67

Co-authorship network - DBLP 1980-1984 (for authors active for more than 10 years)

(10)

EXAMPLE III

Similarity graphs

Public sites accessibility in Bourget district, Paris

Density : 0.052 Diameter : 7 clustering Coef.: 0.87

(11)

SIMILARITY GRAPHS

ε-neighborhood graph: $u, v$ are linked if $d(u, v) \le \epsilon$

k-nearest neighbor graph: each node is connected to its $k$ nearest nodes.

Relative neighborhood graph: $u, v$ are linked if $d(u, v) \le \max\{d(v, x), d(u, x)\}$, $\forall x \neq u, v$
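The three constructions differ only in the linking rule, which a short sketch makes concrete. This is a minimal illustration assuming points in the plane and Euclidean distance; the function names and the sample points are hypothetical.

```python
import math

def epsilon_graph(points, eps):
    """Link u, v when d(u, v) <= eps."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= eps}

def knn_graph(points, k):
    """Link each node to its k nearest nodes (edges stored as undirected)."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def relative_neighborhood_graph(points):
    """Link u, v when d(u, v) <= max(d(u, x), d(v, x)) for every other x."""
    n = len(points)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(points[i], points[j])
            if all(d_ij <= max(math.dist(points[i], points[x]),
                               math.dist(points[j], points[x]))
                   for x in range(n) if x not in (i, j)):
                edges.add((i, j))
    return edges

# Assumed sample: three points on a line.
points = [(0, 0), (1, 0), (3, 0)]
eps_edges = epsilon_graph(points, 1.5)
knn_edges = knn_graph(points, 1)
rng_edges = relative_neighborhood_graph(points)
```

On this sample the ε-graph is disconnected while the k-NN and relative neighborhood graphs are connected, which is one reason the latter two are often preferred when no natural threshold is known.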

(12)

SIMILARITY GRAPHS

Figure: ε-threshold graph

(13)

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Erdős-Rényi model

Rule: for $n$ nodes, generate each edge $(i, j)$ independently at random with probability $p$

- Low density
- Small diameter
- Clustering coefficient $\sim p$
- Not scale-free
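The generation rule is a single independent coin flip per node pair. The sketch below is a minimal illustration; the function name and the `seed` parameter are assumptions added for reproducibility.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Every pair (i, j) with i < j is kept independently with probability p.
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

complete = erdos_renyi(5, 1.0)   # p = 1 keeps every pair
empty = erdos_renyi(5, 0.0)      # p = 0 keeps none
sample = erdos_renyi(200, 0.05, seed=42)
```

Because edges are independent, two neighbors of a node are linked with probability $p$, which is why the clustering coefficient stays close to the density.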

(14)

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Small-world graphs

- High clustering coefficient
- Not scale-free

(15)

COMPLEX NETWORKS ≠ RANDOM GRAPHS

Preferential attachment

Rule: new nodes prefer to connect to high-degree nodes

- Clustering coefficient → 0
- Scale-free
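A minimal way to realize this rule is to draw the target of each new node from a list in which every node appears once per incident edge, so the draw is proportional to degree. The sketch below assumes one link per new node and a two-node starting graph; both choices, and the function name, are illustrative assumptions.

```python
import random

def preferential_attachment(n, seed=None):
    """Grow a graph on nodes 0..n-1, one degree-proportional link per new node."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # node v appears here degree(v) times
    for new in range(2, n):
        target = rng.choice(endpoints)  # probability proportional to degree
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

pa_edges = preferential_attachment(50, seed=1)
```

With a single link per new node the result is a tree, so it contains no triangle at all, an extreme case of the vanishing clustering coefficient noted above.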

(16)

(17)

COMPLEX NETWORKS ANALYSIS

A lot of interesting results using a simple model: interacting nodes

- Node-related analysis tasks: centralities, roles, similarities
- Community detection: local, disjoint, overlapping, hierarchical
- Network evolution mining: link prediction

(18)

CENTRALITIES

Node centralities

Degree centrality example

- Importance and/or influence of a node
- PageRank, HITS: web search results ranking
- FolkRank: tag recommendation
- Betweenness, Closeness: importance of actors in a social network
- Viral marketing

(19)

ROLES

Node roles

Red nodes are star centers, blue nodes are in-clique and the others are peripheral

- Nodes having similar structural behaviors
- Centers of stars, in cliques, peripheral nodes, ...
- Role query, role outliers, role dynamics, role transfer, ...
- See Henderson et al., KDD 2012

(20)

NODES SIMILARITY

- A node is similar to itself
- Neighborhood-based similarity: two nodes are similar if they are connected to the same nodes.
- Distance-based similarity: two nodes are similar if they are connected by short paths.
- Structural similarity: two nodes are similar if each is similar to the neighbors of the other.

(21)

COMMUNITY DETECTION

Definitions

- A dense subgraph loosely coupled to other modules in the network
- A community is a set of nodes seen as one by nodes outside the community
- A subgraph where almost all nodes are linked to other nodes in the community.
- ...

(22)

COMMUNITY DETECTION

Applications

- Finding similar items (documents, users, queries, etc.)
- Collaborative filtering
- Network visualisation
- Computation distribution
- ...

(23)

LOCAL COMMUNITY

(24)

LOCAL COMMUNITY

(25)

LOCAL COMMUNITY: GENERIC ALGORITHM

1 C ← ∅, B ← {n₀}, S ← Γ(n₀), D ← C ∪ B

2 Q ← 0 /* a community quality function */

3 While Q can be enhanced Do

  1 n ← argmax_{n∈S} Q

  2 S ← S − {n}

  3 D ← D + {n}

  4 update B, S, C

4 Return D
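The greedy loop can be sketched with a pluggable quality function. This is a minimal illustration: the quality used here (internal links divided by outgoing links) is an assumed stand-in, the toy graph (two triangles joined by one link) is hypothetical, and the shell S is simply recomputed after each step instead of being updated incrementally.

```python
def ratio_quality(adj, D):
    """Assumed quality: links inside D divided by links leaving D."""
    inside = sum(1 for v in D for x in adj[v] if x in D) / 2
    outside = sum(1 for v in D for x in adj[v] if x not in D)
    return float("inf") if outside == 0 else inside / outside

def local_community(adj, seed, quality):
    """Greedy expansion of the community D around a seed node."""
    D = {seed}
    S = set(adj[seed])                 # shell: neighbors of D outside D
    q = quality(adj, D)
    while S:
        best = max(S, key=lambda n: quality(adj, D | {n}))
        q_new = quality(adj, D | {best})
        if q_new <= q:                 # Q cannot be enhanced any more
            break
        D.add(best)
        q = q_new
        S = {x for v in D for x in adj[v]} - D
    return D

# Assumed toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = local_community(adj, 0, ratio_quality)
```

Only the seed's surroundings are explored: the loop stops as soon as adding node 3 would lower the quality, without visiting the second triangle.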

(26)

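QUALITY FUNCTIONS: EXAMPLES I

Local modularity R [Cla05]

\[ R = \frac{B_{in}}{B_{in} + B_{out}} \]

Local modularity M [LWP08]

\[ M = \frac{D_{in}}{D_{out}} \]

Local modularity L [CZG09]

\[ L = \frac{L_{in}}{L_{ex}} \quad \text{where} \quad L_{in} = \frac{\sum_{i \in D} \|\Gamma(i) \cap D\|}{\|D\|}, \quad L_{ex} = \frac{\sum_{i \in B} \|\Gamma(i) \cap S\|}{\|B\|} \]

And many others [YL12]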
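A small sketch shows how the three ratios are computed for a given community D. It assumes the usual reading of the symbols, which the slide does not spell out: B is the set of nodes of D with at least one neighbor outside D, S is the set of outside neighbors of D, $B_{in}$ counts links inside D touching B, $B_{out}$ and $D_{out}$ count links leaving D, and $D_{in}$ counts links inside D. The toy graph and function name are hypothetical, and D is assumed to have at least one outgoing link.

```python
def local_modularities(adj, D):
    """Return (R, M, L) for community D; adj maps node -> set of neighbors."""
    B = {v for v in D if any(x not in D for x in adj[v])}   # border nodes
    S = {x for v in D for x in adj[v]} - D                  # shell nodes
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    d_in = sum(1 for e in edges if e <= D)                  # links inside D
    d_out = sum(1 for e in edges if len(e & D) == 1)        # links leaving D
    b_in = sum(1 for e in edges if e <= D and e & B)        # inside D, touching B
    b_out = d_out                      # every link leaving D starts in B
    R = b_in / (b_in + b_out)
    M = d_in / d_out
    l_in = sum(len(adj[i] & D) for i in D) / len(D)
    l_ex = sum(len(adj[i] & S) for i in B) / len(B)
    return R, M, l_in / l_ex

# Assumed toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
R, M, L = local_modularities(adj, {0, 1, 2})
```

For the first triangle the only border node is 2, so R = 2/3, M = 3 and L = 2: the three indexes agree that the community is well separated but weight the boundary differently.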

(27)

GLOBAL COMMUNITIES DETECTION

Problem

- Divide the set of nodes into a number of (possibly overlapping) subsets such that the induced subgraphs are dense and loosely coupled to each other.

COMMUNITIES DETECTION: METHODS

Classification

- Group-based: nodes are grouped according to shared topological features (e.g. cliques)
- Network-based: clustering, graph-cut, block-models, modularity optimization
- Propagation-based: unstable approaches, good when used with ensemble clustering
- Seed-centred: from local community identification to global community detection

(29)

GROUP-BASED APPROACHES

Principle

Search for special (dense) subgraphs:

- k-clique
- n-clique
- γ-dense clique
- k-core

(30)

GROUP-BASED APPROACHES

from Symeon Papadopoulos, Community Detection in Social Media, CERTH-ITI, 22 June 2011

(31)

NETWORK-BASED APPROACHES

Clustering approaches

- Apply classical clustering approaches using a graph-based distance function
- Different types of graph-based distances: neighborhood-based, path-based (random walk)
- Usually requires the number of clusters to discover as input

(32)

MODULARITY OPTIMIZATION APPROACHES

Modularity: a partition quality criterion

\[ Q(P) = \frac{1}{2m} \sum_{c \in P} \sum_{i,j \in c} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \quad (1) \]

Figure: Example: Q = 0.31
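Equation (1) translates directly into a double loop over the nodes of each community. The sketch below is a minimal, unoptimized illustration; the toy graph (two triangles joined by one link) and the function name are assumptions and are unrelated to the figure's example.

```python
def modularity(adj, partition):
    """Q(P) = 1/2m * sum over communities c and pairs i, j in c of
    (A_ij - d_i * d_j / 2m), for an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of degrees
    q = 0.0
    for c in partition:
        for i in c:
            for j in c:
                a_ij = 1 if j in adj[i] else 0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Assumed toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])   # 5/14
q_single = modularity(adj, [set(adj)])              # one big community
```

Putting every node in a single community always gives Q = 0, so a positive value indicates more internal links than expected from the degrees alone.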

(33)

MODULARITY OPTIMIZATION APPROACHES

- Applying classical optimization algorithms (e.g. genetic algorithms [Piz12]).
- Applying hierarchical clustering and selecting the level with $Q_{max}$ (e.g. Walktrap [PL06])
- Divisive approach: Girvan-Newman algorithm [GN02]
- Greedy optimization: Louvain algorithm [BGL08]
- ...

(34)

EXAMPLE: LOUVAIN APPROACH

(35)

MODULARITY OPTIMIZATION LIMITATIONS

Hypotheses

The best partition of a graph is the one that maximizes the modularity.

If a network has a community structure, then it is possible to find a precise partition with maximal modularity.

If a network has a community structure, then partitions inducing high modularity values are structurally similar.

None of these three hypotheses holds [GdMC10, LF11].

(36)

MODULARITY RESOLUTION LIMIT

(37)

MODULARITY DISTRIBUTION

(38)

PROPAGATION-BASED APPROACHES

Algorithm 1 Label propagation

Require: G = <V, E> a connected graph

1: Initialize each node v with a unique label l_v

2: while labels are not stable do

3:   for v ∈ V do

       l_v = argmax_l |Γ^l(v)| /* random tie-breaking */

4:   end for

5: end while

6: return communities from labels

Γ^l(v): set of neighbors of v having label l
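Algorithm 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions: nodes are visited in a random order at each sweep, a node keeps its current label when that label is already among the most frequent ones (a common variant that helps labels settle), and `max_sweeps` is a hypothetical safety bound since convergence is not guaranteed.

```python
import random
from collections import Counter

def label_propagation(adj, seed=None, max_sweeps=100):
    """Return a list of communities (sets of nodes) found by label propagation."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}            # unique initial label per node
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            counts = Counter(labels[x] for x in adj[v])
            if not counts:
                continue                    # isolated node keeps its label
            top = max(counts.values())
            best = [l for l, c in counts.items() if c == top]
            if labels[v] not in best:
                labels[v] = rng.choice(best)    # random tie-breaking
                changed = True
        if not changed:                     # labels are stable
            break
    communities = {}
    for v, l in labels.items():
        communities.setdefault(l, set()).add(v)
    return list(communities.values())

# Assumed toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = label_propagation(adj, seed=0)
```

Running the function with different seeds can return different partitions of the same graph, which is the randomness that ensemble methods later try to exploit.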

(39)

LABEL PROPAGATION

Advantages

- Complexity: O(m)
- Highly parallel

Disadvantages

- No convergence guarantee, oscillation phenomena
- Low robustness: different runs yield very different community structures due to randomness

(40)

SEED-CENTRIC ALGORITHMS (KANAWATI, SCSM'2014)

Algorithm 2 General seed-centric community detection algorithm

Require: G = <V, E> a connected graph

1: C ← ∅

2: S ← compute_seeds(G)

3: for s ∈ S do

4:   C_s ← compute_local_com(s, G)

5:   C ← C + C_s

6: end for

7: return compute_community(C)

(41)

LINK PREDICTION

Link prediction

- Structural: find hidden/missing links in a network, e.g. missing links in Wikipedia
- Temporal: predict new links to appear at time $t_p$ based on the network state at instants $t < t_p$

Readings

- M. Pujari et al., Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N. Meghanthan (Editor), IGI publishing, 2016.

(42)

(43)

ALTERNATIVE NETWORK MODELS

Network science is mature enough to move towards more complex, expressive models:

- Multi-relational networks
- K-partite networks
- Dynamic networks
- Heterogeneous networks
- Attributed networks

A powerful model: Multiplex Network

(49)

MULTIPLEX NETWORK: DEFINITION

$G = \langle V, E, C \rangle$

from [Mucha et al., 2010]

- $V$: set of nodes
- $E = \{E_1, \dots, E_\alpha\}$: $\forall k \in [1, \alpha],\ E_k \subseteq V \times V$
- $C$: layer coupling links

Coupling

- Ordinal coupling: diagonal inter-layer links among consecutive layers.
- Categorical coupling: diagonal inter-layer links between all pairs of layers.
- Generalized coupling? E.g. decay functions

(50)

NOTATIONS

Notations

- $A^{[k]}$: adjacency matrix of slice $k$, with $a^{[k]}_{ij} \neq 0$ if $(v_i, v_j) \in E_k$, 0 otherwise.
- $m^{[k]} = |E_k|$. We often have $m \sim n$
- Neighbors of $v$ in slice $k$: $\Gamma(v)^{[k]} = \{x \in V : (x, v) \in E_k\}$.
- All neighbors of $v$: $\Gamma(v)^{tot} = \cup_{s \in \{1, \dots, \alpha\}} \Gamma(v)^{[s]}$
- Node degree in slice $k$: $d^{[k]}_v = \|\Gamma(v)^{[k]}\|$
- Total degree of node $v$: $d^{tot}_v = \|\Gamma(v)^{tot}\|$

(51)

MULTIPLEX NETWORKS: RELATED TERMS

POWER OF MULTIPLEX MODEL

Multi-relational networks

European airports network

(53)

POWER OF MULTIPLEX MODEL

Dynamic networks

Academic collaborations per year

(54)

POWER OF MULTIPLEX MODEL

Attributed networks

Teenage friendship network. Behavioral attributes: sport practice level, alcohol, tobacco & cannabis consumption

Proximity graphs can be defined over nodes using attribute-similarity measures

(55)

POWER OF MULTIPLEX MODEL

Heterogeneous networks

DBLP author-centred multiplex network

(56)

(57)

MULTIPLEX NETWORKS: MEASURES

Need for generalization of usual measures:

- Degree
- Neighbourhood
- Centralities
- Paths and distances
- Clustering coefficient
- ...

New layer-oriented questions to answer:

- Which layers determine the centrality of a user?
- Which layers are relevant to measure the similarity of two nodes?
- How does one layer influence the evolution of another?
- ...

(58)

APPROACHES

1 Transformation into a monoplex-centred problem

- Layer aggregation approaches
- Hypergraph transformation based approaches
- Ensemble approaches

2 Generalization of monoplex-oriented algorithms to multiplex networks.

(59)


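LAYER AGGREGATION

Aggregation functions

\[ A_{ij} = \begin{cases} 1 & \exists\, 1 \le l \le \alpha : A^{[l]}_{ij} \neq 0 \\ 0 & \text{otherwise} \end{cases} \]

\[ A_{ij} = \| \{ d : A^{[d]}_{ij} \neq 0 \} \| \]

\[ A_{ij} = \frac{1}{\alpha} \sum_{k=1}^{\alpha} w_k A^{[k]}_{ij} \]

\[ A_{ij} = sim(v_i, v_j) \]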
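The first three aggregation functions can be sketched on plain nested lists. This is a minimal illustration: the two-layer example and the function names are assumptions, and the similarity-based variant is left out because it depends on the chosen `sim` function.

```python
def aggregate_binary(layers):
    """A_ij = 1 if the link exists in at least one layer, 0 otherwise."""
    n = len(layers[0])
    return [[1 if any(A[i][j] != 0 for A in layers) else 0
             for j in range(n)] for i in range(n)]

def aggregate_count(layers):
    """A_ij = number of layers in which the link exists."""
    n = len(layers[0])
    return [[sum(1 for A in layers if A[i][j] != 0)
             for j in range(n)] for i in range(n)]

def aggregate_weighted(layers, weights):
    """A_ij = (1 / alpha) * sum_k w_k * A^[k]_ij."""
    n, alpha = len(layers[0]), len(layers)
    return [[sum(w * A[i][j] for w, A in zip(weights, layers)) / alpha
             for j in range(n)] for i in range(n)]

# Assumed example: two layers over three nodes.
layer_1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
layer_2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
layers = [layer_1, layer_2]

binary = aggregate_binary(layers)
count = aggregate_count(layers)
weighted = aggregate_weighted(layers, [1, 1])
```

The binary version forgets how many layers support a link, while the count and weighted versions keep that information as an edge weight in the resulting monoplex network.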

(61)

K-UNIFORM HYPERGRAPH TRANSFORMATION

Principle

- A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactly $k$
- Mapping a multiplex to a 3-uniform hypergraph $H = (\mathcal{V}, \mathcal{E})$ such that:

  $\mathcal{V} = V \cup \{1, \dots, \alpha\}$

  $(u, v, i) \in \mathcal{E}$ if $\exists l : A^{[l]}_{uv} \neq 0$, $u, v \in V$, $i \in \{1, \dots, \alpha\}$

- Apply hypergraph analysis approaches (e.g. tensor-based approaches)

(62)

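MULTIPLEX: NODE NEIGHBORHOOD

Some options

- $\Gamma^{mux}(v) = \cup_{k=1}^{\alpha} \Gamma^{k}(v)$
- $\Gamma^{mux}(v) = \cap_{k=1}^{\alpha} \Gamma^{k}(v)$
- $\Gamma^{mux}(v) = \{x \in \Gamma(v)^{tot} : sim(x, v) \ge \delta\}$, $\delta \in [0, 1]$
- $\Gamma^{mux}(v) = \{x \in \Gamma(v)^{tot} : \frac{\|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}\|}{\|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}\|} \ge \delta\}$
- ...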
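The union, intersection and thresholded options can be compared on a tiny multiplex. The sketch below is a minimal illustration in which each layer is a dictionary mapping a node to its set of neighbors; this representation, the function names and the two-layer example are assumptions.

```python
def layer_neighbors(layers, v):
    """Neighbor sets of v, one per layer."""
    return [layer.get(v, set()) for layer in layers]

def neighbors_union(layers, v):
    """Neighbors of v in at least one layer (this is also Gamma(v)^tot)."""
    return set().union(*layer_neighbors(layers, v))

def neighbors_intersection(layers, v):
    """Neighbors of v in every layer."""
    return set.intersection(*layer_neighbors(layers, v))

def neighbors_jaccard(layers, v, delta):
    """Neighbors x whose total neighborhood overlaps that of v by at least delta."""
    tot_v = neighbors_union(layers, v)
    selected = set()
    for x in tot_v:
        tot_x = neighbors_union(layers, x)
        if len(tot_v & tot_x) / len(tot_v | tot_x) >= delta:
            selected.add(x)
    return selected

# Assumed example: two layers over nodes 0, 1, 2.
layers = [
    {0: {1, 2}, 1: {0}, 2: {0}},
    {0: {1}, 1: {0, 2}, 2: {1}},
]
union_0 = neighbors_union(layers, 0)
inter_0 = neighbors_intersection(layers, 0)
loose_0 = neighbors_jaccard(layers, 0, 0.3)
strict_0 = neighbors_jaccard(layers, 0, 0.5)
```

The threshold δ interpolates between the permissive union (δ = 0) and an increasingly selective neighborhood, which is useful when the layers are noisy.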

(63)

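PATHS, SHORTEST DISTANCE

Some options

- Path in an aggregated network
- $d_{average}(u, v) = \frac{\sum_{\alpha=1}^{m} d(u, v)^{[\alpha]}}{m}$, $\forall u, v \in V$ and $(u, v) \notin E_i$.
- $path\text{-}length(u, v) = \langle r_1, r_2, \dots, r_\alpha \rangle$ where $r_i$ is the number of links in layer $i$
- $path_x(u, v)$ dominates $path_y(u, v)$ iff $\exists j : r^x_j < r^y_j$ and $\forall k \neq j,\ r^x_k \le r^y_k$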
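The dominance relation between path-length vectors is a Pareto comparison, which can be sketched as follows. The function names and the sample vectors are assumptions; each tuple stands for the vector $\langle r_1, \dots, r_\alpha \rangle$ of one candidate path.

```python
def dominates(rx, ry):
    """True if path x uses no more links than path y in every layer,
    and strictly fewer in at least one layer."""
    return (all(a <= b for a, b in zip(rx, ry))
            and any(a < b for a, b in zip(rx, ry)))

def pareto_front(paths):
    """Keep the path-length vectors that no other vector dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]

# Assumed candidates between the same two nodes of a two-layer multiplex.
candidates = [(1, 3), (2, 2), (2, 3)]
front = pareto_front(candidates)
```

Here (2, 3) is dropped because (2, 2) is at least as short in both layers, while (1, 3) and (2, 2) are incomparable: a multiplex may have several "shortest" paths in this sense.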

(64)

(65)

COMMUNITY?

What is a dense subgraph in a multiplex network?

[BCG11]

(66)

COMMUNITY DETECTION IN MULTIPLEX NETWORKS

Approaches

1 Transformation into a monoplex community detection problem

- Layer aggregation approaches
- Multi-objective optimization approach
- Ensemble clustering approaches

2 Generalization of monoplex-oriented algorithms to multiplex networks

- Generalized modularity optimization
- Generalized Infomap
- Generalized Walktrap
- Seed-centric approaches

(67)

MULTI-OBJECTIVE OPTIMIZATION APPROACH [AP14]

1 Rank the set of α layers according to some importance criterion

2 C₁ ← community(G^[1])

3 for i ∈ [2, α] do:

  C_i ← optimize(community(G^[i]), similarity(C_{i−1}))

4 return C_α

(68)


ENSEMBLE CLUSTERING APPROACHES

Ensemble clustering [Strehl2003]

- CSPA: Cluster-based Similarity Partitioning Algorithm
- HGPA: HyperGraph-Partitioning Algorithm
- MCLA: Meta-Clustering Algorithm
- ...

(70)

ENSEMBLE CLUSTERING: APPROACHES

CSPA: Cluster-based Similarity Partitioning Algorithm

- Let $K$ be the number of basic models and $C_i(x)$ be the cluster in model $i$ to which $x$ belongs.
- Define a similarity graph on objects: $sim(v, u) = \frac{\sum_{i=1}^{K} \delta(C_i(v), C_i(u))}{K}$
- Cluster the obtained graph:
  - Isolate connected components after pruning edges
  - Apply a community detection approach
- Complexity: $O(n^2 k r)$, with $n$ the number of objects, $k$ the number of clusters, $r$ the number of clustering solutions
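The CSPA steps above can be sketched as a co-association computation followed by pruning and connected components. This is a minimal illustration of that variant only: the pruning `threshold`, the function names and the three sample clusterings are assumptions, and each clustering is given as a dictionary mapping an object to its cluster label.

```python
from itertools import combinations

def coassociation(clusterings):
    """sim(u, v) = fraction of clusterings that put u and v in the same cluster."""
    nodes = sorted(clusterings[0])
    K = len(clusterings)
    return {(u, v): sum(1 for c in clusterings if c[u] == c[v]) / K
            for u, v in combinations(nodes, 2)}

def cspa_consensus(clusterings, threshold):
    """Prune weak similarity edges, then return the connected components."""
    sim = coassociation(clusterings)
    adj = {v: set() for v in clusterings[0]}
    for (u, v), s in sim.items():
        if s >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(adj[x] - component)
        seen |= component
        components.append(component)
    return components

# Assumed base clusterings of four objects.
clusterings = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]
sim = coassociation(clusterings)
consensus = cspa_consensus(clusterings, threshold=0.5)
```

Cluster labels never need to be aligned across base clusterings, because only the "same cluster or not" relation between pairs of objects is used.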

(71)

CSPA: EXAMPLE

from Seifi, M., Cœurs stables de communautés dans les graphes de terrain, PhD thesis, Université Paris 6, 2012

(72)

ENSEMBLE CLUSTERING: ILLUSTRATION I

(73)

ENSEMBLE CLUSTERING: ILLUSTRATION II

(74)

ENSEMBLE CLUSTERING: ILLUSTRATION III

(75)

ENSEMBLE CLUSTERING: ILLUSTRATION IV

(76)

ENSEMBLE CLUSTERING: ILLUSTRATION V

(77)

ENSEMBLE CLUSTERING: ILLUSTRATION VI

(78)

ENSEMBLE CLUSTERING: ILLUSTRATION VII

(79)

(80)

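MULTIPLEX MODULARITY

Generalized modularity [mucha2010community]

\[ Q_{multiplex}(P) = \frac{1}{2\mu} \sum_{c \in P} \sum_{\substack{i,j \in c \\ k,l: 1 \to \alpha}} \left[ \left( A^{[k]}_{ij} - \lambda_k \frac{d^{[k]}_i d^{[k]}_j}{2 m^{[k]}} \right) \delta_{kl} + \delta_{ij} C^{kl}_{ij} \right] \]

- $\mu = \sum_{\substack{j \in V \\ k,l: 1 \to \alpha}} \left( m^{[k]} + C^{l}_{jk} \right)$
- $C^{kl}_{ij}$: inter-slice coupling, $= 0\ \forall i \neq j$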

(81)

SEED-CENTRIC ALGORITHMS [KAN14]

Algorithm 3 General seed-centric community detection algorithm

Require: G = <V, E> a connected graph

1: C ← ∅

2: S ← compute_seeds(G)

3: for s ∈ S do

4:   C_s ← compute_local_com(s, G)

5:   C ← C + C_s

6: end for

7: return compute_community(C)

(82)

THE LICOD ALGORITHM [YK14]

1 Compute a set of seeds that are likely to be leaders in their communities

  Heuristic: nodes having higher degree centralities than their neighbors

2 Each node in the graph ranks the seeds according to its own preference

  In order of increasing shortest path

3 Iterate till convergence: each node modifies its preference vector according to its neighbors' preferences

  Applying rank aggregation methods.

(83)

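MUXLICOD

Multiplex degree centrality [BNL13]

\[ d^{multiplex}_i = - \sum_{k=1}^{\alpha} \frac{d^{[k]}_i}{d^{[tot]}_i} \log\left( \frac{d^{[k]}_i}{d^{[tot]}_i} \right) \]

Multiplex shortest path

\[ SP(u, v)^{multiplex} = \frac{\sum_{k=1}^{\alpha} SP(u, v)^{[k]}}{\alpha} \]

Multiplex neighborhood

\[ \Gamma^{mux}(v) = \left\{ x \in \Gamma(v)^{tot} : \frac{\|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}\|}{\|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}\|} \ge \delta \right\} \]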
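The first two measures can be sketched directly. This is a minimal illustration under two labeled assumptions: $d^{[tot]}_i$ is taken as the sum of the layer degrees so that the ratios form a distribution (with the union-based total degree they would not sum to 1), and a pair that is unreachable in some layer gets an infinite multiplex distance. Function names and the sample layers are hypothetical.

```python
import math
from collections import deque

def multiplex_degree_entropy(layer_degrees):
    """Entropy of the distribution of a node's degree over the layers."""
    total = sum(layer_degrees)   # assumption: d_tot = sum of layer degrees
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total)
                for d in layer_degrees if d > 0)

def bfs_distance(adj, u, v):
    """Shortest-path length between u and v in one layer (inf if unreachable)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return math.inf

def multiplex_shortest_path(layers, u, v):
    """Average over the layers of the per-layer shortest-path lengths."""
    return sum(bfs_distance(layer, u, v) for layer in layers) / len(layers)

# Assumed layers: a path 0-1-2 and a star centred on node 0.
layers = [
    {0: {1}, 1: {0, 2}, 2: {1}},
    {0: {1, 2}, 1: {0}, 2: {0}},
]
balanced = multiplex_degree_entropy([2, 2])
single_layer = multiplex_degree_entropy([4, 0])
sp_0_2 = multiplex_shortest_path(layers, 0, 2)
```

The entropy is maximal when a node's links are spread evenly over the layers and zero when they are concentrated in a single layer, so it rewards nodes that are active across the whole multiplex.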

(84)

RANK AGGREGATION [PK12, DKNS01]

(85)

OTHER ALGORITHMS

1 Random walk based approach (generalization of Walktrap) [KM15]

2 Generalized Infomap [DLAR15]

(86)

EVALUATION CRITERIA I

1 Multiplex modularity

2 Redundancy [BCG11]

\[ \rho(c) = \sum_{(u,v) \in \bar{\bar{P}}_c} \frac{\| \{ k : A^{[k]}_{uv} \neq 0 \} \|}{\alpha \times \| P_c \|} \]

$\bar{\bar{P}}$: the set of pairs $(u, v)$ which are directly connected in at least two layers

3 Complementarity: $\gamma(c) = V_c \times \varepsilon_c \times H_c$
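The redundancy formula can be sketched on a small community. The slide does not define $P_c$; the code assumes it is the set of pairs of nodes of $c$ connected in at least one layer, which is a labeled assumption, as are the function name and the two-layer example (each layer maps a node to its set of neighbors).

```python
from itertools import combinations

def redundancy(layers, community):
    """rho(c): for pairs linked in at least two layers, sum the number of
    layers linking them, normalised by alpha * |P_c|."""
    alpha = len(layers)
    connected_pairs = 0      # |P_c|, assumed: pairs linked in >= 1 layer
    redundant_links = 0      # sum over pairs linked in >= 2 layers
    for u, v in combinations(sorted(community), 2):
        count = sum(1 for layer in layers if v in layer.get(u, set()))
        if count >= 1:
            connected_pairs += 1
        if count >= 2:
            redundant_links += count
    if connected_pairs == 0:
        return 0.0
    return redundant_links / (alpha * connected_pairs)

# Assumed example: the link 0-1 exists in both layers, the link 1-2 in one.
layers = [
    {0: {1}, 1: {0, 2}, 2: {1}},
    {0: {1}, 1: {0}},
]
rho = redundancy(layers, {0, 1, 2})
```

The value reaches 1 only when every connected pair of the community is linked in all layers, so it measures how consistently the layers agree on the community's internal links.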

(87)

EVALUATION CRITERIA II

- Variety $V_c$: the proportion of occurrence of the community $c$ across layers of the multiplex.

\[ V_c = \sum_{s=1}^{\alpha} \frac{\| \exists (i,j) \in c / A^{[s]}_{ij} \neq 0 \|}{\alpha - 1} \quad (2) \]

- Exclusivity $\varepsilon_c$: number of pairs of nodes, in community $c$, that are connected exclusively in one layer.

\[ \varepsilon_c = \sum_{s=1}^{\alpha} \frac{\| P_{c,s} \|}{\| P_c \|} \quad (3) \]

(88)

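EVALUATION CRITERIA III

- Homogeneity $H_c$: how uniform is the distribution of the number of edges, in the community $c$, per layer.

\[ H_c = \begin{cases} 1 & \text{if } \sigma_c = 0 \\ 1 - \frac{\sigma_c}{\sigma_c^{max}} & \text{otherwise} \end{cases} \quad (4) \]

with

\[ avg_c = \sum_{s=1}^{\alpha} \frac{\| P_{c,s} \|}{\alpha} \]

\[ \sigma_c = \sqrt{ \sum_{s=1}^{\alpha} \frac{ (\| P_{c,s} \| - avg_c)^2 }{\alpha} } \]

\[ \sigma_c^{max} = \sqrt{ \frac{ \left( \max_d \| P_{c,d} \| - \min_d \| P_{c,d} \| \right)^2 }{2} } \]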

(89)

DATASETS

Benchmark networks

Lazega lawyer network

#nodes   71
#layers  3

(90)

DATASETS

Dataset

Physicians collaboration network

#nodes   246
#layers  3

(91)

RESULTS: REDUNDANCY

(92)

RESULTS: COMPLEMENTARITY

(93)

RESULTS: MULTIPLEX MODULARITY

(94)

PARETO FRONT

(95)

LAZEGA DATASET: COMPARATIVE STUDY

Figure: NMI (lower triangular part), adjusted Rand (upper triangular part).

(96)

(97)

APPLICATIONS

1 Film recommendation
2 Tag recommendation
3 Collaboration recommendation
4 Ensemble clustering selection

(98)

FILM RECOMMENDATION

Film rating matrix = bipartite graph

(99)

FILM RECOMMENDATION

(100)

FILM RECOMMENDATION

(101)

FILM RECOMMENDATION: MULTIPLEX NETWORK

Figure: MovieLens 100k multiplex (projection by users)

Figure: MovieLens 100k multiplex (projection by movies)

(102)

FILM RECOMMENDATION: RESULTS

Simple approach

Recommend the statistical mode of the values of the links connecting the cluster of the target user to the cluster of the target film

Algorithm         MAE     RMSE    Precision  Recall  F1-measure
GTM               0.9441  1.2549  0.2185     0.2207  0.2195
T. co-clustering  0.9293  1.2562  0.25587    0.2094  0.2303
muxlicod          0.9635  1.2773  0.2274     0.2134  0.2202
LA louvain        0.8352  1.1509  0.3113     0.2521  0.2779
LA walktrap       0.8216  1.1155  0.2642     0.2233  0.2420
PA louvain        0.8713  1.1917  0.2532     0.2032  0.2245
PA walktrap       0.8801  1.2023  0.2705     0.2011  0.2283

Table: Results of the proposed recommendation system with each algorithm on the MovieLens 100k dataset (PA: Partition Aggregation, LA: Layer Aggregation)

(103)

APPLICATION II: TAG RECOMMENDER (TLTR)

Figure: TLTR model

(104)

FOLKSONOMY GRAPHS

Figure: Tripartite graph projection into three bipartite graphs

(105)

MULTIPLEX NETWORK CONSTRUCTION

Figure: Tag multiplex network: steps of transformation

(106)

EXPERIMENTS: BIBSONOMY DATASET

# Users  # Tags  # Resources  # Edges
116      412     361          24297

Table: Bibsonomy dataset

Network   Slice          Nodes  Edges  Density
User      User-Resource  116    901    0.135
User      User-Tag       116    985    0.147
Tag       Tag-Resource   412    2496   0.0294
Tag       Tag-User       412    1956   0.0231
Resource  Resource-Tag   361    2814   0.0433
Resource  Resource-User  361    1685   0.0259

Table: Multiplex networks of Bibsonomy

(107)

TAG RECOMMENDATION: RESULTS

Graphs              #Nodes  #Edges  #Users  #Tags  #Resources
G                   889     24297   116     412    361
Gc(Mux-Licod)       434     1677    97      154    183
  compression in %  51.18   93.1    16.37   62.62  49.30
Gc(GenLouvain)      16      79      4       6      6
  compression in %  98.2    99.67   96.55   98.54  98.33
Gc(LA (Licod))      91      46      13      40     38
  compression in %  89.76   99.81   88.79   90.29  89.47
Gc(LA (Louvain))    9       27      3       3      3
  compression in %  98.98   99.88   97.41   99.27  99.16
Gc(EC (Licod))      151     993     3       89     59
  compression in %  83.08   95.91   97.41   78.39  83.65
Gc(EC (Louvain))    25      187     8       11     6
  compression in %  97.18   78.96   93.10   97.33  98.33

Table: Compression rate of the initial graph

(108)

TAG RECOMMENDATION: RESULTS

Figure: Comparative study of different tag recommendation approaches in terms of precision with kt = 1

(109)

TAG RECOMMENDATION: RESULTS

(110)

TAG RECOMMENDATION: RESULTS

(111)

TAG RECOMMENDATION: RESULTS

(112)

DISCUSSION

- Multiplex approaches outperform layer aggregation and EC approaches on benchmark networks
- Layer aggregation approaches do well for film recommendation!
- EC approaches rank first for tag recommendation!
- Open problem: how valid are topological community quality indexes?

(113)

APPLICATION III: SCIENTIFIC COLLABORATION RECOMMENDER

(114)

LINK PREDICTION: SUPERVISED APPROACH

(115)

EXPERIMENTS: DBLP

Years      Properties  Co-Author  Co-Venue  Co-Citation
1970-1973  Nodes       91         91        91
           Edges       116        1256      171
1972-1975  Nodes       221        221       221
           Edges       319        5098      706
1974-1977  Nodes       323        323       323
           Edges       451        9831      993

Table: Basic statistics about the 3-layer DBLP multiplex networks

Train/Test years  Labeling years  # Positive  # Negative
1970-1973         1974-1975       16          1810
1972-1975         1976-1977       49          12141
1974-1977         1978-1979       93          26223

Table: Number of examples extracted from the co-authorship layer (number of unconnected nodes in connected components)

(116)

LINK PREDICTION: RESULTS

                      Learning: 1970-1973   Learning: 1972-1975
                      Test: 1972-1975       Test: 1974-1977
Attributes            F-measure  AUC        F-measure  AUC
Set_direct            0.0357     0.5263     0.0168     0.4955
Set_direct+indirect   0.0256     0.5372     0.0150     0.5132
Set_direct+multiplex  0.0592     0.5374     0.0122     0.5108
Set_all               0.0153     0.5361     0.0171     0.5555
Set_multiplex         0.0374     0.5181     0.0185     0.5485

Table: Comparative link prediction results applying a decision tree algorithm with different types of attributes

(117)

(118)

APPLICATION IV: ENSEMBLE CLUSTERING SELECTION

Motivation

The quality of a consensus clustering depends on both the quality and the diversity of the input base clusterings [FL08, AF09, NCC13, ADIA15].

Problem definition

- Let $\Pi = \{\pi_1, \dots, \pi_n\}$ be a set of base partitions
- $ES(\Pi) = \Pi^* \subset \Pi : Q(EC(\Pi^*)) > Q(EC(\Pi))$
- $Q$: quality of the consensus clustering

(119)

DIVERSITY

Clustering similarity measures

- Purity
- Rand/ARI
- NMI (Normalized mutual information)
- VI (Variation of information) [Mei03]
- ...

(120)

QUALITY

Cluster internal quality indexes [AR14]

- Silhouette index
- Calinski-Harabasz index
- Davies-Bouldin index
- Dunn index
- ...

Network-oriented indexes

- Modularity
- Average conductance
- Average local modularities: L, M, R [Kan15]
- See also [YL12]
- ...

(121)

ENSEMBLE SELECTION APPROACHES: LIMITATIONS

- Existing approaches are defined for attribute/value datasets with metric distances
- Use of one quality/diversity measure.
- Require the number of clusterings to select as input.
- ...

Proposed approach: contributions

- Designed for both networks and attribute/value datasets
- Use of an ensemble of quality/diversity measures.
- The number of selected base clusterings is automatically computed.

(122)

ENSEMBLE SELECTION APPROACH

The idea

- Cluster the set of base clusterings using an ensemble of similarity measures: apply a multiplex community detection algorithm to a multiplex network whose nodes are the base clusterings and whose layers are proximity graphs, each defined according to a given similarity measure
- From each cluster, select the node (i.e. clustering) that is ranked first according to an ensemble of quality measures: apply ensemble ranking algorithms

(123)

ENSEMBLE SELECTION APPROACH

Algorithm 4 Graph-based cluster ensemble selection algorithm

Require: Π = {π₁, ..., π_r} a set of base clusterings

Require: S = {S₁, ..., S_n} a set of partition similarity functions

Require: Q = {Q₁, ..., Q_m} a set of partition quality functions

1: Π* ← ∅

2: MUX ← Multiplex(Π)

3: for all S_i ∈ S do

4:   MUX.add_layer(proximity_graph(Π, S_i))

5: end for

6: C = {c₁, ..., c_k} ← community_detection(MUX)

7: for all c ∈ C do

8:   π̂ ← ensemble_ranking(c, Q)

9:   Π* ← Π* ∪ {π̂}

10: end for

11: return Π*

(124)

THE PROPOSED APPROACH

(125)

THE PROPOSED APPROACH

(126)

THE PROPOSED APPROACH

(127)

ENSEMBLE RANKING

Problem

- Let $L$ be a set of elements to rank by $n$ rankers
- Let $\sigma_i$ be the rank provided by ranker $i$
- Goal: compute a consensus rank of $L$.

Déjà vu: social choice algorithms, but...

- Small number of voters and large number of candidates
- Algorithmic efficiency is required

Algorithms

- Borda
- Kemeny approaches (computing the Condorcet winner if it exists)
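Of the two families, Borda is the simpler to sketch: each ranker gives an element as many points as the number of elements it places below it, and elements are sorted by total points. The function name, the alphabetical tie-breaking and the sample rankings are assumptions; every ranker is assumed to rank all elements.

```python
def borda(rankings):
    """Consensus ranking by Borda count; each ranking lists elements best first."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # An item at position p beats the n - 1 - p items ranked after it.
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    # Highest score first; ties broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda item: (-scores[item], item))

# Assumed example: three rankers over three elements.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c", "b"],
]
consensus_rank = borda(rankings)
```

Borda runs in time linear in the total length of the rankings, which fits the efficiency requirement above, whereas computing an optimal Kemeny ranking is NP-hard in general.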

(128)

EXPERIMENT ON SMALL NETWORKS WITH KNOWN GROUND TRUTH PARTITIONS

- Generation of 20 base clusterings applying a standard label propagation algorithm
- Proximity graphs: RNG
- S = {NMI, ARI, VI}, Q = {modularity, local modularities L, M, R}

Table: Evaluation of the proposed graph-based ensemble selection

Dataset      Approach                               NMI   ARI
Zachary      Ensemble clustering without selection  0.57  0.46
             Ensemble clustering with selection     0.77  0.69
US Politics  Ensemble clustering without selection  0.55  0.68
             Ensemble clustering with selection     0.68  0.67
Dolphins     Ensemble clustering without selection  0.55  0.39

(129)

EXPERIMENT II: DBLP CO-AUTHORSHIP NETWORK

- Co-authorship network 1970-1977 (GCC): |V| = 643, m = 886
- Generation of 10, 100 base clusterings
- Proximity graphs: RNG
- S = {NMI, ARI, VI}, Q = {modularity, local modularities L, M, R}

# base clusterings                     10
Nodes compression without selection   18.3%
Nodes compression with selection      20.9%
Edge compression without selection    17.2%
Edge compression with selection       17.6%
Modularity without selection          0.3734
Modularity with selection             0.43756

(130)

EXPERIMENT II: DBLP CO-AUTHORSHIP NETWORK

# base clusterings                     100
Nodes compression without selection   35.1%
Nodes compression with selection      40.3%
Edge compression without selection    36.2%
Edge compression with selection       38.3%
Modularity without selection          0.4031
Modularity with selection             0.4665

(131)

(132)

MULTIPLEX ANALYSIS TOOLS

- muxviz: http://muxviz.net
- Muna: http://lipn.fr/muna
- Pymnet: http://people.maths.ox.ac.uk/kivela/mln_library/

(133)

MUXVIZ

- R package
- Main features:
  - Visualization
  - Layer compression methods
  - Basic metrics
  - Community detection: modularity-based, Infomap
- Input: one text file per layer + one file for the general structure.

(134)

MUNA

- Available for R and Python
- Built on top of igraph
- Extended set of multiplex network editing functions (similar to igraph)
- Basic metrics: degree, neighborhood
- Extended set of community detection approaches
- Topological community evaluation indexes.
- Limitations:
  - No visualisation support
  - Simple categorical coupling only.

(135)

PYMNET

- Pure Python + integration with the networkX package.
- Can handle general multilayer networks
- Rule-based generation and lazy evaluation of coupling edges
- Various network analysis methods, transformations, reading and writing networks, network models, etc.
- Visualization support

(136)

PYMNET: VISUALISATION EXAMPLE

(137)

CONCLUSIONS

- Multiplex networks provide a rich representation of real-world interaction systems
- A lot of work remains to reformulate basic network concepts for multiplex settings, e.g. roles, random walk, PageRank, etc.
- New tools for multiplex mining: Muna [FK15], muxviz [?], Pymnet
- Community evaluation: still an open problem
- Uncovered topics: layer selection and compression, co-evolution models, dynamics on multiplex networks
- Ideas under exploration:
  - Multiplex approach for attributed networks mining
  - Multiplex of multiplexes
  - Interactive multiplex network visualisation.
  - Benchmarking available tools

(138)

PYMNET: VISUALISATION EXAMPLE

(139)

OUR GROUP RELATED BIBLIOGRAPHY I

- Rushed Kanawati, Multiplex Network mining: a brief survey, IEEE Intelligent Informatics Bulletin, December 2015.
- Manisha Pujari, Rushed Kanawati, Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N. Meghanthan (Editor), IGI publishing.
- Issam Falih, Nistor Grozavu, Rushed Kanawati, Younès Bennani, A recommendation system based on unsupervised topological learning, 22nd International Conference on Neural Information Processing (ICONIP2015) 9-12, 2015 Istanbul, Turkey.
- Issam Falih, Rushed Kanawati, Muna: A Multiplex network analysis Library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 25-28 August 2015, Paris.
- Rushed Kanawati, Ensemble selection for enhancing graph coarsening quality, 5th international workshop on Social Network Analysis (ARS'15), 27-28 April 2015, Capri.
- Parisa Rastin, Rushed Kanawati, A multiplex-network based approach for clustering ensemble selection, International workshop on Multiplex network mining (MANEM@ASONAM), 25-28 August 2015, Paris.

(140)

OUR GROUP RELATED BIBLIOGRAPHY II

- Manisha Pujari, Rushed Kanawati, Link prediction in multiplex networks, AIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1), March 2015 (IF. 0.95).
- Manel Hmimida, Rushed Kanawati, Community detection in multiplex networks: A seed-centric approach, AIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1), March 2015.
- Rushed Kanawati, Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks, Neurocomputing, Volume 150, Part B, 20 February 2015, pp. 417-427.
- Zied Yakoubi, Rushed Kanawati, Licod: A Leader-driven algorithm for community detection in complex networks, Vietnam Journal of Computer Science - Springer, Volume 1, Issue 4, November 2014, pp. 241-256.
- Rushed Kanawati, Ensemble selection for community detection in complex networks, 7th international conference on Social Computing and Social Media, 2-7 August 2015, Los Angeles.

(141)

OUR GROUP RELATED BIBLIOGRAPHY III

- Issam Falih, Manel Hmimida, Rushed Kanawati, Une approche centrée graine pour la détection de communautés dans les réseaux multiplexes, Conférence EGC'15, 27-30 Jan. 2015, Luxembourg.
- Rushed Kanawati, Multi-objective approach for local community computation, 26th IEEE International conference on tools with artificial Intelligence (ICTAI'2014), 10-12 November 2014, Limassol, Cyprus.
- Rushed Kanawati, Community detection in social networks: the power of ensemble methods, ACM/IEEE International conference on data science advanced analytics (DSAA'2014), October 30 - November 1, 2014, Shanghai.
- Rushed Kanawati, YASCA: An ensemble based approach for community detection in complex networks, in proceedings of the 20th International Conference on computing combinatorics (COCOON), LNCS 8691, Atlanta, 4-6 August 2014, pp. 657-666.
- Rushed Kanawati, Seed-centric approaches for community detection in complex networks, 6th international conference on Social Computing and Social Media, 22-27 June 2014, Crete, Greece.

(142)

OUR GROUP RELATED BIBLIOGRAPHY IV

- Manisha Pujari, Rushed Kanawati, Link Prediction in Complex Networks by Supervised Rank Aggregation, ICTAI 2012: 24th IEEE International Conference on Tools with Artificial Intelligence, November 7-9, 2012, Athens, Greece.
- Manisha Pujari, Rushed Kanawati, Tag recommendation by link prediction based on supervised machine learning, Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012), 5-8 June 2012, Dublin.
- Manisha Pujari, Rushed Kanawati, Supervised machine learning link prediction approach for tag recommendation, 4th International Conference on Online Communities and Social Computing @ HCI International 2011, 9-14 July 2011, Hilton Orlando Bonnet Creek, Orlando, Florida, USA, LNCS Springer.
- Nesrine Benchettara, Rushed Kanawati, Céline Rouveirol, A Supervised Machine Learning Link Prediction Approach for Academic Collaboration Recommendation, 4th International ACM Recommender System conference, 26-30 September 2010, Barcelona, Spain, pp. 253-256.

(143)

OUR GROUP RELATED BIBLIOGRAPHY V

- Nesrine Benchettara, Rushed Kanawati, Céline Rouveirol, Supervised machine learning applied to link prediction in bipartite social networks, The 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010), 9-11 August 2010, Odense, Denmark, pp. 326-330.

(144)

That’s all folks !

Questions ?

(145)

BIBLIOGRAPHY I

[ADIA15] Ebrahim Akbari, Halina Mohamed Dahlan, Roliana Ibrahim, and Hosein Alizadeh, Hierarchical cluster ensemble selection, Engineering Applications of Artificial Intelligence 39 (2015), 146–156.

[AF09] Javad Azimi and Xiaoli Fern, Adaptive cluster ensemble selection, IJCAI (Craig Boutilier, ed.), 2009, pp. 992–997.

[AP14] Alessia Amelio and Clara Pizzuti, Community detection in multidimensional networks, IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 352–359.

[AR14] Charu C. Aggarwal and Chandan K. Reddy (eds.), Data clustering: Algorithms and applications, CRC Press, 2014.

[BCG11] Michele Berlingerio, Michele Coscia, and Fosca Giannotti, Finding and characterizing communities in multidimensional networks, ASONAM, IEEE Computer Society, 2011, pp. 490–494.

[BGL08] Vincent D. Blondel, Jean-Loup Guillaume, and Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008), P10008.

(146)

BIBLIOGRAPHY II

Federico Battiston, Vincenzo Nicosia, and Vito Latora, Metrics for the analysis of multiplex networks, CoRR abs/1308.3182 (2013).

Aaron Clauset, Finding local community structure in networks, Physical Review E (2005).

Jiyang Chen, Osmar R. Zaïane, and Randy Goebel, Local community identification in social networks, ASONAM, 2009, pp. 237–242.

Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar, Rank aggregation methods for the Web, WWW, 2001, pp. 613–622.

Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall, Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems, Phys. Rev. X 5 (2015), 011027.

Issam Falih and Rushed Kanawati, Muna: A multiplex network analysis library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (Paris), August 2015, pp. 757–760.

(147)

BIBLIOGRAPHY III

Xiaoli Z. Fern and Wei Lin, Cluster ensemble selection, Statistical Analysis and Data Mining 1 (2008), no. 3, 128–141.

B. H. Good, Y.-A. de Montjoye, and A. Clauset, The performance of modularity maximization in practical contexts, Physical Review E 81 (2010), 046106.

M. Girvan and M. E. J. Newman, Community structure in social and biological networks, PNAS 99 (2002), no. 12, 7821–7826.

Rushed Kanawati, Seed-centric approaches for community detection in complex networks, 6th International Conference on Social Computing and Social Media (Crete, Greece) (Gabriele Meiselwitz, ed.), vol. LNCS 8531, Springer, June 2014, pp. 197–208.

Rushed Kanawati, Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks, Neurocomputing 150, B (2015), 417–427.

Zhana Kuncheva and Giovanni Montana, Community detection in multiplex networks using locally adaptive random walks, MANEM 2workshop - Proceedings of ASONAM 2015 (Paris), August 2015.

(148)

BIBLIOGRAPHY IV

Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community detection, CoRR abs/1107.1 (2011).

Feng Luo, James Zijun Wang, and Eric Promislow, Exploring local community structures in large networks, Web Intelligence and Agent Systems 6 (2008), no. 4, 387–400.

Marina Meila, Comparing clusterings by the variation of information, COLT (Bernhard Schölkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.

Murilo Coelho Naldi, André C. P. L. F. Carvalho, and Ricardo J. G. B. Campello, Cluster ensemble selection based on relative validity indexes, Data Min. Knowl. Discov. 27 (2013), no. 2, 259–289.

Clara Pizzuti, A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation 16 (2012), no. 3, 418–430.

(149)

BIBLIOGRAPHY V

Manisha Pujari and Rushed Kanawati, Supervised rank aggregation approach for link prediction in complex networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1189–1196.

Pascal Pons and Matthieu Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl. 10 (2006), no. 2, 191–218.

Zied Yakoubi and Rushed Kanawati, Licod: Leader-driven approaches for community detection, Vietnam Journal of Computer Science 1 (2014), no. 4, 241–256.

Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.