• Aucun résultat trouvé

Analyse des réseaux multiplexes

N/A
N/A
Protected

Academic year: 2022

Partager "Analyse des réseaux multiplexes"

Copied!
149
0
0

Texte intégral

(1)

Multiplex networks analysis

source: muxviz

Rushed Kanawati A3, LIPN, CNRS UMR 7030

SPC - University Paris 13

http://lipn.fr/˜kanawati rushed.kanawati@lipn.univ-paris13.fr

WI 2015, Singapore, 6 December 2015 http://lipn.fr/munm

(2)

A

CKNOWLEDGMENTS

Thanks..

Much of the work presented here is made with former and current students : Nesserine Benchettra, Zied Yakoubi, Manel Hmimida, Manisha Pujari, Issam Falih nd Parisa Rastin

(3)

O

UTLINES

1 Introduction 45 min

Complex network analysis: a brief review

2 Multiplex networks 90 min

Definitions metrics & analysis tasks

3 Applications 60 min

Recommender systems, Cluster ensemble selection

4 Tools 15 min

5 Conclusions & discussion 15 min

(4)

C

OMPLEX NETWORKS

Definition

Graphs abstracting, direct or indirect, interactions in real-world systems

Basic topological features

I Low Density I Small Diameter I Scale-free I High Clustering

coefficient

(5)

S

CALE

-

FREE

(6)

C

LUSTERING COEFFICIENT Definitions

I Global version :

3 ×4

I Local on nodev: # of links between neighbors of v

# Potential links between neighbors of v

CC= 3×18

CCv=1= 16

(7)

N

OTATIONS

A graphG=<V,E⊆V×V>

I Vis the set of nodes (a.k.a. vertices, actors, sites)

I Eis the set of edges (a.k.a. ties, links, bonds)

Notations

I AGAdjacency Matrix d :aij6=0 si les nœuds(vi,vj)∈E, 0 otherwise.

I n=|V|

I m=|E|. We have oftenm∼n

I Γ(v): neighbor’s of nodev.Γ(v) ={x∈V: (x,v)∈E}. I Node degree : d(v) =kΓ(v)k

(8)

C

OMPLEX NETWORKS

: E

XEMPLE

I

Direct interactions

Density : 2−3 Diameter : 28 Clustering coeff.: 0.47

Greatest connected component of Wikipedia

(9)

C

OMPLEX NETWORKS

: E

XEMPLE

II

Indirect interactions

Density : 10−4 Diameter : 24 Clustering coeff.: 0.67

Co-authorship network - DBLP 1980-1984 (for authors active for more than 10 years)

(10)

E

XAMPLE

III

Similarity graphs

Public sites accessibility in Bourget district, Paris

Density : 0.052 Diameter : 7 clustering Coef.: 0.87

(11)

S

IMILARITY GRAPHS

-neighborhood graph: u,vare linked ifd(u,v)≤

k-nearest neighbor graph: each node is connected toknearest nodes.

Relative neighborhood graph:

u,vare linked ifd(u,v)≤maxx{d(v,x),d(u,x)},∀x6=u,v

(12)

S

IMILARITY GRAPHS

Figure:-threshold graph

(13)

C

OMPLEX NETWORKS

6= R

ANDOM

G

RAPHS

Erd ¨os-R´enyi model

Rule : fornnodes, generate edge (i,j) independently at random with probabilityp

Low density Small diameter

Clustering coefficient∼p not scale-free

(14)

C

OMPLEX NETWORKS

6= R

ANDOM

G

RAPHS Small-world graphs

Low density Small diameter

High clustering coefficient not scale-free

(15)

C

OMPLEX NETWORKS

6= R

ANDOM

G

RAPHS

Preferential attachement

Rule : New nodes prefer to connect to high degree nodes

Low density Small diameter

Clustering coefficient→0 Scale-free

(16)
(17)

C

OMPLEX NETWORKS ANALYSIS

A lot of interesting results using asimple model: interacting nodes

Node related analysis tasks

Centralities, Roles, Similarities Community detection

Local, disjoint, overlapping, hierarchical Network evolution mining

Link prediction

(18)

C

ENTRALITIES Nodes centralities

Degree centrality example

I Importance and/or influence of a node

I PageRank, HITS : web search results ranking

I FolkRank: Tag recommendation I Betweenness, Closeness: Importance of

actors in a social network I Viral maketing

(19)

R

OLES Node Roles

red nodes are star centers, blues are in-clique and others are prepherial

I Nodes having a similar structural behaviors

I Centers of stars, in cliques, peripheral nodes,. . .

I Role query, Role outliers, Role dynamics, Role transfer,. . .

I SeeHenderson et al. KDD 2012

(20)

N

ODES SIMILARITY

I A node is similar to itself

I Neighborhood-based similarity: Two nodes are similar if they are connected to the same nodes.

I Distance-based similarity: Two nodes are similar if they are connected with short paths.

I Structural-similarity: Two nodes are similar if each is similar to the neighbors of the other.

(21)

C

OMMUNITY DETECTION

Definitions

I A dense subgraph loosely coupled to other modules in the network

I A community is a set of nodes seen as one by nodes outside the community

I A subgraph where almost all nodes are linked to other nodes in the community.

I . . .

(22)

C

OMMUNITY DETECTION

Applications

I Finding similar items (documents, users, queries, etc.) I Collaboratif filtering

I Network visualisation I Computation distribution I . . .

(23)

L

OCAL COMMUNITY

(24)

L

OCAL COMMUNITY

(25)

L

OCAL COMMUNITY

: G

ENERIC ALGORITHM

1 C← ∅,B← {n0},S←Γ(n0)

2 Q←0 /* a communityquality function*/

3 WhileQcan be enhanced Do 1 n←argmaxn∈SQ

2 S←S− {n}

3 D←D+{n}

4 updateB,S,C 4 ReturnD

(26)

Q

UALITY FUNCTIONS

: E

XEMPLES

I

Local modularityR [Cla05]

R= B Bin

in+Bout

Local modularityM [LWP08]

M= DDin

out

Local modularityL [CZG09]

L= LLin

ex where : Lin=

P

i∈D

kΓ(i)∩Dk

kDk ,Lex=

P

i∈B

kΓ(i)∩Sk kBk

And many many others. . . [YL12]

(27)

G

LOBAL COMMUNITIES DETECTION

Problem

I Divid the set of nodes in a number of (overlapping) subsets such that induced subgraphs are dense and loosely coupled.

Recommended readings

I S. Fortunato.Community detection in graphs. Physics Reports, 2010, 486, 75-174

I L. Tang, H. Liu.Community Detection and Mining in Social Media, Morgan Claypool Publishers, 2010

(28)

C

OMMUNITIES DETECTION

: M

ETHODS

Classification

I Group-based

Nodes are grouped in function of shared topological features (ex. clique)

I Network-based

Clustering, Graph-cut, block-models,modularity optimization I Propagation-based

Unstable approaches, good when used with ensemble clustering I Seed-centred

from local community identification to global community detection

(29)

G

ROUP

-

BASED APPROCHES

Principle

Search for special (dense) subgraphs:

I k-clique I n-clique

I γ-dense clique

I K-core

(30)

G

ROUP

-

BASED APPROCHES

from Symeon Papadopoulos, Community Detection in Social Media, CERTH-ITI, 22 June 2011

(31)

N

ETWORK

-

BASED APPROCHES

Clustering approaches

I Apply classical clustering approaches usinggraph-baseddiastase function

I Different types of Graph-based distances: neighborhood-based, path-based (Random-walk)

I Usually requires the number of clusters to discover

(32)

M

ODULARITY OPTIMIZATION APPROACHES Modularity: a partition quality criteria

Q(P) = 1 2m

X

c∈P

X

i,jc

(Aij−didj

2m) (1)

Figure: Example :Q=0.31

(33)

M

ODULARITY OPTIMIZATION APPROACHES

I Applying classical optimization algorithms (ex. Genetic algorithms [Piz12]).

I Applying hierarchical clustering and select the level withQmax (ex. Walktrap [PL06])

I Divisive approach : Girvan-Newman algorithm [GN02]

I Greedy optimization : Louvain algorithm [BGL08]

I . . .

(34)

E

XAMPLE

: L

OUVAIN APPROACH

(35)

M

ODULARITY OPTIMIZATION LIMITATIONS

Hypothesis

The best partition of a graph is the one that maximize the modularity.

If a network has a community structure, then it is popsicle to find a precise partition with maximal modularity

If a network has a community structure, then partitions inducing high modularity values are structurally similar.

All three hypothesis do not hold [GdMC10, LF11].

(36)

M

ODULARITY RESOLUTION LIMITE

(37)

M

ODULARITY DISTRIBUTION

(38)

P

ROPAGATION

-

BASED APPROACHES

Algorithm 1Label propagation

Require: G=<V,E>a connected graph,

1: Initialize each node with unique labellv

2: whileLabels are not stabledo

3: forv∈Vdo

lv =ll(v)|/* random tie-breaking */

4: end for

5: end while

6: return communities from labels

Γl(v): set of neighbors having labell

(39)

L

ABEL PROPAGATION

Advantages

I Complexity :O(m) I Highly parallel

Disadvantages

I No convergence guarantee, oscillation phenomena

I Low robustness

Different runs yields very different community structure due to randomness

(40)

S

EED

-

CENTRIC ALGORITHMS (KANAWATI, SCSM’2014)

Algorithm 2General seed-centric community detection algorithm Require: G=<V,E>a connected graph,

1: C ← ∅

2: S←compute seeds(G)

3: fors∈Sdo

4: Cscompute local com(s,G)

5: C ← C+Cs 6: end for

7: return compute community(C)

(41)

L

INK PREDICTION

Link predction I Structural

Find hidden/missing links in a network ex. Missing links in Wikipedia

I Temporal

Predicting new links to appear at time tpbased on the network state at instants t<tp

Readings

I M. Pujari et. al.,Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N.

Meghanthan (Editor), IGI publishing, 2016.

(42)
(43)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relationnal networks

(44)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relationnal networks K-partite networks

(45)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relationnal networks K-partite networks

Dynamic networks

(46)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks

(47)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks Attributed networks

(48)

A

LTERNATIVE NETWORK MODELS

Network science is mature enough to a move towards more complex, expressive models

Multi-relational networks K-partite networks Dynamic networks Heterogeneous networks Attributed networks

A powerful model :Multiplex Network

(49)

M

ULTIPLEX

N

ETWORK

: D

EFINITION G=<V,E,C>

from [Mucha et. al., 2010]

I Vset of nodes

I E={E1, . . . ,Eα}:∀k∈[1, α]Ek ⊆V×V I CLayerCouplinglinks

Coupling

I Ordinal Coupling: Diagonal inter-layer links among consecutive layers.

I Categorical Coupling: Diagonal inter-layer links between all pairs of layers.

I Generalized coupling ? Ex. Decay functions

(50)

N

OTATIONS

Notations

I A[k]Adjacency Matrix of slicek:a[k]ij 6=0 si les nœuds(vi,vj)Ek, 0 otherwise.

I m[k]=|Ek|. We have oftenmn

I Neighbor’s ofvin slicek:Γ(v)[k]={xV: (x,v)Ek}. I All neighbors ofv:Γ(v)tot=s∈{1,...,α}Γ(v)[s]

I Node degree in slicek:dkv=kΓ(v)[k]k I Total degree of nodev:dtotv =||Γtot(v)||

(51)

M

ULTIPLEX NETWORKS

: R

ELATED TERMS

Recommended readings

/ S. Mikko Kivel¨a et. al..Multilayer Networks. arXiv:1309.7233, March 2014

(52)

P

OWER OF MULTIPLEX MODEL

Multi-relational networks

European airports network

(53)

P

OWER OF MULTIPLEX MODEL Dynamic networks

Academic collaborations per year

(54)

P

OWER OF MULTIPLEX MODEL Attributed networks

Teenage friendship network- Behavioral attributs : Sport practice level, Alcohol, Tobacco & Cannabis consumption

Proximity graphs can be defined overs nodes using attribute-similarity masures

(55)

P

OWER OF MULTIPLEX MODEL

Heterogeneous networks

DBLP author-centred multiplex network

(56)
(57)

M

ULTIPLEX NETWORKS

: M

EASURES

Need of generalization ofusual measures: Degree

Neighbourhood Centralities

Paths and distances Clustering coefficient . . .

New layer-oriented questions to answer : Which layers determine the centrality of a user

Which layers are relevant to measure the similarity of two nodes

How one layer influence the evolution of another

. . .

(58)

A

PPROACHES

1 Transformation into a monoplex centred problem I Layer aggregation approaches.

I Hypergraph transformation based approaches I Ensemble approaches

2 Generalization of monoplex oriented algorithms to multiplex networks.

(59)

Introduction Multiplex networks Applications Tools Conclusion

L

AYER AGGREGATION

Aij=

(1 1lα:A[l]ij 6=0 0 otherwise

Aij=k {d:A[d]ij 6=0} k

Aij= 1 α

α

X

k=1

wkA[k]ij Aij=sim(vi,vj)

(60)

L

AYER AGGREGATION

Aggregation functions

Aij=

(1 1lα:A[l]ij 6=0 0 otherwise

Aij=k {d:A[d]ij 6=0} k

Aij= 1 α

α

X

k=1

wkA[k]ij Aij=sim(vi,vj)

(61)

K-

UNIFORM HYPERGRAPH TRANSFORMATION

Principle

I A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactlyk

I Mapping a multiplex to a3-uniform hypergraphH= (V,E)such that :

V =V∪ {1, . . . , α}

(u,v,i)∈ E if∃l:A[l]uv6=0,u,v∈V,i∈ {1, . . . , α}

I Apply hypergraphs analysis approaches (Ex. tensor-based approaches)

(62)

M

ULTIPLEX

: N

ODE NEIGHBORHOOD

Some options I Γmux(v) =∪α

k=1Γk(v) I Γmux(v) =∩αk=1Γk(v)

I Γmux(v) ={x∈Γ(v)tot :sim(x,v)≥δ}δ∈[0,1]

I Γmux(v) ={x∈Γ(v)tot : Γ(v)Γ(v)tottot∩Γ(x)∪Γ(x)tottot ≥δ}

I . . .

(63)

P

ATHS

,

SHORTEST DISTANCE

Some options

I Path in an aggregated network I daverage=

Pm

α=1d(u,v)[α]

m ∀u,v∈Vand(u,v)∈/ Ei.

I path−length(u,v) =<r1,r2, . . . ,rα>whererinumber of links in layeri

I pathx(u,v)dominatespathy(u,v)∃j:rxj <ryj, ∀k6=j rxj ≤ryj

(64)
(65)

C

OMMUNITY

?

What is a dense subgraph in a multiplex network ?

BerlingerioCG11

(66)

C

OMMUNITY DETECTION IN MULTIPLEX NETWORKS

Approaches

1 Transformation into a monoplex community detection problem I Layer aggregation approaches.

I Multi-objective optimization approach.

I Ensemble clustering approaches

2 Generalization of monoplex oriented algorithms to multiplex networks.

I Generalized-modularity optimization I Generalized info-map

I Generalized walktrap I Seed-centric approaches

(67)

M

ULTI

-

OBJECTIVE OPTIMIZATION APPROACH

[AP14]

1 Rank the set ofαlayers according to someimportance criteria 2 C1←community(G[1])

3 fori∈[2, α]do:

Ci←optimize(community(G[i]),similarity(Ci1)) 4 returnCα

(68)

Introduction Multiplex networks Applications Tools Conclusion

E

NSEMBLE CLUSTERING APPROACHES

I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm

I MCLA: Meta-Clustering Algorithm I . . .

(69)

E

NSEMBLE CLUSTERING APPROACHES

Ensemble Clustering Strehl2003

I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm

I MCLA: Meta-Clustering Algorithm I . . .

(70)

E

NSEMBLE CLUSTERING

:

APPROACHES

CSPA: Cluster-based Similarity Partitioning Algorithm

I LetKbe the number of basic models,Ci(x)be the cluster in model ito whichxbelongs.

I Define a similarity graph on objects :sim(v,u) =

PK i=1

δ(Ci(v),Ci(u)) K

I Cluster the obtained graph :

Isolate connected components after prunning edges Apply community detection approach

I Complexity :O(n2kr):n# objects,k# of clusters,r# of clustering solutions

(71)

CSPA : E

XEMPLE

from Seifi, M. Cœurs stables de communaut´es dans les graphes de terrain. Th`ese de l’universit´e Paris 6, 2012

(72)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

I

(73)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

II

(74)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

III

(75)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

IV

(76)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

V

(77)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

VI

(78)

E

NSEMBLE CLUSTERING

:

ILLUSTRATION

VII

(79)
(80)

M

ULTIPLEX

M

ODULARITY

Generalized modularity mucha2010community

I

Qmultiplex(P) = 1 2µ

X

c∈P

X

i,j∈c k,l:1→α

A[k]ij −λkd[k]i d[k]j 2m[k]

δklijCklij

I µ= P

j∈V k,l:1→α

m[k]+Cljk

I Cklij Inter slice coupling = 0∀i6=j

(81)

S

EED

-

CENTRIC ALGORITHMS

[K

AN

14]

Algorithm 3General seed-centric community detection algorithm Require: G=<V,E>a connected graph,

1: C ← ∅

2: S←compute seeds(G)

3: fors∈Sdo

4: Cscompute local com(s,G)

5: C ← C+Cs 6: end for

7: return compute community(C)

(82)

T

HE

L

ICOD ALGORITHM

[YK14]

1 Compute a set of seeds that are likely to be leaders in their communities

Heuristic : nodes having higherdegree centralitiesthan their neighbors 2 Each node in the graph ranks seeds in function of its own

preference

In function of increasingShortest path 3 Iterate till convergence: Each node modifies its preference vector

in function ofneighbor’spreferences

Applyingrank aggregationmethods.

(83)

M

UX

L

ICOD

Multiplex degree centrality [BNL13]

dmultiplexi =−

α

X

k=1

d[k]i

d[tot]i log d[k]i d[tot]i

!

Multiplex shortest path

SP(u,v)multiplex=

α

P

k=1

SP(u,v)[k]

α

Multiplex neighborhood

Γmux(v) ={x∈Γ(v)tot: Γ(v)tot∩Γ(x)tot Γ(v)tot∪Γ(x)tot ≥δ}

(84)

R

ANK AGGREGATION

[PK12, DKNS01]

(85)

O

THER ALGORITHMS

1 Random walk based approach (Generalization of Walktrap [KM15]

2 Generalized infomap [DLAR15]

(86)

E

VALUATION CRITERIA

I

1 Multiplex modularity 2 Redundancy [BCG11]

ρ(c) = X

(u,v)∈P¯¯c

k {k:∃A[k]uv6=0} k α× kPck

¯¯

Pthe set of couple(u,v)which are directly connected in at least two layers

3 Complementarity :γ(c) =Vc×εc× Hc

(87)

E

VALUATION CRITERIA

II

I VarietyVc: the proportion of occurrence of the communityc across layers of the multiplex.

Vc =

α

X

s=1

k∃(i,j)∈c/A[s]ij 6=0k

α−1 (2)

I Exclusivityεc: number of pairs of nodes, in communityc, that are connected exclusively in one layer.

εc=

α

X

s=1

kPc,sk

kPck (3)

(88)

E

VALUATION CRITERIA

III

I HomogeneityHc: How uniform is the distribution of the number of edges, in the communityc, per layer.

Hc=

1 if σc=0 1−σσmaxc

c otherwise (4)

with

avgc=

α

X

s=1

kPc,sk α

σc= v u u t

α

X

s=1

(kPc,sk −avgc)2 α

σcmax=

r(max(kPc,d k)−min(kPc,dk))2 2

(89)

D

ATASETS

Benchmark networks

Lazzega Lawyer network

#nodes 71

#layer 3

(90)

D

ATASETS

Dataset

Physicians collaboration network

#nodes 246

#layers 3

(91)

R

ESULTS

: R

EDUNDANCY

(92)

R

ESULTS

: C

OMPLEMENTARITY

(93)

R

ESULTS

:

MULTIPLEX MODULARITY

(94)

P

ARETO FRONT

(95)

L

AZEGA DATASET

: C

OMPARATIVE STUDY

Figure: NMI (lower triangular part) , adjusted Rand (upper triangular part).95 / 149

(96)
(97)

A

PPLICATIONS

1 Film recommandation 2 Tag recommendation

3 Collaboration recommendation 4 Ensemble clustering selection

(98)

F

ILM RECOMMENDATION

Film rating matrix = bipartite graph

(99)

F

ILM RECOMMENDATION

(100)

F

ILM RECOMMENDATION

(101)

F

ILM RECOMMENDATION

: M

ULTIPLEX NETWORK

Figure: Movieslens 100k multiplex

(Projection by users) Figure: Movieslens 100k multiplex (Projection by movies)

(102)

F

ILM RECOMMENDATION

:

RESULTS

Simple approach

Recommend the statistical mode value of links linking clusters of target user to the cluster of target films

MAE RMSE Precision Recall F1-measure GTM 0.9441 1.2549 0.2185 0.2207 0.2195 T. co-clustering 0.9293 1.2562 0.25587 0.2094 0.2303 muxlicod 0.9635 1.2773 0.2274 0.2134 0.2202 LA louvain 0.8352 1.1509 0.3113 0.2521 0.2779 LA walktrap 0.8216 1.1155 0.2642 0.2233 0.2420 PA louvain 0.8713 1.1917 0.2532 0.2032 0.2245 PA walktrap 0.8801 1.2023 0.2705 0.2011 0.2283 Table: Result of the proposed recommendation system with each algorithm in MovieLens 100k dataset (PA : Partition Aggregation, LA : Layer Aggregation

(103)

A

PPLICATION

II: TAG

RECOMMENDER

(TLTR)

Figure: TLTR model

(104)

F

OLKSONOMY GRAPHS

Figure: Tripartite graph projection into three bipartite graphs

(105)

M

ULTIPLEX NETWORK CONSTRUCTION

Figure: Tag multiplex network: Steps of transformation

(106)

E

XPERIMENTS

: B

IBSOMNOMY DATASET

# Users # Tags # Resources # Edges

116 412 361 24297

Table: Bibsonomy dataset

Networks slices Nodes Edges Density

User User-Resource 116 901 0,135

User-Tag 116 985 0,147

Tag Tag-Resource 412 2496 0,0294

Tag-User 412 1956 0,0231

Resource Resource-Tag 361 2814 0,0433 Resource-User 361 1685 0,0259 Table: Multiplex networks of Bibsonomy

(107)

T

AG RECOMMANDATION

:

RESULTS

Graphs #Nodes #Edges #Users #Tags #Resources

G 889 24297 116 412 361

Gc(Mux-Licod) 434 1677 97 154 183

compression in % 51,18 93,1 16,37 62,62 49,30

Gc(GenLouvain) 16 79 4 6 6

compression in % 98,2 99,67 96,55 98,54 98,33

Gc(LA (Licod)) 91 46 13 40 38

compression in % 89,76 99,81 88,79 90,29 89,47

Gc(LA (Louvain)) 9 27 3 3 3

compression in % 98,98 99,88 97,41 99,27 99,16

Gc(EC (Licod)) 151 993 3 89 59

compression in % 83,08 95,91 97,41 78,39 83,65

Gc(EC (Louvain)) 25 187 8 11 6

compression in % 97,18 78,96 93,10 97,33 98,33 Table: Compression rate of the initial graph

107 / 149

(108)

T

AG RECOMMANDATION

:

RESULTS

Figure: Comparative study of different tags recommendation approaches in terms of precision withkt =1

(109)

T

AG RECOMMANDATION

:

RESULTS

Figure: Comparative study of different tags recommendation approaches in terms of precision withkt =2

(110)

T

AG RECOMMANDATION

:

RESULTS

Figure: Comparative study of different tags recommendation approaches in terms of precision withkt =3

(111)

T

AG RECOMMANDATION

:

RESULTS

Figure: Comparative study of different tags recommendation approaches in terms of precision withkt =4

(112)

D

ISCUSSION

I Multiplex approaches outperform layer aggregation and EC approaches on benchmark networks

I Layer aggregation approaches do well for film recommendation ! I EC approaches rank first for Tag recommendation !

I Problem what is the Validity of topological community quality indexes ?

(113)

A

PPLICATION

III: S

CIENTIFIC

C

OLLABORATION RECOMMENDER

(114)

L

INK PREDICTION

:

SUPERVISED APPROACH

(115)

E

XPERIMENTS

: DBLP

Years Properties Co-Author Co-Venue Co-Citation

1970-1973 Nodes 91 91 91

Edges 116 1256 171

1972-1975 Nodes 221 221 221

Edges 319 5098 706

1974-1977 Nodes 323 323 323

Edges 451 9831 993

Table:Basic statistics about the 3-layer DBLP multiplex networks

Years # Positive # Negatives Train/Test Labeling

1970-1973 1974-1975 16 1810

1972-1975 1976-1977 49 12141

1974-1977 1978-1979 93 26223

Table: #examples extracted from co-authorship layer (number of unconnected nodes in

connected components) 115 / 149

(116)

L

INK PREDICTION

: R

ESULTS

Learning:1970-1973 Learning:1972-1975 Attributes Test:1972-1975 Test:1974-1977

F-measure AUC F-measure AUC

Setdirect 0.0357 0.5263 0.0168 0.4955

Setdirect+indirect 0.0256 0.5372 0.0150 0.5132

Setdirect+multiplex 0.0592 0.5374 0.0122 0.5108

Setall 0.0153 0.5361 0.0171 0.5555

Setmultiplex 0.0374 0.5181 0.0185 0.5485

Table: Comparative link prediction results applying decision tree algorithm using different types of attributs

(117)
(118)

A

PPLICATION

IV: E

NSEMBLE

C

LUSTERING

S

ELECTION

Motivation

The quality of a consensus clustering depends on both thequalityand diversityof input base clusterings [FL08, AF09, NCC13, ADIA15].

Problem definition

I LetΠ ={π1, . . . , πn}be a set of base partitions I ES(Π) = Π ⊂Π :Q(EC(Π))>Q(EC(Π)) I Q: Quality of the consensus clustering

(119)

D

IVERSITY

Clustering Similarity measures I Purity

I Rand/ARI

I NMI (Normlized mutual information) I IV (Information variation) [Mei03]

I . . .

(120)

Q

UALITY

Cluster internal quality indexes [AR14]

I Silhouette index,

I Calinski-Harabasz index I Davis-Bouldin index I Dunn index

I . . .

Network-oriented indexes I Modularity

I Average conductance

I Average local Modularities : L, M, R [Kan15]

I See also [YL12]

I . . .

(121)

E

NSEMBLE SELECTION APPROACHES

:

LIMITATIONS I Existing approaches are defined for attribute/value datasets with

metric distances

I Use of one quality/diversity measure.

I Requires the number of clusters to select as input.

I . . .

Proposed approach: contributions

I Designed for both networks and attribute/value datasets I Use of anensembleof quality/diversity measures.

I The number of selected base clustering is automatically computed.

(122)

E

NSEMBLE SELECTION APPROACH

The idea

Cluster the set of base clusterings using an ensemble of similarity measures

Apply amultiplex community detectionalgorithm to a multiplex network whose nodes are the set of base clusterings and whose layers are defined by a set ofproximity graphs, each defined according a to a given similarity measure

From each cluster select the node (i.e clustering) that is ranked first according to an ensemble of quality measures.

Applyensemble rankingalgorithms

(123)

E

NSEMBLE SELECTION APPROACH

Algorithm 4Graph-based cluster ensemble selection algorithm Require: Π ={π1, . . . , πr}a set of base clusterings

Require: S={S1, . . . ,Sn}A set of partition similarity functions Require: Q={Q1, . . . ,Qm}A set of partition quality functions

1: Π ← ∅

2: MUX←Multiplex(Π)

3: for allSi ∈ S do

4: MUX.add layer(proximity graph(Π,Si))

5: end for

6: C={c1, . . . ,ck} ←community detection(MUX)

7: for allc∈ C do

8: ˆπ←ensemble Ranking(c,Q)

9: Π←Π∪ {ˆπ}

10: end for

11: return Π

(124)

T

HE PROPOSED APPROACH

(125)

T

HE PROPOSED APPROACH

(126)

T

HE PROPOSED APPROACH

(127)

E

NSEMBLE RANKING

Problem

I LetLbe a set of elements to rank bynrankers I Letσibe the rank provided by rankeri

I Goal: Compute a consensus rank ofL.

D´ej`a Vu: Social choice algorithms, but. . .

I Small number of voters and big number of candidates I Algorithmic efficiency is required

Algorithms I Borda

I Kemeny approaches (commuting Condorcet winner if it exists)

(128)

E

XPERIMENT ON SMALL NETWORKS WITH KNOWN GROUND TRUTH PARTITIONS

I Generation of 20 base clusterings applying a standard Label propagation algorithm

I Proximity graphs : RNG

I S ={NMI, ARI, VI} Q={modularity, Local modularities L, M, R}

Table: Evaluation of the proposed graph-based ensemble selection

Dataset Approach NMI ARI

Zachary Ensemble clustering without selection 0.57 0.46 Ensemble clustering with selection 0.77 0.69 US Politics Ensemble clustering without selection 0.55 0.68 Ensemble clustering with selection 0.68 0.67 Dolphins Ensemble clustering without selection 0.55 0.39

(129)

E

XPERIMENT

II : DBLP

CO

-

AUTHORSHIP NETWORK I Co-authorship network 1970-1977 (GCC) :|V|=643,|m|=886 I Generation of 10, 100 base clusterings

I Proximity graphs : RNG

I S ={NMI, ARI, VI} Q={modularity, Local modularities L, M, R}

Table: Evaluation of the proposed graph-based ensemble selection

# base clusterings 10

Nodes Compression without selection 18,3%

Nodes Compression with selection 20,9%

Edge compression without selection 17,2%

Edge compression with selection 17,6%

Modularity without selection 0.3734 Modularity with selection 0.43756

(130)

E

XPERIMENT

II : DBLP

CO

-

AUTHORSHIP NETWORK

Table: Evaluation of the proposed graph-based ensemble selection

# base clusterings 100

Nodes Compression without selection 35,1%

Nodes Compression with selection 40,3%

Edge compression without selection 36,2%

Edge compression with selection 38,3%

Modularity without selection 0.4031 Modularity with selection 0.4665

(131)
(132)

M

ULTIPLEX ANALYSIS TOOLS

I muxviz:http://muxviz.net I Muna:http://lipn.fr/muna I Pymnet

http://people.maths.ox.ac.uk/kivela/mln_library/

(133)

M

UXVIZ

I Rpackage I Main features :

Visualization

Layer compression methods Basic metrics

Community detection : Modularity-based, infomap I Input : text file per layer + one file for the general structure.

(134)

M

UNA

I Available forRandPython I Built on top ofigraph

I Extended set of multiplex network edition functions (similar to igraph)

I Basic metrics : degree, neighborhood

I Extended set of community detection approaches I Topological community evaluation indexes.

I Limitations :

I No visualisation support

I Simple categorical coupling only.

(135)

P

YMNET

I Pure Python + integration with networkX package.

I Can handle general multilayer networks

I Rule based generation and lazy-evaluation of coupling edges I Various network analysis methods, transformations, reading and

writing networks, network models etc.

I Visualization support

(136)

P

YMNET

: V

ISUALISATION EXEMPLE

(137)

C

ONCLUSIONS

I Multiplex networksprovide a rich representation of real-world interaction systems

I A lot of work to reformulate basic network concepts for multiplex settings

ex. Roles, RandomWalk, PageRank, etc.

I New tools for multiplex mining : Muna [FK15], muxviz[?],Pymnet I Community evaluation: still an open problem

I Uncovered topics : Layer selection and compression, Co-evolution models, Dynamics on multiplex networks

I Ideas under exploration:

Multiplex approach for attributed networks mining Multiplex of multiplexes

Interactive Multiplex network visualisation.

Benchmarking available tools

(138)

P

YMNET

: V

ISUALISATION EXEMPLE

(139)

O

UR GROUP RELATED BIBLIOGRAPHY

I

I Rushed Kanawati, Multiplex Network mining: a brief survey IEEE Intelligent Informatics Bulletin, December 2015.

I Manisha Pujari, Rushed Kanawati, Link prediction in complex networks, chapter 3, in Advanced methods for complex networks Analysis, N. Meghanthan (Editor), IGI publishing.

I Issam Falih, Nistor Grozavu, Rushed Kanawati, Youn`es Bennani, A recommendation system based on unsupervised topological learning, 22nd International Conference on Neural Information Processing (ICONIP2015) 9-12, 2015 Istanbul, Turkey.

I Issam Falih, Rushed Kanawati, Muna: A Multiplex network analysis Library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 25-28 August 2015, Paris.

I Rushed Kanawati, Ensemble selection for enhancing graph coarsening quality. 5th international workshop on Social Network Analysis (ARS’15), 27-28 April 2015 Capri I Parisa Rastin, Rushed Kanawati, A multiplex-network based approach for clustering

ensemble selection, International workshop on Multiplex network mining (MANEM@ASONAM), 25-28 August 2015, Paris

(140)

O

UR GROUP RELATED BIBLIOGRAPHY

II

I Manisha Pujari, Rushed Kanawati.Link prediction in multiplex networksAIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1) March 2015 (IF. 0.95)

I Manel Hmimida, Rushed Kanawati,Community detection in multiplex networks: A seed-centric approachAIMS Networks Heterogeneous Media Journal, special issue on multiplex networks, Vol. 10 (1) March 2015

I Rushed Kanawati.Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks.Neurocomputing, Volume 150, Part B, 20 February 2015, pp. 417-427

I Zied Yakoubi, Rushed Kanawati.Licod: A Leader-driven algorithm for community detection in complex networks.Vietnam Journal of Computer Science - Springer, Volume 1, Issue 4.

November 2014 pp. 241-256

I Rushed Kanawati,Ensemble selection for community detection in complex networks.7th international conference on Social Computing and Social Media, 2-7 August, 2015 Los Angeles

(141)

O

UR GROUP RELATED BIBLIOGRAPHY

III

I Issam Falih, Manel Hmimida, Rushed Kanawati,Une approche centr´ee graine pour la d´etection de communaut´es dans les r´eseaux multiplexes. Conf´erence EGC’15, 27-30 Jan. 2015 Luxembourg.

I Rushed Kanawati Multi-objective approach for local community computation. 26th IEEE International conference on tools with artificial Intelligence (ICTAI’2014) 10-12 November 2014 Limassol, Cyprus.

I Rushed Kanawati Community detection in social networks: the power of ensemble methods. ACM/IEEE International conference on data science advanced analytics (DSAA’2014) October 30 - November 1, 2014 Shanghai.

I Rushed Kanawati. YASCA: An ensemble based approach for community detection in complex networks, In proceedings of the 20th International Conference on computing combinatorics (COCOON), LNCS 8691 Atlanta 4-6 August 3014. pp. 657-666

I Rushed Kanawati. Seed-centric approaches for community detection in complex networks 6th international conference on Social Computing and Social Media, 22-27 June 2014, Crete, Greece.

(142)

O

UR GROUP RELATED BIBLIOGRAPHY

IV

I Manisha Pujari, Rushed Kanawati. Link Prediction in Complex Networks by Supervised Rank Aggregation. ICTAI 2012 : 24th IEEE International Conference on Tools with Artificial Intelligence November 7-9, 2012, Athens, Greece.

I Manisha Pujari, Rushed Kanawati. Tag recommendation by link prediction based on supervised machine learning. Sixth International AAAI Conference on Weblogs and Social Media (ICWSM 2012). 5-8 June 2012, Dublin.

I Manisha Pujari, Rushed Kanawati. Supervised machine learning link prediction approach for tag recommandation. 4th International Conference on Online Communities and Social Computing @ HCI International 2011, 9-14 July 2011, Hilton Orlando Bonnet Creek, Orlando, Florida, USA, LNCS Springer.

I Nesrine Benchettara, Rushed Kanawati, C´eline Rouveirol. A Supervised Machine Learning Link Prediction Approach for Academic Collaboration Recommendation. 4th International ACM Recommender System conference 26-30/9 2010 - Barcalona, spain. pp. 253-256

(143)

O

UR GROUP RELATED BIBLIOGRAPHY

V

I Nesrine Benchettara, Rushed Kanawati, C´eline Rouveirol. Supervised machine learning applied to link prediction in bipartite social networks. The 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010). 9-11 August 2010.

Odense Denmark. pp.326-330

(144)

That’s all folks !

Questions ?

(145)

B

IBLIOGRAPHY

I

Ebrahim Akbari, Halina Mohamed Dahlan, Roliana Ibrahim, and Hosein Alizadeh,Hierarchical cluster ensemble selection, Engineering Applications of Artificial Intelligence39(2015), 146–156.

Javad Azimi and Xiaoli Fern,Adaptive cluster ensemble selection, IJCAI (Craig Boutilier, ed.), 2009, pp. 992–997.

Alessia Amelio and Clara Pizzuti,Community detection in multidimensional networks, IEEE 26th International Conference on Tools with Artificial Intelligence, 2014, pp. 352–359.

Charu C. Aggarwal and Chandan K. Reddy (eds.),Data clustering: Algorithms and applications, CRC Press, 2014.

Michele Berlingerio, Michele Coscia, and Fosca Giannotti,Finding and characterizing communities in multidimensional networks, ASONAM, IEEE Computer Society, 2011, pp. 490–494.

Vincent D Blondel, Jean-loup Guillaume, and Etienne Lefebvre,Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment2008(2008), P10008.

(146)

B

IBLIOGRAPHY

II

Federico Battiston, Vincenzo Nicosia, and Vito Latora,Metrics for the analysis of multiplex networks, CoRRabs/1308.3182 (2013).

Aaron Clauset,Finding local community structure in networks, Physical Review E (2005).

Jiyang Chen, Osmar R. Za¨ıane, and Randy Goebel,Local community identification in social networks, ASONAM, 2009, pp. 237–242.

Cynthia Dwork, Ravi Kumar, Moni Naor, and D Sivakumar,Rank aggregation methods for the Web, WWW, 2001, pp. 613–622.

Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall,Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems, Phys. Rev5(2015), 011027.

Issam Falih and Rushed Kanawati,Muna: A multiplex network analysis library, The 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (Paris), August 2015, pp. 757–760.

(147)

B

IBLIOGRAPHY

III

Xiaoli Z. Fern and Wei Lin,Cluster ensemble selection, Statistical Analysis and Data Mining1(2008), no. 3, 128–141.

B. H. Good, Y.-A. de Montjoye, and A. Clauset.,The performance of modularity maximization in practical contexts., Physical ReviewE(2010), no. 81, 046106.

M. Girvan and M. E. J. Newman,Community structure in social and biological networks, PNAS99(2002), no. 12, 7821–7826.

Rushed Kanawati,Seed-centric approaches for community detection in complex networks, 6th international conference on Social Computing and Social Media (Crete, Greece) (Gabriele Meiselwitz, ed.), vol. LNCS 8531, Springer, June 2014, pp. 197–208.

,Empirical evaluation of applying ensemble methods to ego-centered community identification in complex networks, Neurocomputing150, B(2015), 417–427.

Zhana Kuncheva and Giovanni Montana,Community detection in multiplex networks using locally adaptive random walks, MANEM 2workshop - Proceedings of ASONAM 2015 (Paris), August 2015.

(148)

B

IBLIOGRAPHY

IV

Andrea Lancichinetti and Santo Fortunato,Limits of modularity maximization in community detection, CoRRabs/1107.1 (2011).

Feng Luo, James Zijun Wang, and Eric Promislow,Exploring local community structures in large networks, Web Intelligence and Agent Systems6(2008), no. 4, 387–400.

Marina Meila,Comparing clusterings by the variation of information, COLT (Bernhard Sch ¨olkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.

Murilo Coelho Naldi, Andr´e C. P. L. F. Carvalho, and Ricardo J. G. B. Campello,Cluster ensemble selection based on relative validity indexes, Data Min. Knowl. Discov.27(2013), no. 2, 259–289.

Clara Pizzuti,A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation16(2012), no. 3, 418–430.

(149)

B

IBLIOGRAPHY

V

Manisha Pujari and Rushed Kanawati,Supervised rank aggregation approach for link prediction in complex networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1189–1196.

Pascal Pons and Matthieu Latapy,Computing communities in large networks using random walks, J. Graph Algorithms Appl.

10(2006), no. 2, 191–218.

Zied Yakoubi and Rushed Kanawati,Licod: Leader-driven approaches for community detection, Vietnam Journal of Computer Science1(2014), no. 4, 241–256.

Jaewon Yang and Jure Leskovec,Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.

Références

Documents relatifs

The proposed algorithm, called CNTS for combined neighborhood tabu search (CNTS), relies on a joint use of vertex move and merge operators to improve the quality of the

In Swiss-Webster mice, which are already relatively resistant to primary infection, no peak of parasite load was observed upon reinfection.. In contrast,

Ang2 deficient mice are characterized by impaired patterning of the lymphatic vascular network and recruitment of smooth muscle cells during development.. Interestingly,

Le tramétinib a notamment fait l’objet d’une étude à répartition aléatoire de phase 3 menée auprès de 322 patients atteints d’un mélanome métastatique avec présence de la

Even-though these works offer automatic organization, they present limitations related to: (i) no user-centric processing, these methods do not include user- related features

Due to the public nature of Online Social Networks like Twitter, apart from identifying the real identity of a user, an attacker will usually try to infer sensitive attribute values

Social media platforms have become a forum where users can share content and in- teract with one another. Though the involvement of the word “social” indicates the big

To deter- mine the degree of glycosylation and to identify the sites of modification on the flagellin monomer, mass spectrometry was employed for detailed structural analysis..