• Aucun résultat trouvé

Social network Analysis - Community detection

N/A
N/A
Protected

Academic year: 2022

Partager "Social network Analysis - Community detection"

Copied!
104
0
0

Texte intégral

(1)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

Social network analysis: Community detection

Rushed Kanawati LIPN, CNRS UMR 7030; USPC

http://lipn.fr/ ∼ kanawati

[email protected]

1 / 104

(2)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

P LAN

1

I

NTRODUCTION

2

C

OMPLEX NETWORK ANALYSIS TASKS

3

Community detection

Local community detection Global communities detection Community evaluation approaches

4

Challenges

5

Conclusion

(3)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMPLEX NETWORK

Definition

Graphs modeling (direct/indirect) interactions among actors.

Basic topological features

I Low Density I Small Diameter I Heterogeneous degree

distribution.

I High Clustering coefficient

I Community structure

(4)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE I : SCIENTIFIC COLLABORATION NETWORK

Density : 104 Diameter : 24 Clustering coeff.: 0.67

Co-authorship network - DBLP 1980-1984 (for authors active for more than 10 years)

(5)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE II: SPATIAL NETWORK

Public sites accessibility in the Bourget district

projet FUI UrbanD 2009-2012

Densiy : 0.052 Diameter : 7 clustering Coef.: 0.87

(6)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE III: P RODUCT PURCHASE NETWORK

projet ANR CADI 2008-2010

Density : 0.02 Diameter : 8

Clustering coef. : 0.84

(7)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE IV: C O - RATING NETWORK

Movie Lens

Density : 0.061 Diameter : 4

Clustering coef. : 0.626

(8)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE V: PRODUCT NETWORK

MovieLens

Density : 0.22 Diameter : 4

Clustering coef. : 0.746

(9)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

A NOTHER EXAMPLE

I Graph of the Internet I Nodes = service

providers

I Edges : Connections

[CHK

+

07]

(10)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

A NOTHER EXAMPLE

I Mubarak’s resignation announcement.

I Nodes = Twitter user

I Edges : retweet on

#jan25 hashtag

http://gephi.org/2011/the-egyptian-revolution-on-twitter/

(11)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L AST EXAMPLE !

I Ingredients network extracted fom cooking receipts

I Nodes = Ingredients

I Edges : usage in the same receipt.

[Lada Adamic Course on Network Analysis: Coursera]

(12)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

N OTATIONS

A graph G =< V, E ⊆ V × V >

I V is the set of nodes (a.k.a. vertices, actors, sites)

I E is the set of edges (a.k.a. ties, links, bonds)

Notations

I A

G

Adjacency Matrix d : a

ij

6= 0 si les nœuds (v

i

, v

j

) ∈ E, 0 otherwise.

I n = |V|

I m = |E|. We have often m ∼ n

(13)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMPLEX NETWORK ANALYSIS : T ASKS

Node oriented

I Nodes ranking in function of their importance, influence, . . . I Applying centrality functions

I Applications: Ranking (ex. researchers ! ), Viral Marekting, Cyper-attacks (prevention); . . .

Network oriented

I Network formation & evolution models I Link prediction, Diffusion models

I Applications: Explanation, Recommendation, Anomalies detection, infirmation diffusion, . . .

Community oriented

(14)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C ENTRALITY : E XAMPLES

Degree centrality

I C

d

(v) =

maxk(Γ(v)k

u∈VkΓ(u)k

I Complexity : O(m)

(15)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C ENTRALITY : SOME EXAMPLES

Closeness centrality

I C

c

(v) =

P 1

u∈Vsp(v,u)

I sp(u, v) : shortest path length between u, v.

I Complexity =

O(n × m + n

2

logn)

(16)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C ENTRALITY : SOME EXAMPLES

Betweenness centrality

I C

i

(v) = P

s,t∈V,stv σs,t(v)

σs,t

I σ

s,t

(v) : number of shortest paths linking s to t that include v

I σ

s,t

: total number of shortest paths linking s to t

I Complexity = O(n

3

)

(17)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMMUNITY ?

Definitions

I A dense subgraph loosely coupled to other modules in the network

I A community is a set of nodes seen as one by nodes outside the community

I A subgraph where almost all nodes are linked to other nodes in the community.

I . . .

(18)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMMUNITY DETECTION PROBLEM

I Local community identification (ego-centred).

I Network partition computing

I Overlapping community detection

(19)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L OCAL COMMUNITY

(20)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L OCAL COMMUNITY

(21)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L OCAL COMMUNITY

1 C ← {φ} , B ← { n

0

} S ← Γ(n

0

)

2 Q ← 0 /* a community quality function */

3 While Q can be enhanced Do 1 n ← argmax

n∈S

Q

2 S ← S − { n }

3 D ← D + { n }

4 update B, S, C

4 Return D

(22)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

Q UALITY FUNCTIONS : E XEMPLES I

Local modularity R [Cla05]

R =

B Bin

in+Bout

Local modularity M [LWP08]

M =

DDin

out

Local modularity L [CZG09]

L =

LLin

ex

where : L

in

=

P

i∈D

kΓ(i)∩Dk

kDk

, L

ex

=

P

i∈B

kΓ(i)∩Sk kBk

And many many others . . . [YL12]

(23)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L OCAL MODULARITY LIMITATIONS : A N EXEMPLE

Blank nodes enhance B

in

and D

in

without affecting B

out

and D

out

Bank nodes will be added if M or R modularities are used

Low precision computed communities

(24)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ULTI - OBJECTIVE LOCAL COMMUNITY

IDENTIFICATION [?]

Three main approaches

Combine then Rank

Ensemble ranking

Ensemble clustering

(25)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMBINE THEN R ANK

Principle

Let Q

i

(s) be the local modularity value induced by node s ∈ S

Q ]

i

(s) =

 

 

Qi(s)−min

u∈SQi(u) maxuSQi(u)−min

u∈SQi(u)

if max

u∈S

Q

i

(u) 6= min

u∈S

Q

i

(u)

1 otherwise

Q

com

(s) =

1k

k

P

i=1

Q ]

i

(s)

(26)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING APPROACHES

Principle

I Rank S in function of each local modularity Q

i

I Select the winner after applying ensemble ranking approach I What stopping criteria to apply ?

Stopping criteria

I Strict policy : All modularities should be enhanced

I Majority policy : Majority of local modularities are enhanced

I Least gain policy : At least one local modularity is enhanced.

(27)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING

Problem

I Let S be a set of elements to rank by n rankers I Let σ

i

be the rank provided by ranker i

I Goal: Compute a consensus rank of S.

D´ej`a Vu: Social choice algorithms, but . . .

I Small number of voters and big number of candidates I Algorithmic efficiency is required

I Output could be a complete rank

(28)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING

Problem

I Let S be a set of elements to rank by n rankers I Let σ

i

be the rank provided by ranker i

I Goal: Compute a consensus rank of S.

D´ej`a Vu: Social choice algorithms, but . . .

I Small number of voters and big number of candidates I Algorithmic efficiency is required

I Output could be a complete rank

(29)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING : APPROACHES

Jean-Charles de Borda [1733-1799]

Borda

I Borda’s score of i ∈ σ

k

:

B

σk

(i) = {count(j)|σ

k

(j) < σ

k

(i) ; j ∈ σ

k

} . I Rank elements in function of

B(i) = P

k

t=1

w

t

× B

σt

(i).

(30)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING : APPROACHES

Marquis de Condorcet [1743-1794]

Condorcet

I The winner is the candidate who defeats every other candidate in pairwise majority-rule election

I The winner may not exists

Condorcet 6= Borda

I Votes : 6 × A B C, 4 × B C A I Borda winner : B

I Condorcet winner : A

(31)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING : APPROACHES

Marquis de Condorcet [1743-1794]

Condorcet

I The winner is the candidate who defeats every other candidate in pairwise majority-rule election

I The winner may not exists

Condorcet 6= Borda

I Votes : 6 × A B C, 4 × B C A I Borda winner : B

I Condorcet winner : A

(32)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING : APPROACHES

Extended Condorcet criterion

If for every a ∈ A and b ∈ B, majority prefers a to b the all elements in A should be ranked before any element in B.

Kenneth Arrow, 1921-

Arrow’s Theorem

No vote rule can have the following desired proprieties :

I Every result must be achievable somehow.

I Monotonicity.

I Independence of irrelevant attributes.

I Non-dictatorship.

(33)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE RANKING : APPROACHES

John Kemeny 1926-1992

Optimal Kemeny rank aggregation

I Let d() be distance over rankings σ

i

(ex.

Kendall τ , Spearman’s footrule) I Find π that minimise P

i

d(π, σ

i

) I NP-Hard problem

I Approximation : Local Kemeny : two adjacent candidats are in the good order.

I Local Kemeny : Apply Bubble sort using the

majority preference partial order relationship

I Approximate Kemeny : Apply QuickSort

(34)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING APPROACHES

Principle

I Let C

Qvqi

be the the local community of v

q

applying Q

i

. I We have a natural partition : P

Qi

= {C

Qvqi

, C

Qvqi

}

I Apply an ensemble clustering approach.

(35)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING : PRINCIPLE

from A. Topchy et. al. Clustering Ensembles: Models of Consensus and Weak Partitions. PAMI, 2005

(36)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING : APPROACHES

CSPA: Cluster-based Similarity Partitioning Algorithm

I Let K be the number of basic models, C

i

(x) be the cluster in model i to which x belongs.

I Define a similarity graph on objects : sim(v, u) =

K

P

i=1

δ(Ci(v),Ci(u)) K

I Cluster the obtained graph :

Isolate connected components after prunning edges Apply community detection approach

I Complexity : O(n

2

kr) : n # objects, k # of clusters, r# of clustering

solutions

(37)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

CSPA : E XEMPLE

from Seifi, M. Cœurs stables de communaut´es dans les graphes de terrain. Th`ese de l’universit´e Paris 6, 2012

(38)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING : APPROACHES

HGPA: HyperGraph-Partitioning Algorithm

I Construct a hypergraph where nodes are objects and hyperedges are clusters.

I Partition the hypergraph by minimizing the number of cut hyperedges

I Each component forms a meta cluster

I Complexity : O(nkr)

(39)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING : APPROACHES

MCLA: Meta-Clustering Algorithm

I Each cluster from a base model is an item

I Similarity is defined as the percentage of shared common objects I Conduct meta-clustering on these clusters

I Assign an object to its most associated meta-cluster

I Complexity : O(nk

2

r

2

)

(40)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XPERIMENTS

Protocol ([Bag08])

1 Apply the different algorithms on nodes in networks for which a ground-truth community partition is known.

2 For each node compute the distance between the real-partition and the computed one (Ex. NMI [Mei03])

3 Compute average and standard deviation for the network.

(41)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

D ATASETS

Zachary’s Karate Club

US Political books network

Football network

Dolphins social network

(42)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : EVALUATING STOPPING CRITERIA (NMI)

Zachary’s Karate Club Football network

(43)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : C OMPARATIVE RESULTS (NMI)

Zachary’s Karate Club

US Political books network

Football network

Dolphins social network

(44)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

G LOBAL COMMUNITIES DETECTION

Problem

I Divid the set of nodes in a number of (overlapping) subsets such that induced subgraphs are dense and loosley coupled.

Recommended readings

I S. Fortunato. Community detection in graphs. Physics Reports, 2010, 486, 75-174

I L. Tang, H. Liu. Community Detection and Mining in Social Media, Morgan Claypool Publishers, 2010

I R. Kanawati, D´etection de communaut´es dans les grands graphes

d’interactions multiplexes : ´etat de l’art, (RNTI 2014), hal-00881668

(45)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

G LOBAL COMMUNITIES DETECTION

Problem

I Divid the set of nodes in a number of (overlapping) subsets such that induced subgraphs are dense and loosley coupled.

Recommended readings

I S. Fortunato. Community detection in graphs. Physics Reports, 2010, 486, 75-174

I L. Tang, H. Liu. Community Detection and Mining in Social Media, Morgan Claypool Publishers, 2010

I R. Kanawati, D´etection de communaut´es dans les grands graphes

(46)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMMUNITIES DETECTION : M ETHODS

Classification

I Group-based

Nodes are grouped in function of shared topological features (ex. clique)

I Network-based

Clustering, Graph-cut, block-models, modularity optimization I Propagation-based

Unstable approaches, good when used with ensemble clustering I Seed-centred

from local community identification to global community detection

(47)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

G ROUP - BASED APPROCHES

Principle

Search for special (dense) subgraphs:

I k-clique I n-clique

I γ-dense clique

I K-core

(48)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

G ROUP - BASED APPROCHES

from Symeon Papadopoulos, Community Detection in

Social Media, CERTH-ITI, 22 June 2011

(49)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE : C LIQUE PERCOLATION

Suits fairly dense graph.

(50)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

N ETWORK - BASED APPROCHES

Clustering approaches

I Apply classical clustering approaches using graph-based diastase function

I Different types of Graph-based distances: neighborhood-based, path-based (Random-walk)

I Usually requires the number of clusters to discover

(51)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY OPTIMIZATION APPROACHES Modularity: a partition quality criteria

Q(P) = 1 2m

X

c∈P

X

i,j∈c

(A

ij

− d

i

d

j

2m ) (1)

Figure: Example : Q =

(15+6)−(11.25+2.56)

25

= 0.275

(52)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY OPTIMIZATION APPROACHES

I Applying classical optimization algorithms (ex. Genetic algorithms [Piz12]).

I Applying hierarchical clustering and select the level with Q

max

(ex. Walktrap [PL06])

I Divisive approach : Girvan-Newman algorithm [GN02]

I Greedy optimization : Louvain algorithm [BGL08]

I . . .

(53)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE : G IRVAN -N EWMAN A LGORITHM

1 Compute betweenness centrality for each edge.

2 Remove edge with highest score.

3 Re-compute all scores.

4 Repeat 2nd step.

Complexity : O(n

3

)

(54)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XAMPLE : L OUVAIN APPROACH

(55)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY OPTIMIZATION LIMITATIONS

Hypothesis

The best partition of a graph is the one that maximize the modularity.

If a network has a community structure, then it is popsicle to find a precise partition with maximal modularity

If a network has a community structure, then partitions inducing high modularity values are structurally similar.

All three hypothesis do not hold [GdMC10, LF11].

(56)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY RESOLUTION LIMITE

(57)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY DISTRIBUTION

(58)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

P ROPAGATION - BASED APPROACHES

Algorithm 1 Label propagation

Require: G =< V, E > a connected graph,

1:

Initialize each node with unique label l

v 2:

while Labels are not stable do

3:

for v ∈ V do

l

v

=

l

l

(v)| /* random tie-breaking */

4:

end for

5:

end while

6:

return communities from labels

Γ

l

(v) : set of neighbors having label l

(59)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L ABEL PROPAGATION

Advantages

I Complexity : O(m) I Highly parallel

Disadvantages

I No convergence guarantee, oscillation phenomena

I Low robustness

Different runs yields very different community

structure due to randomness

(60)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L ABEL PROPAGATION : C ONVERGENCE ENHANCEMENT

I Asynchronous label update, [RAK07]

I Semi-synchronous label update (graph coloring + propagation by color) [CG12].

I Making stability even worse !

I Harden parallel implementations.

(61)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R OBUST L ABEL PROPAGATION

I Label hop attenuation [LHLC09]

I Balanced label propagation [SB11].

I Multiplex approach !

adding new neighborhood similarity based relationships between adjacent nodes → multiplex network

I Ensemble clustering → Communities core [OGS10, SG12, LF12].

(62)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

P ARALLEL LP

(63)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L ABEL PROPAGATION PREPROCESSING (EPP)

1 Apply an ensemble clustering on results of basic Parallel LP 2 Coarse the graph according to obtained communities

3 Apply a high quality community detection algorithm on coarsed graph.

4 Expand obtained results to the initial graph.

(64)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E VALUATIONS

benchmark networks Pareto front

(65)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

S EED - CENTRIC ALGORITHMS

(KANAWATI, SCSM’2014)

Algorithm 2 General seed-centric community detection algorithm Require: G =< V, E > a connected graph,

1:

C ← ∅

2:

S ← compute seeds(G)

3:

for s ∈ S do

4:

C

s

compute local com(s,G)

5:

C ← C + C

s

6:

end for

7:

return compute community( C )

(66)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

S EED - CENTRIC APPROACHES

Table: Characteristics of major seed-centric algorithms

Algorithm Seed Nature Seed Number Seed selection Local Com. Com. computation Leaders-Followers [SZ10] Single Computed informed Agglomerative -

Top-Leaders [KCZ10] Single Input Random Expansion -

PapadopoulosKVS12 Subgraph Computed Informed Expansion -

WhangGD13 Single Computed informed Expansion -

BollobasR09 Subgraph Computed informed expansion -

Licod [Kan11] Set Computed Informed Agglomerative -

Yasca [?] Single Computed Informed Expansion Ensemble clustering

(67)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XEMPLE : L ICOD

(KANAWATI, SOCILACOMP’2011)

Licod : the idea !

1 Compute a set of seeds that are likely to be leaders in their communities

Heuristic : nodes having higher centralities than their neighbors 2 Each node in the graph ranks seeds in function of its own

preference

3 Each node modify its preference vector in function of neighbor’s preferences

4 iterate max times or till convergence.

(68)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L ICOD : SOME RESULTS

Comparative results on benchmark networks : NMI

(69)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

T HE YASCA ALGORITHM

(KANAWATI, COCOON’2014)

The algorithm

Yet Another Seed-centric Community detection Algorithm

1 Compute a set of diverse seed nodes

2 For each node compute a bi-partition of the whole graphe based on local community identification.

3 Apply ensemble clustering to obtain a graph partition.

(70)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

Y ASCA : S OME RESULTS

Comparative Results on benchmark networks : NMI

Select 15% of high central nodes and 15% of low central nodes

(71)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E VALUATION METHODS

I Topological criteria : is it useful for applications ? I Ground-truth comparaison : hard to find on large-scale

I Task-oriented approaches : Clustering , Recommendation, link

prediction, . . .

(72)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

T OPOLOGICAL EVALUATION CRITERIA

Quality of C = {S

1

, . . . , S

i

} :

Q(C) = P

i

f(S

i

)

|C| (2)

f () is a single- community quality metric 4 types of quality functions

Internal connectivity Externeal connectivity Hybrid functions

Model-based functions: the modularity

(73)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

I NTERNAL CONNECTIVITY FUNCTIONS

Internal density : f (S) =

n 2×mS

S×(nS−1)

Average Degree : f (S) =

2×mn S

S

FOMD: Fraction over Median Degree : f (s) =

|{u:uS,|(u,v),vn S|>dm}|

S

,

d

m

est le m´edian de degr´es des nœuds dans V

TPR: Triangle Participation Ratio :

|{uS}:∃v,wS:(u,v),(w,v),(u,w)∈E| nS

(74)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XTERNAL CONNECTIVITY FUNCTIONS

Expansion : f (S) =

CnS

S

Cut : f (S) =

n Cs

S×(N−nS)

(75)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

H YBRID FUNCTIONS

I Conductance : f(S) =

2mCS

S+CS

I MAX-ODF : Out Degree Fraction : f (S) = max

u∈S|{(u,v)∈E,v6∈S}|

d(u)

I AVG-ODF : f(S) =

n1

S

× P

u∈S

|{(u,v)∈E,v6∈S}|

d(u)

(76)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

G ROUND - TRUTH BASED EVALUATION

I Principle : Computing a similarity between obtained partition and a ground-truth one.

I Ground-truth : Expert defined.

Model-generated I Two types of metrics :

Agreement-based metrics: purity, rand, ARI

Information theory metrics: MI, NMI, . . . , etc.

(77)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

P URITY

I P = { p

1

, p

2

, . . . , p

k

} : a computed partition I R = { r

1

, r

2

, . . . , r

n

} : ground-truth partition I purity(P , R) =

|1

V|

P

k

j=1

max

i

(| p

k

∩ r

i

|)

(78)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R AND INDEX

I P = {p

1

, p

2

, . . . , p

k

} : a computed partition I R = { r

1

, r

2

, . . . , r

n

} : ground-truth partition

I a : # pairs of nodes in a same community according to P and R I b : # pairs of nodes in a same community according to P and in

different communities in R

I c : # pairs of nodes in a different communities according to P and in same community in R

I d : # pairs of nodes in a different communities in P and R I

I rand(P , R) =

a+b+c+da+d

I ARI =

C2n(a+d)−[(a+b)(a+c)+(c+d)(b+d)]

(C2n)2−[(a+b)(a+c)+(c+d)(b+d)]

I E(ARI)=0

(79)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M UTUAL INFORMATION

I(X, Y) = H(X) + H(Y) − H(X, Y) H(X) = P

nx

i=1

p(x

i

) × log(

p(x1

i)

)

Normalisation : NMI(U, V) = √

I(U,V)

H(U)H(V)

(80)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C HALLENGES

I Real-world networks are Dynamic I Real-world networks are Heterogeneous I Nodes have usually semantic attributes I Scale problem

I Towards multiplex network mining

(81)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ULTIPLEX N ETWORK

Definition

A set of nodes related by different types of relations

Source: muxviz

Motivation

I Real networks are dynamic.

I Real networks are heterogeneous.

I Nodes are usually qualified by a set

of attributes.

(82)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ULTIPLEX N ETWORK : N OTATIONS G =< V, E

1

, . . . , E

α

: E

k

⊆ V × V ∀k ∈ {1, . . . , α} >

I V: set of nodes (a.k.a. vertices, actors, sites)

I E

k

: set of edges of type k (a.k.a. ties, links, bonds)

Notations

I A[k]Adjacency Matrix of slicek:a[k]ij 6=0 si les nœuds(vi,vj)∈Ek, 0 otherwise.

I m[k]=|Ek|. We have oftenm∼n

I Neighbor’s ofvin slicek:Γ(v)[k]={x∈V: (x,v)∈Ek}. I Node degree in slicek:dkv=kΓ(v)[k]k

I Total degree of nodei:dtot=Pα

d[s]

(83)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMMUNITY ?

Some definitions

I A dense subgraph loosely coupled to other modules in the network I A community is a set of nodes seen as one by nodes outside the

community

I A subgraph where almost all nodes are linked to other nodes in the community.

What is a dense subgraph in a multiplex network ?

(84)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C OMMUNITY DETECTION IN MULTIPLEX NETWORKS

Approaches

1 Transformation into a monoplex community detection problem

I Layer aggregation approaches.

I Ensemble clustering approaches

I Hypergraph transformation based approaches

2 Generalization of monoplex oriented algorithms to multiplex networks.

I Generalized-modularity optimization

I Seed-centric approaches

(85)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L AYER AGGREGATION

Aggregation functions

Aij=

(1 ∃1≤l≤α:A[l]ij 6=0 0 otherwise

Aij=k {d:A[d]ij 6=0} k

Aij= 1 α

α

X

k=1

wkA[k]ij Aij=sim(vi,vj)

(86)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

L AYER AGGREGATION

Aggregation functions

Aij=

(1 ∃1≤l≤α:A[l]ij 6=0

0 otherwise Aij= 1

α

α

X

k=1

wkA[k]ij

(87)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING APPROACHES

Ensemble Clustering Strehl2003

I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm

I MCLA: Meta-Clustering Algorithm

I . . .

(88)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E NSEMBLE CLUSTERING APPROACHES

Ensemble Clustering Strehl2003

I CSPA: Cluster-based Similarity Partitioning Algorithm I HGPA: HyperGraph-Partitioning Algorithm

I MCLA: Meta-Clustering Algorithm

(89)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

K- UNIFORM HYPERGRAPH TRANSFORMATION KIVELA 2013 MULTILAYER

Principle

I A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactly k

I Mapping a multiplex to a 3-uniform hypergraph H = (V, E) such that :

V = V ∪ { 1, . . . , α}

(u, v, i) ∈ E if ∃ l : A

[l]uv

6= 0, u, v ∈ V, i ∈ { 1, . . . , α}

I Apply community detection approaches in Hypergraphs (Ex.

tensor factorization approaches)

(90)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ULTIPLEX M ODULARITY

Generalized modularity mucha2010community

I

Q

multiplex

(P) = 1 2µ

X

c∈P

X

i,j∈c k,l:1→α

 A

[s]ij

− λ

k

d

[k]i

d

[k]j

2m

[k]

 δ

kl

+ δ

ij

C

klij

I µ = P

j∈V k,l:1→α

m

[k]

+ C

ljk

I C

klij

Inter slice coupling = 0 ∀i 6= j

(91)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M ODULARITY OPTIMIZATION LIMITATIONS

Hypothesis

I The best partition of a graph is the one that maximize the modularity.

I If a network has a community structure, then it is possible to find a precise partition with maximal modularity.

I If a network has a community structure, then partitions having high modularity values are structurally similar.

All three hypothesis do not hold Good10, LAN11a.

(92)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

M UX L ICOD

Multiplex degree centrality [?]

d

multiplexi

= −

α

X

k=1

d

[k]i

d

[tot]i

log d

[k]i

d

[tot]i

!

Multiplex shortest path

SP(u, v)

multiplex

=

α

P

k=1

SP(u, v)

[k]

α Multiplex neighborhood

Γ(v)

tot

∩ Γ(x)

tot

(93)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E XPERIMENTS

Datasets

1 European airports network

2 Bibliographical network (DBLP) : Three layer network :

co-authorship, co-venue, co-citing.

(94)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

E VALUATION CRITERIA

1 Multiplex modularity 2 Redundancy [?]

ρ(c) = X

(u,v)∈P¯¯c

k {k : ∃A

[k]uv

6= 0} k α× k P

c

k

¯ ¯

P the set of couple (u, v) which are directly connected in at least

two layers

(95)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : L AYER AGGREGATION APPROACHES

Figure: European airports multiplex network

(96)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : P ARTITION AGGREGATION APPROACHES

Figure: European airports multiplex network

(97)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : M UX L ICOD

Figure: European airports multiplex network

(98)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

R ESULTS : M UX L ICOD

Figure: DBLP 3-layer multiplex

(99)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

C ONCLUSIONS I

I Networks are everywhere (ou presque)

I Network analysis can be applied to different kind of tasks I Community detection can help in different analysis and

application tasks

I Challenges : Scale-problems, multiplex network mining

Problems : Evaluation and interoperation of computed

communities.

(100)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

B IBLIOGRAPHY I

J. P. Bagrow,Evaluating local community methods in networks, J. Stat. Mech.2008(2008), no. 5, P05001.

Vincent D Blondel, Jean-loup Guillaume, and Etienne Lefebvre,Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment2008(2008), P10008.

Gennaro Cordasco and Luisa Gargano,Label propagation algorithm: a semi-synchronous approach, IJSNM1(2012), no. 1, 3–26.

S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir,A model of internet topology using k-shell decomposition, PNAS104 (2007), no. 27, 11150–11154.

Aaron Clauset,Finding local community structure in networks, Physical Review E (2005).

Jiyang Chen, Osmar R. Za¨ıane, and Randy Goebel,Local community identification in social networks, ASONAM, 2009, pp. 237–242.

(101)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

B IBLIOGRAPHY II

B. H. Good, Y.-A. de Montjoye, and A. Clauset.,The performance of modularity maximization in practical contexts., Physical ReviewE(2010), no. 81, 046106.

M. Girvan and M. E. J. Newman,Community structure in social and biological networks, PNAS99(2002), no. 12, 7821–7826.

Rushed Kanawati,Licod: Leaders identification for community detection in complex networks, SocialCom/PASSAT, 2011, pp. 577–582.

Reihaneh Rabbany Khorasgani, Jiyang Chen, and Osmar R Zaiane,Top leaders community detection approach in information networks, 4th SNA-KDD Workshop on Social Network Mining and Analysis (Washington D.C.), 2010.

Andrea Lancichinetti and Santo Fortunato,Limits of modularity maximization in community detection, CoRRabs/1107.1 (2011).

,Consensus clustering in complex networks, Sci. Rep.2(2012).

(102)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

B IBLIOGRAPHY III

Ian X.Y. Leung, Pan Hui, Pietro Lio, and Jon Crowcroft,Towards real-time community detection in large networks, Phys. Phy. E.

79(2009), no. 6, 066107.

Feng Luo, James Zijun Wang, and Eric Promislow,Exploring local community structures in large networks, Web Intelligence and Agent Systems6(2008), no. 4, 387–400.

Marina Meila,Comparing clusterings by the variation of information, COLT (Bernhard Sch ¨olkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.

Michael Ovelg ¨onne and Andreas Geyer-Schulz,Cluster cores and modularity maximization, ICDM Workshops, 2010, pp. 1204–1213.

Clara Pizzuti,A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation16(2012), no. 3, 418–430.

(103)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

B IBLIOGRAPHY IV

Pascal Pons and Matthieu Latapy,Computing communities in large networks using random walks, J. Graph Algorithms Appl.

10(2006), no. 2, 191–218.

Usha N. Raghavan, Roka Albert, and Soundar Kumara,Near linear time algorithm to detect community structures in large-scale networks, Physical Review E76(2007), 1–12.

Lovro Subelj and Marko Bajec,Robust network community detection using balanced propagation, European Physics Journal B 81(2011), no. 3, 353–362.

Massoud Seifi and Jean-Loup Guillaume,Community cores in evolving networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1173–1180.

D Shah and T Zaman,Community detection in networks: The leader-follower algorithm, Workshop on Networks Across Disciplines in Theory and Applications, NIPS, 2010.

(104)

INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion

B IBLIOGRAPHY V

Zied Yakoubi and Rushed Kanawati,Interactions in complex systems, ch. Community detection in complex interaction networks: Seed-based approaches, Cambridge Scholar Publishing, 2014.

Jaewon Yang and Jure Leskovec,Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.

Références

Documents relatifs

To adapt the Louvain Method to optimize the global modu- larity, one must change two elements: the computation of the quality gain in the first phase and how to build the

In this paper we introduce an innovative scheme to extract communities of the characters involving in the same scene by only parsing the script of a film using an efficient

The first one, is the model by Girvan and Newman (GN) [4], which produces networks taking roughly the form of sets of small interconnected Erdős-Rényi networks [8].

The evaluation of community-detection algorithms is a complex problem that has received very little attention in the related literature. We can distinguish two main trends

We illustrate our approach with an experimental analysis of a real-world network with a ground-truth community structure that we compare with the output of eight different

For the final step of this part, in the same manner as the previously presented time com- putational analysis, we aggregate for all methods the estimates of average community size

In this article, we show how the Bitcoin transac- tion network can be studied using complex networks analysis techniques, and in particular how community detection can be

First, we consider how measure scores are affected by parameter q (transfor- mation intensity) when applying the so-called Singleton Cluster transformation, while other parameters