Traffic-driven model of the World-Wide-Web Graph
A. Barrat, LPT, Orsay, France
M. Barthélemy, CEA, France
A. Vespignani, LPT, Orsay, France
Outline
The WebGraph
Some empirical characteristics
Various models
Weights and strengths
Our model:
Definition
Analysis: analytics+numerics
Conclusions
The Web as a directed graph
[Figure: nodes i and j joined by a directed link l]
nodes i: web-pages
directed links: hyperlinks
in- and out-degrees: k_in, k_out
• Small world: captured by Erdős-Rényi graphs
Poisson degree distribution
<k> = p N
With probability p, an edge is established between each pair of vertices
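A minimal sketch of the random-graph construction described above, assuming an undirected graph and plain Python (the function names `erdos_renyi` and `mean_degree` are illustrative, not from the slides). It links each pair of vertices with probability p and measures the mean degree, which should be close to p N for large N.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Undirected G(n, p): each pair of vertices is linked independently with probability p."""
    rng = random.Random(seed)
    neighbours = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                neighbours[u].add(v)
                neighbours[v].add(u)
    return neighbours

def mean_degree(neighbours):
    """Average number of neighbours per vertex."""
    return sum(len(nb) for nb in neighbours.values()) / len(neighbours)

graph = erdos_renyi(n=400, p=0.02, seed=1)
print(mean_degree(graph))  # expected to be close to p * N = 8
```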
Empirical facts
• Small world
• Large clustering:
different neighbours of a node will likely know each other
[Figure: a node and its neighbours 1, 2, 3, ..., n]
Higher probability to be connected
=> graph models with large clustering, e.g. Watts-Strogatz 1998
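As an illustration of what "clustering" measures, here is a short sketch of the local clustering coefficient of a node, assuming an undirected graph stored as a dict of neighbour sets (the name `local_clustering` is illustrative): the fraction of pairs of neighbours that are themselves linked.

```python
def local_clustering(neighbours, i):
    """Fraction of pairs of neighbours of node i that are linked to each other."""
    nb = list(neighbours[i])
    k = len(nb)
    if k < 2:
        return 0.0  # convention: no pairs of neighbours, no clustering
    linked = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nb[b] in neighbours[nb[a]]
    )
    return 2.0 * linked / (k * (k - 1))

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(local_clustering(triangle, 0), local_clustering(star, 0))
```

In the triangle every pair of neighbours is linked; in the star none is.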
Empirical facts
• Small world
• Large clustering
• Dynamical network
• Broad connectivity distributions
• also observed in many other contexts
(from biological to social networks)
• intense modelling activity
Empirical facts
(Barabási-Albert 1999; Broder et al. 2000; Kumar et al. 2000;
Adamic-Huberman 2001; Laura et al. 2003)
Various growing network models
Barabási-Albert (1999): preferential attachment
Many variations on the BA model: rewiring (Tadic 2001, Krapivsky et al. 2001), addition of edges, directed model (Dorogovtsev-Mendes 2000, Cooper-Frieze 2001), fitness (Bianconi-Barabási 2001), ...
Kumar et al. (2000): copying mechanism
Pandurangan et al. (2002): PageRank + preferential attachment
Laura et al. (2002): Multi-layer model
Menczer (2002): textual content of web-pages
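Of the mechanisms listed above, preferential attachment is the one the rest of the talk builds on. A minimal sketch of it, assuming the undirected Barabási-Albert variant with a fully connected seed of m + 1 nodes (the seed choice and the name `barabasi_albert` are assumptions made for illustration):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph (an assumption): m + 1 fully connected nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in `ends` once per link end, so a uniform draw
    # from `ends` is a degree-proportional draw.
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = barabasi_albert(2000, 2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), sum(degree.values()) / len(degree))
```

The largest degree ends up far above the average, which is the broad connectivity distribution referred to in the empirical facts.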
The Web as a directed graph
Broad P(k_in); cut-off for P(k_out)
(Broder et al. 2000; Kumar et al. 2000;
Adamic-Huberman 2001; Laura et al. 2003)
Additional level of complexity:
Weights and Strengths
[Figure: nodes i and j joined by a directed link l]
Links carry weights/traffic: w_ij
In- and out-strengths: s_in, s_out
Adamic-Huberman 2001: broad distribution of s_in
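A small sketch of how in- and out-strengths follow from link weights, assuming the usual convention that the out-strength of i sums the weights w_ij of links leaving i and the in-strength of j sums the weights of links arriving at j (the function name `strengths` and the toy data are illustrative):

```python
from collections import defaultdict

def strengths(weights):
    """weights: dict mapping a directed link (i, j) to its weight w_ij.
    Returns (s_in, s_out) as dicts keyed by node."""
    s_in = defaultdict(float)
    s_out = defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w  # traffic leaving i
        s_in[j] += w   # traffic arriving at j
    return dict(s_in), dict(s_out)

weights = {("a", "b"): 2.0, ("a", "c"): 1.0, ("c", "b"): 0.5}
s_in, s_out = strengths(weights)
print(s_in, s_out)
```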
Model: directed network
[Figure: a new node n linking to an existing node i, which links to node j]
(i) Growth
(ii) Strength-driven preferential attachment (n: k_out = m outlinks)
AND...
“Busy gets busier”
Weights reinforcement mechanism
[Figure: new node n linking to node i, which links to node j]
The new traffic on link n-i increases the traffic on link i-j
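The two ingredients (growth with strength-driven attachment, then weight reinforcement) can be sketched as a simulation. Several details below are assumptions made for illustration: attachment is taken proportional to the in-strength; each new link has weight 1; the extra traffic `delta` is shared among the out-links of the chosen node in proportion to their weights; the seed is a small fully connected directed graph; and a hypothetical offset `a` is added to the in-strength so that pages with no in-links yet can still be chosen.

```python
import random

def traffic_driven_graph(n, m=2, delta=0.5, a=1.0, seed=0):
    """Sketch of a growing weighted directed graph with weight reinforcement.
    `a` is a hypothetical attractiveness offset, not taken from the slides."""
    rng = random.Random(seed)
    n0 = m + 1
    w = {}                                  # (i, j) -> weight of link i -> j
    out_links = [[] for _ in range(n0)]
    s_in, s_out, k_in = [0.0] * n0, [0.0] * n0, [0] * n0

    def add_link(i, j):
        w[(i, j)] = 1.0                     # assumed initial weight of a new link
        out_links[i].append(j)
        s_out[i] += 1.0
        s_in[j] += 1.0
        k_in[j] += 1

    for i in range(n0):                     # assumed seed: complete directed graph
        for j in range(n0):
            if i != j:
                add_link(i, j)

    for new in range(n0, n):
        attract = [s + a for s in s_in]     # strength-driven attachment (plus offset)
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(new), weights=attract)[0])
        out_links.append([])
        s_in.append(0.0)
        s_out.append(0.0)
        k_in.append(0)
        for i in sorted(targets):
            total = s_out[i]
            for j in out_links[i]:          # "busy gets busier": reinforce i's out-links
                extra = delta * w[(i, j)] / total
                w[(i, j)] += extra
                s_in[j] += extra
            s_out[i] += delta
            add_link(new, i)
    return w, s_in, s_out, k_in

w, s_in, s_out, k_in = traffic_driven_graph(2000, m=2, delta=0.5, seed=3)
print(sum(s_in) / sum(k_in))  # approaches 1 + delta for large n
```

In this sketch every new link brings weight 1 plus a redistributed `delta`, so the total in-weight divided by the total number of in-links tends to 1 + delta.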
Evolution equations
(Continuous approximation)
Coupling term
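A sketch of what rate equations of this kind can look like, under stated assumptions rather than as the authors' exact formulas: m new links per unit time, attachment probability proportional to the in-strength, unit weight for each new link, and an extra traffic \delta shared among the out-links of the chosen node in proportion to their weights. The symbols \delta and the sum over in-neighbours j \to i are notation introduced here.

```latex
\begin{align}
\frac{dk^{\mathrm{in}}_i}{dt} &= m\,\frac{s^{\mathrm{in}}_i}{\sum_l s^{\mathrm{in}}_l},\\
\frac{ds^{\mathrm{out}}_i}{dt} &= m\,\delta\,\frac{s^{\mathrm{in}}_i}{\sum_l s^{\mathrm{in}}_l},\\
\frac{ds^{\mathrm{in}}_i}{dt} &= m\,\frac{s^{\mathrm{in}}_i}{\sum_l s^{\mathrm{in}}_l}
  + \sum_{j \to i} m\,\frac{s^{\mathrm{in}}_j}{\sum_l s^{\mathrm{in}}_l}\,
    \delta\,\frac{w_{ji}}{s^{\mathrm{out}}_j}.
\end{align}
```

The last sum is the coupling term: the in-strength of i grows not only when i itself is chosen, but also whenever one of the pages j pointing to i is chosen and its outgoing traffic is reinforced.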
Resolution
Ansatz
supported by numerics:
Results
Approximation
Total in-weight Σ_i s_in,i: approximately proportional to the total number of in-links Σ_i k_in,i, times the average weight <w> = 1 + δ (δ: traffic increase induced by each new link)
Then: A = 1 + δ
Exponent γ_sin of P(s_in) ∈ [2; 2+1/m]
Measure of A => prediction of γ_sin
Numerical simulations
Approx of
Numerical simulations
NB: broad P(s_out) even if k_out = m
Clustering spectrum
i.e.: fraction of connected pairs of neighbours of node i
Clustering spectrum
• δ increases => clustering increases
• New pages: point to various well-known pages, often connected together => large clustering for small nodes
• Old, popular pages with large k: many in-links from many less popular pages which are not connected together => smaller clustering for large nodes
Clustering and weighted clustering
takes into account the relevance of triangles in the global traffic
Clustering and weighted clustering
Weighted Clustering larger than topological clustering:
triangles carry a large part of the traffic
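To make the comparison concrete, here is a sketch of a weighted clustering coefficient, assuming the undirected definition of Barrat et al. (2004), in which each triangle through node i counts in proportion to the weights of the two links of i that take part in it. The directed variant used for the Web model may differ; the function name and the toy graph are illustrative.

```python
def weighted_clustering(w, i):
    """w: symmetric dict of dicts, w[i][j] = weight of the undirected link i-j.
    Each pair of linked neighbours (j, h) contributes (w_ij + w_ih), normalised
    by s_i * (k_i - 1), where s_i is the strength and k_i the degree of i."""
    nb = list(w[i])
    k = len(nb)
    if k < 2:
        return 0.0
    s = sum(w[i].values())
    total = 0.0
    for a in range(k):
        for b in range(a + 1, k):
            j, h = nb[a], nb[b]
            if h in w[j]:               # j and h are linked: triangle i-j-h
                total += w[i][j] + w[i][h]
    return total / (s * (k - 1))

# Node 0 has three neighbours; its only triangle (0-1-2) uses its two heaviest links.
w = {
    0: {1: 5.0, 2: 5.0, 3: 1.0},
    1: {0: 5.0, 2: 1.0},
    2: {0: 5.0, 1: 1.0},
    3: {0: 1.0},
}
print(weighted_clustering(w, 0))  # 10/22, above the topological value 1/3
```

When all weights are equal the expression reduces to the topological clustering; it exceeds it when triangles sit on the heavily used links.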
Assortativity
Average connectivity of nearest neighbours of i
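A short sketch of this quantity, assuming an undirected graph stored as a dict of neighbour sets (the name `knn` is illustrative): the mean degree of the neighbours of node i.

```python
def knn(neighbours, i):
    """Average degree of the nearest neighbours of node i."""
    nb = neighbours[i]
    if not nb:
        return 0.0
    return sum(len(neighbours[j]) for j in nb) / len(nb)

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(knn(star, 0), knn(star, 1))
```

In the star, the hub (degree 3) has neighbours of degree 1, while each leaf (degree 1) has a neighbour of degree 3: high-degree nodes sit next to low-degree ones, the signature of disassortative behaviour.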
Assortativity
• k_nn: disassortative behaviour, as usual in growing network models, and typical of technological networks
• lack of correlations in popularity as measured by the in-degree
Summary
Web: heterogeneous topology and traffic
Mechanism taking into account interplay between topology and traffic
Simple mechanism => complex behaviour, scale-free distributions for connectivity and traffic
Analytical study possible
Study of correlations: non-trivial hierarchical behaviour
Possibility to add features (fitnesses, rewiring, addition of edges, etc...), to modify the redistribution rule...
Empirical studies of traffic and correlations?