• Aucun résultat trouvé

Traceroute-like exploration of unknown networks: a statistical analysis

N/A
N/A
Protected

Academic year: 2022

Partager "Traceroute-like exploration of unknown networks: a statistical analysis"

Copied!
28
0
0

Texte intégral

(1)

Traceroute-like exploration of unknown networks: a statistical

analysis

A. Barrat, LPT, Université Paris-Sud, France

I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

A. Vázquez (Notre-Dame University, USA) A. Vespignani (LPT, France)

cond-mat/0406404

http://www.th.u-psud.fr/page_perso/Barrat

(2)

Plan of the talk

• Apparent complexity of Internet’s structure

• Problem of sampling biases

• Model for traceroute

• Theoretical approach of the traceroute mapping process

• Numerical results

• Conclusions

(3)

Multi-probe reconstruction (router-level)

Use of BGP tables for the Autonomous System level (domains)

Graph representation

different granularities

Internet representation

Many projects (CAIDA, NLANR, RIPE, IPM, PingER...)

(4)

=>Large-scale visualizations

TOPOLOGICAL ANALYSIS

by STATISTICAL TOOLS

and GRAPH THEORY !

(5)

Main topological characteristics

• Small world

Distribution of Shortest paths (# hops)

between two nodes

(6)

Main topological characteristics

• Small world : captured by Erdös-Renyi graphs

Poisson distribution

<k> = p N

With probability p an edge is established among couple of vertices

(7)

Main topological characteristics

• Small world

• Large clustering: different neighbours of a node will likely know each other

1 2

3 n

Higher probability to be connected

=>graph models with large clustering, e.g. Watts-Strogatz 1998

(8)

Main topological characteristics

• Small world

• Large clustering

• Dynamical network

• Broad connectivity distributions

• also observed in many other contexts

(from biological to social networks)

• huge activity of modeling

(9)

Main topological characteristics

Broad connectivity distributions:

obtained from mapping projects

Govindan et al 2002

Result of a sampling:

is this reliable ?

(10)

Sampling biases

Internet mapping:

Sampling is incomplete

Lateral connectivity is missed (edges are underestimated) Finite size sample

=> spanning tree

(11)

Sampling biases

Vertices and edges best sampled in the proximity of sources

Bad estimation of some topological properties

Statistical properties of the sampled graph may sharply differ from the original one

Lakhina et al. 2002 Clauset & Moore 2004

De Los Rios & Petermann 2004 Guillaume & Latapy 2004

Bad sampling

?

(12)

Evaluating sampling biases

Real graph G=(V,E)

(known, with given properties )

Sampled graph G’=(V’,E’)

simulated sampling

Analysis of G’, comparison with G

(13)

Model for traceroute

G=(V, E)

Sources Targets

First approximation: union of shortest paths

NB: Unique Shortest Path

(14)

Model for traceroute

G’=(V’, E’)

First approximation: union of shortest paths

Very simple model, but: allows for some analytical

and numerical understanding

(15)

More formally...

G = (V, E): sparse undirected graph with a set of N

S

sources S = { i

1

, i

2

, …, i

NS

}

N

T

targets T = {j

1

, j

2

, …, j

NT

}

randomly placed.

The sampled graph G’=(V’,E’) is obtained by considering the union of all the traceroute -like paths connecting source- target pairs.

PARAMETERS: ,

N NS

S

,

N NT

T

N

N NS T

(probing effort)

Usually N

S

=O(1), 

T

=O(1)

(16)

For each set  = { S , T }, the indicator function that a given edge (i, j) belongs to the sampled graph is

  

 

 

 

m l

N s

m l ij N

t

mj li

ij

S T

t

1 s

) , ( 1

1

1   

with

 

  0

)

1

, (l m

ij if (i,j) path between l,m

otherwise, S

N s

ii

N

S

s

1

N T

t

ij

N

T

t

1

Analysis of the mapping process

(17)

Averaging over all the possible realizations of the set  = { S,T },

  

 

 

 

m l

m l ij T S m

l

N s

N t

m l ij mj li

ij

S T

t

s 1 (1 )

1

1 ( , )

1 1

) ,

(

  

WE HAVE NEGLECTED CORRELATIONS !!

• usually ST << 1

V j i m l

m l ij V USP

t j i

s (s,t)

t) (s, ij

ij σ

b σ ( , )

Mean-field statistical analysis

 ij > 1 – exp(-  S T b ij )

betweenness

(18)

Betweenness centrality b

for each pair of nodes (l,m) in the graph, there are

lm

shortest paths between l and m

ijlm

shortest paths going through ij

b

ij

is the sum of 

ijlm

/ 

lm

over all pairs (l,m)

Similar concept:

node betweenness b

i

i

j

k ij: large betweenness jk: small betweenness

Also: flow of information if each individual

sends a message to all other individuals

(19)

Consequences of the analysis

 ij > 1 – exp(-  S T b ij )

•Smallest betweenness

b

ij

=2 => <

ij

> 2 

S

T

i j

•Largest betweenness

b

ij

=O(N

2

) => <

ij

> 1

(i.e. i and j have to be source and target to discover i-j)

(20)

Results for the vertices

 ij > 1 – exp(-  b ij /N)

<

i

> 1 – (1-

T

) exp( -  b

i

/N ) (discovery probability)

N

*k

/N

k

 1 – exp( -  b(k)/N ) (discovery frequency)

<k

*

>/k   ( 1 + b(k)/N )/k (discovered connectivity)

Summary:

discovery probability strongly related with the centrality ;

• vertex discovery favored by a finite density of targets ;

• accuracy increased by increasing probing effort  .

(21)

Numerical simulations

1. Homogeneous graphs:

peaked distributions of k and b

• narrow range of betweenness

1

,

1

max

 b b

e

(ex: ER random graphs)

=> Good sampling expected only for high probing effort

(22)

Numerical simulations

• broad distributions of k and b P(k) ~ k-3

• large range of available values

1

k 

1. Homogeneous graphs 2. Heavy-tailed graphs

(ex: Scale-free BA model)

=> Expected: Hubs well-sampled independently of 

(b(k) ~k



(23)

Homogeneous graphs

Homogeneously pretty badly sampled

Numerical simulations

(24)

Hubs are well discovered

Numerical simulations

Scale-free graphs

(25)

No heavy-tail behaviour except....

• heavy-tailed P*(k) only for NS = 1 (cf Clauset and Moore 2004)

cut-off around <k> => large, unrealistic <k> needed

bad sampling of P(k)

Numerical simulations

Homogeneous graphs

N

S

=1

(26)

good sampling, especially of the heavy-tail;

• almost independent of NS ;

• slight bending for low degree (less central) nodes => bad evaluation of the exponents.

Scale-free graphs

Numerical simulations

(27)

Summary

 Analytical approach to a traceroute-like sampling process

 Link with the topological properties, in particular the betweenness

 Usual random graphs more “difficult” to sample than heavy-tails

 Heavy-tails well sampled

 Bias yielding a sampled scale-free network from a homogeneous network:

only in few cases

<k> has to be unrealistically large

Heavy tails properties are a genuine feature of the Internet

however

Quantitative analysis might be strongly biased (wrong exponents...)

(28)

Optimized strategies:

•separate influence of 

T

, 

S

location of sources, targets

investigation of other networks

Results on redundancy issues

Massive deployment traceroute@home

Perspectives

The internet is a weighted networks

bandwidth, traffic, efficiency, routers capacity

Data are scarse and on limited scale

Interaction among topology and traffic

and...

cf also Guillaume and Latapy 2004

Références

Documents relatifs

In a discrimination context, we first define the topological equivalence between the chosen proximity measure and the perfect discrimination measure adapted to the data

Its uses the concept of neighborhood graphs and compares a set of proximity measures for continuous data which can be more-or-less equivalent a

In section 2, after recalling the basic notions of structure, graph and topological equivalence, we present the pro- posed method, how to build an adjacency matrix associated with

Further, the underlying perspective of this line of research differs from Inte- grated Information Theory (IIT) by actually hewing closely to the asymptotic limit theorems of

In Chapter 2 we have developed and implemented a new statistical algorithm for quantitative determination of the binding free energy of two heteropolymer se- quences under

In this context, we first designed a new soft clustering algorithm for the mode seeking framework using such random walks and we provided a way to bound the 2-Wasserstein

We also show that for a complex rational map the existence of such invariant measure characterizes the Topological Collet–Eckmann condition: a rational map satisfies the

As the founding that the elastic properties of the pure phases (C 3 S, C 2 S) and their solid solution alite and belite found in industrial clinker has no obvious difference [4].