The perception of small worlds - Algorithmic Foundations of the Internet

Manzoni’s secret sharing suggests that all of the approximately ten billion people that form human society would be reached by a secret in a few steps, except possibly for small groups of asocial individuals that remain outside the giant component. Apparently this is not that far from the truth, as the network of personal acquaintances between humans seems to have a giant connected component where the mean distance between any pair of nodes is

1 2

N

N n

n

FIGURE 7.3: Two non small-world graphs: a ring and a two-dimensional grid. A dashed “long-range contact” has been added to the latter as in Klein-berg’s model.

quite small. This is what is known as the small-world effect in the field of social networks.

These studies started in the 1960s with a set of experiments conducted by the psychologist Stanley Milgram at Harvard University. The most relevant experiment for the field of networks is so well known that we will recall it only very briefly. Milgram asked a random group of people in Nebraska to send a letter to a specific person in Boston passing through a chain of people that at each step the sender personally knew. Many letters got lost, but those that reached the Bostonian addressee had made an average of six hops. The expression “six degrees of separation” then passed into popular language, to indicate that six (or, in any case, a very small number of) steps of acquaintance separate any two of us: from Mahmud Ahmadinejad to Angelina Jolie; from Angela Merkel to Tiger Woods.

From a mathematical point of view the small-world effect arises when the mean distance ¯l between nodes is at most polynomial in log(N), where N is the number of nodes. Not “six” as found by Milgram; not even a higher constant, that would be demanding too much as N goes to infinity; but a big reduction anyway. Note that we are referring to ¯l, that is, to the mean length of the shortest path between any two nodes, and not to thediameterδ of the graph defined as the length of thelongestshortest path (i.e., the path connecting two vertices at maximal distance).³

In Chapter 6 we have seen that networks growing randomly or with pref-erential linking are small worlds. Of course this result comes from a math-ematical abstraction, but all social or communication networks examined in the last decades experimentally exhibit the same property. So, to start with,

3There is some confusion in the literature on using ¯lorδto define small worlds. This may be due to the fact that in a random graph both parameters are logarithmic inN. We refer to ¯l. As ¯lis a mean value, it may well be that the actual distance between Angela Merkel and Tiger Woods is much more than six.

which graphsare notsmall worlds? Typical examples in computing arerings of processors used in small local area networks, and regular grids of CPUs used in parallel processing: seeFigure 7.3. In a ring ofN nodes we can easily find:

δ=bN/2c, ¯l∼N/4. (7.2)

In a two-dimensional grid ofN =n1×n2nodes we have:

δ=n1+n2−2, ¯l∼(n1+n2)/2, (7.3)

both proportional to the square root ofN ifn₁ andn₂ are of the same order of magnitude. In general in a d-dimensional grid both δ and ¯l are of order O(N^1/d). On the other side of this spectrum stand complete graphs with δ= ¯l= 1.

Comparing random graphs and grids we see that two of their characteristics are opposite. In random graphs the value of ¯l is very low and the loops are long (low clustering), while in grids the value of ¯l is very high and the loops are short (high clustering). In 1998 Watts and Strogatz suggested how to reconcile the two models into one that exhibits low ¯l and high clustering. For example starting from a highly regular and highly clustered network like a grid, add a small number of arcs chosen uniformly at random. The clustering remains, however, a few new random arcs are sufficient to reduce the value of ¯l dramatically, and tuning the parameters this value becomes logarithmic.

This process creates a small world with high clustering. A standard example is a set of tightly connected communities (e.g., freshmen of different colleges), with the communities initially disconnected from one another. A few random connections between nodes of different communities are sufficient to link any pair of nodes with just a few hops, part on the new arcs and part inside the communities.

Speculating on Milgram’s experiment and on Watts and Storgatz models, Jon Kleinberg discovered some interesting properties of decentralized algo-rithms for the detection of short paths in a network. These algorithms only make use of some local information stored at each node, as in fact was attained by the participants in Milgram’s experiment. The problem is put in strongly mathematical terms, focusing on a generalization of ann×n grid model on which three parameters p, q, r are imposed. Each node u now has a direct connection with every other nodev within a distancep(i.e.,pis the number of grid steps betweenuandv). The nodes in this neighborhood are thelocal contacts of u. In particular for p = 1 the local contacts are the four nodes surroundinguin the grid. In additionuhasqlong-range contactsdetermined byq independent random trials. The probability that uchooses v as one of these contacts is proportional tod(u, v)^−r, whered(u, v) is the number of grid steps between u and v. In particular for q = 1 there is only one long-range contact per node (see Figure 7.3, lettingn1=n2=n).

Kleinberg proved two interesting facts about this model. For p=q = 1,

i.e., if each nodeuhas only four local contacts and one long-range contact; and forr= 2, i.e., the probability of connectinguwith the long-range contactv is proportional tod(u, v)⁻²; one can apply a very simple decentralized routing algorithm. To send a message to a target node t, the original source node u, and then any successive message holder, sends the message to its contact that is closest tot. Kleinberg proved that this algorithm requires an expected number of hops of order O((logn)²), that is, a delivery time exponentially smaller than the numberN =n²of grid nodes. Not only does the grid become a small world with the addition of one long-range contact per node, but also each node is able to send the message along a short path to a given target solely on the basis of limited local knowledge.

The second result is also impressive. Forr6= 2, any decentralized routing algorithm in the family of grids requires a delivery time polynomial in n instead of log(n): i.e., it is exponentially slower than the algorithm above.

For example for r = 0 the long-range connections are chosen uniformly at random, since the probability of being chosen has the same value 1/Nlfor all the possibleNllong-range contacts. Essentially this is the model of Watts and Storgatz. Short connecting chains also exist in this case, but no decentralized algorithm can detect them efficiently.

In conclusion a combination of randomness and local clustering are two basic ingredients to build a small world close to what is observed in practice.

Dans le document Algorithmic Foundations of the Internet (Page 135-138)