The emergence of giant components - Algorithmic Foundations of the Internet

In their original work on random graphs, growing with successive attach-ments of arcs to a given set of nodes, Erd˝os and R´enyi discovered an interesting phenomenon. Their graphs showed a sudden change (today called a “phase transition”) in node connectedness for increasing values of the mean node

de-gree ¯d. In the initial steps of the growing process, that is when there are just a few arcs in the graph and the value of ¯d is small, most nodes are discon-nected from one another and the graph actually consists of a great number of independent connected components. Increasing the number of arcs, these components start merging to form bigger components; until, when the value d¯= 1 is reached, a very large connected component emerges whose nodes form a sizable subset of the total.

More precisely, if we repeat the process for an increasing number of nodes n, all initial components have a size s with _n^s → 0 for n → ∞. Around d¯ = 1, however, a giant (connected) component appears whose size s is a finite fraction ofn, i.e., _n^s is a positive constant for increasingn. The graph is then composed of one giant component plus a collection of components of negligible size. By adding new edges, the latter components tend to merge with the giant one, until the whole graph is connected. With a little injustice we will essentially direct our attention to the giant component only. For example the mean length ¯l of the shortest path between all pairs of nodes implicitly refers to the nodes of the giant component. Note that ¯d= 1 implies that each node has on average just one incident arc, so studying graphs for ¯d ≥ 1 is reasonable on practical grounds.

Erd˝os and R´enyi’s model was an abstract one, and the birth of a giant component was demonstrated in mathematical terms. In fact this phenomenon occurs in most real networks, although we may not be able to predict the values of the parameters involved in the phase transition. For example, if the nodes of a network are the freshmen of a college, and an arc is formed whenever a new acquaintance arises, the corresponding graph is initially formed of many small components, but at the end of the academic year the graph is dense and a giant component is certainly present, leaving out only small groups of shy people, or of fellows with narrow common interests. Unlike in the Erd˝os and R´enyi’s model this process is essentially non random since acquaintances may be due to personal attraction, or to interest or taste sharing. Still one (and only one) giant component will arise.

So far so good. But the phenomenon becomes much more complex if the graph is directed. A much larger number of arcs is needed to form directed paths between all the nodes of a component and the global structure of the connections becomes more interesting. For a moment, take all the directed arcs as undirected. The giant component so observed is said to beweakly connected, and is called GW for giant weak, to indicate that many connections would disappear on restoring the arc orientation. As before, the graph is divided into GW plus several independent small components, and again we direct our attention to GW.

Restoring the arc orientation, GW takes the form indicated inFigure 7.2.

If the number of edges is large enough, the three subsets GS, GI, and GO have a size proportional ton, hence they are properly called giant. The core GS, forgiant strong, is the largest connected subset of GW. Each node in GS can be reached from any other node, that is GS is a connected component

SCC

Tube

GS GO

Tendril

GW

FIGURE 7.2: The structure of the weak giant component GW in a directed graph. The core GS is the largest connected component of the whole graph.

of the directed graph. GI, for giant in, contains all the nodes from which a node in GS can be reached, but non vice-versa (otherwise the node would be in GS). Conversely GO, for giant out, contains all the nodes that can be reached by the nodes in GS, but non vice-versa. The remaining nodes of GW are clustered intotendrils andtubes. Tendril nodes can be reached from GI, but cannot reach nodes outside the tendril; or can reach GO, but cannot be reached from outside the tendril. Tube nodes form a bridge connecting GI and GO without passing through GS.

As we shall see, the structure of Figure 7.2 is found in the World Wide Web represented as a directed graph. Taking our college freshmen again, we may construct a directed graph of telephone calls where arcs are directed from the caller to the receiver. The fellows in GS make and receive many calls. Those in GI try to be friendly without great success. Those in GO are very much sought for but tend to keep to themselves. The reason why this structure is found in the Web is of course very different.

Like random graphs, most non random networks tend to suddenly form one giant component when the number of arcs reaches a certain limit, unless special conditions occur.² This is remarkable since most of the other charac-teristics of the network, for example the degree distribution seen in Chapter 6, are strongly influenced by the formation process.

In a random graph all possible arcs have the same probability of occurring independently of the arcs already present, while in most real-life networks, if two nodesx, y are connected to another node z, the probability thatx and

2For example the freshmen of two colleges located in different states, if taken as a whole, tend to generate two giant components of half size, one for each college. However both components have a size proportional to the total number of nodes, hence are giant.

y are also connected by an arc is generally much higher than casting arcs at random (“a friend of my friend is also my friend”). There is a vast literature on this clustering phenomenon, or in general on the local density of a network given its global density. In an undirected graph a node x of degreed has d neighbors that may share up tos=d(d−1)/2 arcs. Ifcarcs actually link the neighbors ofx, the standardclustering coefficientofxis defined as:

C=c/s= 2c/d(d−1), (7.1)

and the mean value ¯C of C can be computed as a parameter of the whole network. For directed graphs, relation (7.1) can be immediately extended.

For a wealth of social networks, biological networks, neural networks, dis-tribution networks, and for communication networks like the Internet and the Web, the value of ¯C has been shown to be much higher than that of a ran-dom network with the same number of nodes and arcs. In all these cases, two neighbors of a given node have a rather high probability of being neighbors of one another, thereby keeping the value of ¯C high, since this parameter indi-cates the density of loops of length three. More generally two nodes at a short distance from a given node may be neighbors of one another thus contribut-ing to local clustercontribut-ing, as happens for example in the graph of Figure 7.1.

Global clustering can be measured with a proper extension of relation (7.1) to indicate the density of loops of any length.

In a random graph, however, each node has probability d/(n−1) that one of itsdincident arcs is in fact connected to one of the other n−1 nodes.

Recalling that for increasingnthe value ofdmust be kept constant in practice (otherwise the “size” of the nodes would be unbounded), the above probability is asymptotically vanishing and the value of c in relation (7.1) goes to zero.

Then ¯C→0 forn→ ∞. Random networks are practically free of clustering, that is they tend to have a tree-like structure in the surrounding of each node and loops are long. The giant connected component present in all networks has an internal structure that depends on the growing process. Other interesting properties will be found in the next section.

Dans le document Algorithmic Foundations of the Internet (Page 132-135)