Power laws: the rich get richer - Algorithmic Foundations of the Internet

Although randomness plays an important role in the growth of real life networks including the Internet and the Web, a completely different growth model may be even more relevant. As we will see now, this phenomenon was originally observed in the field of economics, and only later made its way into other sciences.

The story starts at the end of the nineteenth century with Vilfredo Pareto and his theories on political economy.⁹ In 1896 Pareto presented the “curve of income” based on an accurate statistical survey on the income distribution in different European countries, particularly England and Prussia where available data were more reliable. This curve, drawn in Figure 6.4(a), shows the number y of persons with income at least x. The best mathematical fit for it was given by Pareto in the form:

y=b(x+a)^−γ (6.13)

where a, b, and γ are positive constants. With the terminology used today equation (6.13) is a power law since y is expressed as a power of x with the addition of some constant terms. Studying networks we will find power laws quite often. In all cases the exponent is negative, hence the curve is convex and y decreases for increasing x.

One could have expected that the constants a, b, γ depended on the country or on the year in which the phenomenon was observed, but Pareto’s statistics showed that their values were quite similar in all countries for which data were available, and remained almost unchanged in a time window of over forty years. In particular the value of a was always extremely small, and the value of γ spanned from 1.89 to 1.60. As Pareto always stressed, the basic “shape” of the curve (in fact, the power law form) characterizes the distribution of wealth everywhere, even if some changes in the constants can induce minor deformations. Before explaining what all this has to do with the Internet, let us examine the meaning and the consequences of such a mathematical behavior.

9 The son of an Italian political refugee, Vilfredo Pareto grew up in Paris. His family was eventually allowed to return to Italy, where he graduated in engineering and then reached a very high position as an executive in industry, although he was openly a socialist and culturally an anarchist. Later he left industry to become a professor of Political Economy in the Swiss University of Lausanne, where he wrote his famous Cours d’économie politique (Course of Political Economy) in which economics was approached on quantitative grounds, a novelty for those times. His political beliefs were so inflexible that he even refused a life seat in the Italian Senate.

FIGURE 6.4: (a) Pareto’s income distribution. The curve starts at a high value on the y axis since a is very small. (b) Power law of node degree distribution for a typical network growth.

At a first glance the curve shows that a very large number y of persons have a very low income (at most x) and only a few persons have a very large income. This fact can be characterized with an elementary mathematical analysis, using once again a continuous function (6.13) to describe a discrete phenomenon (number of persons versus amount of income) in its general terms. Confining the curve to a finite window, say 1 ≤ x ≤ M, to represent an actual span of income, and letting E be the mean value of the income over that span, it can be proved that the persons earning less than E are many more than one half of the total population, independently of the values of a and b. That is, the poor are a vast majority.
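The claim about the mean income can be checked numerically. The following is a minimal sketch, assuming that the income density is obtained by differentiating curve (6.13) and that the window is 1 ≤ x ≤ M; the function name and the sample values of a, γ, and M are illustrative choices, not taken from Pareto’s data.

```python
def fraction_below_mean(a, gamma, M):
    """Share of people earning less than the mean income E on 1 <= x <= M.

    Pareto's curve y = b (x + a)^(-gamma) counts persons with income >= x,
    so the income density is -dy/dx, proportional to (x + a)^(-gamma - 1).
    The constant b cancels out of every ratio below.
    Assumes gamma > 0 and gamma != 1.
    """
    lo, hi = 1 + a, M + a                      # work with u = x + a
    count = lo ** -gamma - hi ** -gamma        # people in the window, divided by b
    # Integral of u * gamma * u^(-gamma - 1) over [lo, hi], divided by b
    total_u = gamma / (gamma - 1) * (lo ** (1 - gamma) - hi ** (1 - gamma))
    mean_income = total_u / count - a          # E, expressed in terms of x again
    below = lo ** -gamma - (mean_income + a) ** -gamma
    return mean_income, below / count


for gamma in (1.60, 1.89):
    mean_income, share = fraction_below_mean(a=0.01, gamma=gamma, M=1000)
    print(f"gamma={gamma}: E={mean_income:.2f}, share below E={share:.2f}")
```

Since b never enters the computation, the result cannot depend on it; for the sample values above the share of people below the mean stays well above one half.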

In terms of political economics this fact can be simply explained. People with a very low income can barely survive and tend to stay in their condition forever. If their personal income increases there is some room for savings, and further income comes from the “interest” on the capital, whatever interest means. As the income becomes even larger people start acquiring further sources of revenue and become more and more wealthy. Following a popular saying: “the rich get richer.” Economic data are presented today in much more sophisticated forms compared to Pareto’s curve, but a basic fact still holds. From the poorest to the richest countries, a small percent of the population owns a large percentage of the total wealth.

In the decades that followed Pareto’s studies, other power laws were applied to the mathematical description of different phenomena, many of which related to human affairs. Starting from the 1930s a significant role in empirical statistics was played by studies on the frequency of the words in natural language utterances, mainly due to the Harvard linguist George Kingsley Zipf. The original Zipf’s law stated that, if the words of a sufficiently rich linguistic corpus are ordered according to their decreasing frequencies, the following relation (approximately) holds:

f(w_i)∝i⁻¹

where w_i is the word in the i-th position (rank) of the ordered list, and f(w_i) is its frequency. That is, the frequency of any word w_i is inversely proportional to its rank. Taking English as an example, Zipf’s law states that the most frequent word “the” occurs approximately twice as often as the second word “of,” three times as often as the third word “and,” etc., as is in fact observed in relevant collections of English sentences. The law was then extended by raising the rank i to a distribution exponent different from −1, and examining different fields in which it may hold.
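As a small illustration of the rank-frequency relation, the sketch below builds idealized frequencies proportional to i raised to a negative exponent. The function name and the vocabulary size of 1000 words are assumptions made for the example; no real corpus is involved.

```python
def zipf_frequencies(n_words, exponent=1.0):
    """Relative frequencies proportional to rank^(-exponent), for ranks 1..n_words."""
    weights = [rank ** -exponent for rank in range(1, n_words + 1)]
    total = sum(weights)
    return [w / total for w in weights]


freqs = zipf_frequencies(1000)
ratio_first_to_second = freqs[0] / freqs[1]   # close to 2 for exponent 1
ratio_first_to_third = freqs[0] / freqs[2]    # close to 3 for exponent 1
print(ratio_first_to_second, ratio_first_to_third)
```

Passing a different value for `exponent` gives the extended form of the law mentioned above.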

Note that power laws are not the only mathematical expressions for which a small portion of a “population” accounts for a high share of the total “wealth,” whatever the terms in quotes represent. A similar effect occurs if the wealth is distributed according to an exponential law where the independent variable x is at exponent (with minus sign), instead of a power law where x is in the base; however, there is a crucial difference that will be explained below.

It is now time to discuss how power laws appear in graphs. In fact, although there has been an excessive pursuit to discover power law distributions in all sorts of data collections, sometimes exaggerating their validity to unduly support different claims, such distributions unquestionably play a central role in the growth of networks. Consider the following growth process where, at any step, the nodes with the highest degree have a better chance to increase their degree (i.e., the rich get richer principle applies). Referring again to an undirected graph we pose:

Preferential attaching process 3

1. start with an initial node 1;

2. proceed with consecutive steps: at each step i insert a new node i and a new arc connecting i to an existing node x chosen with probability p proportional to the current degree of x (i.e., p ∝ d(x, i)).

This process, proposed by Barabási and Albert in 1999, is one of the bases of the whole theory of network growth, and gives rise to a so-called “citation graph.” Standard mathematical analysis shows that the function P(d) emerging from process 3 has the continuous power law form:

P(d)∝d⁻³. (6.14)

Compared to Pareto’s curve of Figure 6.4(a), the function (6.14) goes to ∞ for d going to zero. However, the new curve has a meaning only for d ≥ 1 because by construction all the vertices have at least one incident arc. Furthermore the exponent −3 in relation (6.14) is larger in absolute value than the exponent −γ of (6.13), so the new curve is closer to the x axis.
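Process 3 is easy to simulate. The sketch below is one possible reading of it, not a definitive implementation: it assumes that the very first arc is attached to node 1 (whose degree is still zero at that point), and the function name, the seed, and the graph size are illustrative choices.

```python
import random
from collections import Counter


def preferential_attachment(n, seed=0):
    """Grow a graph by process 3 and return the degree of each node.

    Assumption (not stated in the text): at step 2 node 1 still has degree
    zero, so the first arc is attached to it directly.
    """
    rng = random.Random(seed)
    degree = {1: 0}
    # Each node appears in `endpoints` once per incident arc, so a uniform
    # draw from this list picks node x with probability proportional to d(x).
    endpoints = []
    for i in range(2, n + 1):
        x = rng.choice(endpoints) if endpoints else 1
        degree[i] = 1
        degree[x] += 1
        endpoints.extend((i, x))
    return degree


degree = preferential_attachment(20000)
histogram = Counter(degree.values())
for d in (1, 2, 4, 8):
    print(d, histogram[d] / len(degree))   # empirical share of nodes with degree d
```

The printed shares fall off quickly with d, in line with relation (6.14); the power law form is asymptotic, so the agreement is expected to be only approximate for small degrees.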

A little caution is in order here. Equation (6.13) gives the number of people with income ≥ x (not exactly x), while equation (6.14) gives the probability of finding a vertex with degree equal to d. We can easily transform the latter equation to give a cumulative distribution P_c(d), i.e., the probability of finding a vertex of degree ≥ d. In fact P_c(d) is simply the integral of P(d) from d to ∞, hence we have P_c(d) ∝ d⁻². In the next chapter, dealing with the Internet and the Web, we will see that cumulative distributions are often more significant than the others.
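The integration step can be written out explicitly. In the lines below c stands for the unspecified proportionality constant of (6.14) and t is a dummy integration variable; both symbols are introduced here only for the derivation.

```latex
P_c(d) = \int_{d}^{\infty} c\, t^{-3}\, dt
       = c \left[ -\frac{t^{-2}}{2} \right]_{d}^{\infty}
       = \frac{c}{2}\, d^{-2}
       \;\propto\; d^{-2}.
```

More generally, integrating a power law with exponent −γ (γ > 1) in this way lowers the absolute value of the exponent by one.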

Many variations of process 3 have been proposed in order to make it more suitable for modelling real networks. Major extensions allow several arcs to enter at each step, although their number must be kept constant along the process to permit a reasonable mathematical analysis. Moreover these arcs may be connected to the nodes according to a mixture of preferential and random attaching, as in fact happens in the Internet and the Web. Finally, a new node x need not be immediately connected to the others, although this requires a little caution (see below).

All these models must be studied in the specialized literature. We indicate only a very simple extension of process 3 that gives a significant account of what can be expected from the others. Namely:

Preferential and random process 4

1. start with an initial node 1;

2. proceed with consecutive steps: at each step i insert a new node i and a new arc connecting two existing nodes x, y chosen with probabilities p_x ∝ d(x, i) + a and p_y ∝ d(y, i) + a, with a > 0 constant.

Process 4 introduces a mixture of preferential and random attachment, the latter through the additive constant a. The greater a is, the less preferential is the process. Although possibly very small, a cannot be zero because each new node i enters the graph with degree zero and, for a = 0, could never be attached to the others. Asymptotic analysis shows that for large values of d the function P(d) emerging from process 4 has the form:

P(d)∝d^−γ (6.15)

with γ = 2 + a/2. To stay close to what happens in many real networks a must be chosen in the interval (0, 2], so γ has a value in (2, 3].
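Process 4 can be sketched along the same lines. Two points are assumptions of this sketch rather than statements of the text: the new node i is counted among the “existing” nodes when the arc is placed, and the two endpoints x and y are required to be distinct. Function names, seed, and graph size are illustrative.

```python
import random


def preferential_and_random(n, a=1.0, seed=0):
    """Grow a graph by process 4 and return the degree of each node.

    Assumptions (not fixed by the text): the new node i is eligible as an
    endpoint of the new arc, and x and y must differ (no self-loops).
    """
    if a <= 0:
        raise ValueError("a must be positive")
    rng = random.Random(seed)
    nodes = [1]
    degree = {1: 0}
    for i in range(2, n + 1):
        nodes.append(i)
        degree[i] = 0
        # Attachment weight of each node: current degree plus the constant a.
        weights = [degree[v] + a for v in nodes]
        x = rng.choices(nodes, weights=weights)[0]
        y = x
        while y == x:                      # redraw until the endpoints differ
            y = rng.choices(nodes, weights=weights)[0]
        degree[x] += 1
        degree[y] += 1
    return degree


def exponent(a):
    """Asymptotic exponent of relation (6.15): gamma = 2 + a/2."""
    return 2 + a / 2


degree4 = preferential_and_random(2000, a=1.0)
print(max(degree4.values()), exponent(1.0))
```

Recomputing the weights at every step keeps the code short at the price of quadratic running time, which is acceptable for a few thousand nodes.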

All the common variations of process 4 end up with a power law for P(d) whose exponent and other constants are a function of the various parameters of the process such as the number of arcs introduced at each step, or their distinction between preferential and random arcs, or even the number of arcs that are taken off the graph at certain steps. All these parameters are reflected in the graphical representation of the function, which in all cases has the shape of Figure 6.4(b). In a later chapter we shall see that several other features of the Internet and the Web, and of other real networks, are governed by power laws with most exponents between −3 and −2.

We can also consider the value of d̄(x, i) (mean degree of node x at step i) as we did for the exponential distribution. For process 4 and its variations we get in general something like:

d̄(x, i) ∼ (i/x)^β, (6.16)

with β < 1. That is, the mean node degree increases with the ratio of i over x as expected, and increases much faster than in the exponential distribution (relation (6.11)) where the variation is limited to a logarithmic growth.

FIGURE 6.5: Asymptotic comparison between random attachment (exponential law in dashed line) and preferential attachment (power law in solid line).

Even for process 3 and for its extensions, the mean distance between nodes is small, generally scaling with n as:

l̄ ∼ log n / log log n. (6.17)
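To get a feeling for how slowly this quantity grows, the sketch below evaluates log n / log log n for a few sizes. It assumes that n denotes the number of nodes and uses natural logarithms; since relation (6.17) only gives a scaling, the values indicate the trend and are not actual mean distances.

```python
import math


def mean_distance_scale(n):
    """Value of log n / log log n (natural logarithms), the scaling in (6.17)."""
    return math.log(n) / math.log(math.log(n))


for n in (10**3, 10**6, 10**9):
    print(n, round(mean_distance_scale(n), 2))
```

While n grows by a factor of one million, from a thousand to a billion, the computed value goes only from roughly 3.6 to roughly 6.8.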

We have now the basic information required to compare an exponential distribution with a power law distribution. Figure 6.5 shows the two functions:

y = a e^(−bx), y = a x^(−γ) (6.18)

where a, b, γ are constants.

The points where the two curves intersect the x, y axes, or intersect each other, depend on the values of a, b, γ, but the general shape of the curves is independent of these parameters. The exponential law may lie above the power law in an intermediate interval of the x coordinate, but always stays below the power law for small and large values of x. Furthermore, for x → ∞ the exponential law goes to zero (by definition) with exponential decay, while the power law goes to zero much more slowly. The value of the mean degree of a node as a function of the instant when the node enters the game, indicated in relations (6.11) and (6.16) for the two distributions, confirms the abundance of nodes with high degree in the latter case. Power laws are said to have a “fat tail.”
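The relative position of the two curves can be seen on a few sample points. The sketch below evaluates the functions (6.18) with the illustrative constants a = 1, b = 1, γ = 3, which are chosen here only so that the two curves cross twice; other constants move the crossing points, or remove them.

```python
import math


def exponential_law(x, a=1.0, b=1.0):
    """Exponential law y = a e^(-b x)."""
    return a * math.exp(-b * x)


def power_law(x, a=1.0, gamma=3.0):
    """Power law y = a x^(-gamma)."""
    return a * x ** -gamma


for x in (0.5, 1, 2, 4, 5, 10, 50):
    e, p = exponential_law(x), power_law(x)
    print(f"x={x:<4} exponential={e:.3e} power={p:.3e} "
          f"{'exponential above' if e > p else 'power law above'}")
```

With these constants the exponential law is the larger of the two at x = 2 and x = 4, and the smaller one at the remaining sample points; at x = 50 the power law exceeds it by many orders of magnitude, which is the fat tail.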

An interesting study of the two functions (6.18) can be performed if we take the natural logarithm of both sides of the equality, thus obtaining:¹⁰

ln y=ln a−b x, ln y=ln a−γ ln x. (6.19)

10 Recall that ln(a·b) = ln a + ln b; ln a^b = b ln a; and ln e = 1.

FIGURE 6.6: Exponential behavior (dashed) versus power law (solid), in semi-logarithmic scale (a), and in logarithmic scale (b). The intersections are: A = ln a, B = a e^−γ, C = ln a/b, D = ln a − 1, E = ln a/γ, F = ln ln a − ln b.

Now plot the two functions on the new axes x, ln y (semi-logarithmic scale: Figure 6.6(a)), and ln x, ln y (logarithmic scale: Figure 6.6(b)). In the first plane the exponential function corresponds to a straight line with slope −b. In the second plane the power law corresponds to a straight line with slope −γ. As noted in footnote 8 the shape of these curves is essentially independent of the base chosen for the logarithms. In practical applications base 10 is mostly used.

The shape of the curves in Figure 6.6 is very important for evaluating experimental results. In fact data coming from a random attachment experiment, or from a preferential attachment experiment, tend to cluster along a straight line respectively in a plane x, log y, or in a plane log x, log y. So a mere examination of the data distribution on the plane may reveal, at least approximately, the nature of the experiment.
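The straight-line behavior can be checked on synthetic data. The sketch below generates exact (noise-free) samples of the two functions (6.18) with illustrative constants and recovers the slopes by an ordinary least-squares fit; the helper `slope` and the constants are assumptions of the example, and real measurements would be noisy and call for more careful estimation.

```python
import math


def slope(xs, ys):
    """Least-squares slope of the straight line fitted to the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [float(x) for x in range(1, 101)]
a, b, gamma = 5.0, 0.3, 2.5            # illustrative constants

exp_data = [a * math.exp(-b * x) for x in xs]
pow_data = [a * x ** -gamma for x in xs]

# Semi-logarithmic plane (x, ln y): the exponential is a line of slope -b.
slope_semilog = slope(xs, [math.log(y) for y in exp_data])
# Logarithmic plane (ln x, ln y): the power law is a line of slope -gamma.
slope_loglog = slope([math.log(x) for x in xs], [math.log(y) for y in pow_data])

print(slope_semilog, slope_loglog)
```

Up to rounding, the two fitted slopes coincide with −b and −γ, as predicted by relations (6.19).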

Those who have reached the end of this rather heavy chapter may look at what follows with some relief. Reading the remaining chapters is going to be a smoother and more pleasant ride.

Bibliographic notes

Graph theory is the subject of countless books. To mention just one classic, still widely available: Berge, C. 1962. The Theory of Graphs and Its Applications. John Wiley & Sons, New York. The field of networks is more recent, and it is advisable to start with something not too difficult. As indicated in Chapter 1, an introduction directed to a general public is contained in: Barabási, A.L. 2002. Linked: The New Science of Networks. Perseus Publishing, Cambridge, MA, whose reading requires a modest knowledge of mathematics. To go deeper into the subject consider for example: Bornholdt, S. and H.G. Schuster, Editors. 2002. Handbook of Graphs and Networks. Wiley–VCH, Berlin.

The mathematical properties of graphs related to network modeling are analyzed in a scholarly way in the book: Dorogovtsev, S.N. and J.F.F. Mendes. 2003. Evolution of Networks. Oxford University Press, where the interested reader can find a complete mathematical justification of many of the properties of networks described in the present book. A simpler treatment of similar subjects can be found in: Barrat, A., M. Barthélemy, and A. Vespignani. 2008. Dynamic Processes on Complex Networks. Cambridge University Press.

The fascinating story of Paul Erdős can be read in the very pleasant book: Hoffman, P. 1998. The Man Who Loved Only Numbers: The Story of Paul Erdős and the Search for Mathematical Truth. London: Fourth Estate Ltd.

The Oeuvres complètes de Vilfredo Pareto (The Complete Works of Vilfredo Pareto) have been published in Switzerland in thirty volumes, all written in French. The major biographies are also in French. An accurate summary in English of Pareto’s life and work can be found in Wikipedia.

Chapter 7 Giant components, small worlds, fat tails, and the Internet

How networks appear in the real world, and how they can be studied with a reasonable mixture of mathematics and observations.

One of the greatest consolations of this world is friendship, and one of the pleasures of friendship is to have someone to whom we may entrust a secret. Now, friends are not divided into pairs, as husband and wife: everybody generally speaking, has more than one; and this forms a chain of which no one can find the first link. When, then, a friend meets with an opportunity of depositing a secret in the breast of another, he, in his turn, seeks to share in the same pleasure. He is entreated, to be sure, to say nothing to anybody; and such a condition, if taken in the strict sense of the words, would immediately cut short the chain of these gratifications: but general practice has determined that it only forbids the entrusting of the secret to everybody except one equally confidential friend, imposing upon him, of course, the same conditions. Thus, from confidential friend to confidential friend, the secret threads its way along this immense chain, until, at last, it reaches the ear of him or them whom the first speaker exactly intended it should never reach. However, it would, generally, take a long time on the way, if everybody had but two friends, the one who tells him, and the one to whom he repeats it with the injunction of silence. But there are some highly favoured men who reckon these blessings by the hundred, and when the secret comes into the hands of one of these, the circles multiply so rapidly that it is no longer possible to pursue them.¹

So if at the next step one of these “highly favoured men” tells the secret to one hundred friends, who are probably also highly favored, the secret is deposited in the breasts of 100² = 10,000 new custodians, and the process goes on exponentially along a communication tree where the number of nodes is multiplied by 100 at each level.

1 Manzoni 1827, Ch. XI. The stylish Italian novel The Betrothed is considered a masterpiece of world literature. It may well be that these authors have inserted the citation because Manzoni’s novel is an inescapable part of their personal culture. In any case, this description of secret spreading predicted the chain email almost two centuries in advance.

FIGURE 7.1: A graph of secret sharing with connections between mutual friends.

Chances are, however, that the 10,000 custodians are not all distinct, according to the universal law that “a friend of my friend is also my friend.” The set of custodians then grows in the form of a complex communication network as shown in Figure 7.1. Unlike the tree of exponential growth, the network is highly clustered, with groups of friends all linked to one another, and short loops appear. On the other hand the exponential growth cannot go on for too long, because in a few steps, the total number of custodians would exceed the population of the globe. Either the tree stops evolving, or closes onto itself forming long loops.
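How few steps are needed can be computed directly. The sketch below counts the levels of a tree in which every custodian tells one hundred new people; the world population of 8 billion is an assumed round figure, and the function name is illustrative.

```python
def levels_to_exceed(population, fanout=100):
    """Number of tree levels after which the newest level alone exceeds `population`."""
    level, custodians = 0, 1
    while custodians <= population:
        level += 1
        custodians *= fanout
    return level


print(levels_to_exceed(8_000_000_000))   # prints 5, since 100**5 = 10**10
```

Under these assumptions the fifth level alone would already require more people than exist, so the tree must stop growing or fold back onto itself before then.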

In Chapter 6 we have seen how networks can grow according to different mathematical rules, with random or preferential linking, or with a mixture of the two. Clearly different networks require different algorithms and the same problem may be easy on one and difficult on another. Which parameters are then relevant for network study? Let us approach this by trying to answer a natural question: what happens in the real world?
