Fat tails - Algorithmic Foundations of the Internet

In this world, smaller animals tend to be in the vast majority. There are many more mosquitoes than humans, many more humans than bulls, and only a few hippos. And the property scales: small insects are the majority of all insects, small mammals are the majority of all mammals, etc.

Many artificial or natural phenomena exhibit a similar property, at least in the “tail” of the distribution. On September 11, 2001, a dreadful terrorist attack hit the United States of America taking the life of about 3,000 people.

On January 12, 2010, a tremendous earthquake hit Haiti and the victims were about 30,000. Unfortunately many other terrorist attacks stain the world with blood and many other earthquakes make our lives insecure. But there are only a few large ones, medium size attacks or quakes are more numerous, and the minor ones form the vast majority.

In America there are many small towns, a fair number are of medium size, but there are only a few huge metropolis.⁴Most U.S. airports have only a few flights per day, some are better served, but there are only very few really big

4This phenomenon of urban concentration was already observed by Felix Auerbach in 1913, between Pareto’s and Zipf’s studies.

hubs. Most books appear very rarely in bibliographies, some are referenced more often, while a small minority are mentioned almost everywhere. Exam-ples are countless. Log files for mail servers show that most addresses have a limited number of shared messages, some have more, and a few have a huge number of connections. In the distribution of links in the World Wide Web a vast majority of pages have an extremely limited number of incoming links, some are more pointed to, and only a few are very popular.

Most of these phenomena are ruled by a power law, as found by Pareto for the distribution of income and by Zipf for the frequency of words (see section 6.3 in the previous chapter). From the number y of earthquakes of a given magnitudex, to the numberyof Web pages with xincoming links, the mathematical expression of the relation is approximately:

y=a x^−γ (7.4)

witha andγ positive constants: a power law. There are examples of diverse origin: from biology to social sciences, from communication to transport. And from the Internet and the Web as we shall discuss later.

As we have seen in Chapter 6, relation (7.4) can be represented in a logx, logyplane as a straight line with negative slopeγ(Figure 6.6(b)). And in fact the discovery of power laws in artificial or natural phenomena has generally come out of observations of experimental data plotted in logarithmic scale.

Other properties, however, are peculiar to power laws and are important for their consequences.

Compared to the Poisson distribution typical of random phenomena, a power law has no peak around the average value of y (see Figures 6.3(a) and 6.4(b) of Chapter 6). Furthermore power laws go to zero quite slowly for increasingx, so that the elements with a very large value ofxare a minority but their probability of showing up is non-negligible. They disappear only if xreaches a point of “physiological” cut-off (an earthquake of magnitude 10 in the Richter scale; an airport with as many connections as the total number of existing airports). This actually means that “the rich” at the far right of the x axis are sufficiently numerous to get most of the total “wealth,” as was demonstrated by Pareto in economic terms over one century ago, and is unfortunately still true today as far as the distribution of income is concerned.

This property is referred to as a fat (or heavy) tail of the distribution. In these phenomena a mean value ofx is not particularly significant, since the distribution is completely unbalanced to the right.

Another fundamental property of power laws, tightly connected with those previously discussed, is that theyscale, and in fact they are the only mathe-matical laws with this property. In mathemathe-matical terms this means that mul-tiplying x by a constant c (or scalingx by c) causes a proportional change (scaling) ofy. That is, a functiony=f(x) scales iff(cx)∝f(x). And indeed for a power lawf(x) =a x^−γ we have:

f(cx) =a(cx)^−γ=c^−γax^−γ =c^−γf(x). (7.5)

Note that the exponent ofxis negative for most of the power laws that we consider here, that is, the function isconvexin a mathematical sense. However a functionf(x) =a x^α scales for α positive or negative, and αis called the scaling exponent. A consequence of scaling, or another way of looking at it, is that any portion of the curve behaves as the whole. The earthquakes in the whole Richter scale 1 to 10 have the same distribution as those with magnitude between 6 and 7; so in this range the earthquakes of magnitude 6 are much more frequent than those of magnitude 7. Limiting Pareto’s income distribution to a subset of the rich we find that most of the wealthy people own much less than the super-wealthy, etc.⁵

A concept related to scaling is self-similarity, that is, at any degree of magnification a portion of the curve is similar to the whole. Many phenomena in the real world show some degree of self-similarity, which is perfectly met if their mathematical description is a power law. In particular self-similarity is a property of fractals, mathematical objects that can be represented as geometric shapes that can be divided into parts, each of which is a smaller reproduction of the whole. Most people will have heard of fractals. Without getting into a complex mathematical analysis we recall that, unlike common objects in Euclidean geometry that have an integer dimensionD (D = 1 for lines, D = 2 for surfaces, D = 3 for solids, etc.), fractals have intermediate non integer dimensions.

Many natural phenomena seem to have a fractal structure, at least approx-imately. For example the vascular networks that distribute resources within an organism, like the blood circulation system, have a dimensionD between 2 and 3. Unlike surfaces (D= 2), if these fractals are observed down to a finer and finer scale they appear to fill most (but not all) of the space. According to some leading biologists this explains some controversial observations on living organisms, although their theories need further validation. And here power laws come in again.

It is well known that the larger an animal, the lower the metabolism B (essentially the amount of food consumed) compared with the animal’s mass M. For a long time the amount of heatH exchanged with the outside of the animal’s body was considered a measure of metabolism, that isB∝H. Now heat exchange takes place through the animal’s skin, that is a two-dimensional surface in the linear size L of the animal. We then have: B ∝ L², while M ∝L³; thereforeB∝M^2/3. A power-law, but not the correct one.

In fact, in the early 1930s Max Kleiber made a famous series of measure-ments from which he concluded thatB andM are indeed related by a power law, but the scaling exponent is 3/4. That is:

B ∝M^3/4 (Kleiber’s law). (7.6)

This surprising result has been justified in the 1990s considering that the

5Most people do not consider this a particularly painful truth. Apparently the function

“pain” does not scale.

capillary network for distributing nutrients, that is the organ responsible for metabolism, has in fact a fractal structure of non-integer dimension D >2, finally yielding the 3/4 exponent ofM for measuring metabolism and other biological parameters.

Power laws, then, are encountered everywhere when modeling networks in mathematical terms. In particular they show up in large systems guided by self-regulation rules based on local knowledge, without the intervention of a central control: a nice family of anarchic networks that include the Internet and the Web. No wonder, on the basis of what we have seen in Chapter 6, that a combination of randomness and preferential linking is a crucial ingredient for their evolution.

Dans le document Algorithmic Foundations of the Internet (Page 138-141)