
1.4.2 Non-Gaussian distributions and power laws

Herbert (2006) notes that, within a complex system, events do not follow a normal (Gaussian) distribution. To understand what this means, I shall devote a few paragraphs to explaining what a normal distribution is. The normal distribution has the following probability density function:

P(x) = 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²))   (1.14)

where µ is the mean of the distribution (as well as its median and mode), σ is the standard deviation and σ² is the variance. Graphically, the normal distribution is bell-shaped, as depicted in Figure 1.6. Among other properties, two of the most salient characteristics of the normal distribution are symmetry around the mean and unimodality (i.e. the fact of having only one mode); both can be checked numerically, as in the sketch below.

FIGURE (1.6) Density function of the standard normal distribution.
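To make these properties concrete, here is a minimal Python sketch (the function name normal_pdf and the parameter defaults are mine, not from the text) that evaluates the density of Equation 1.14 and checks the symmetry around the mean numerically:

    import math

    def normal_pdf(x, mu=0.0, sigma=1.0):
        """Density of the normal distribution (Equation 1.14)."""
        coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
        return coeff * math.exp(exponent)

    # Symmetry around the mean: P(mu + d) equals P(mu - d)
    for d in (0.5, 1.0, 2.0):
        assert math.isclose(normal_pdf(0.0 + d), normal_pdf(0.0 - d))

    # The mode coincides with the mean: the density peaks at x = mu
    print(normal_pdf(0.0))  # ~0.3989 for the standard normal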

Concerning this last property, it should be noted that, strictly speaking, non-normal distributions with several recurring values, only one of which recurs most often, are also to be considered unimodal. Technically, one should speak of bimodal or multimodal distributions only when there are values that recur “equally often”. The normal distribution, however, has only one mode even in the broadest sense of the term. Indeed, bearing in mind that the mean of a normal distribution is equal to its mode, its first derivative is positive for all values of x lower than the mean, negative for all values of x greater than the mean and null only when x is equal to the mean. Besides, it has two inflection points (where the second derivative is zero and changes sign), located one standard deviation below and above the mean. This is more easily understood by looking at Figure 1.7.

FIGURE (1.7) Density function of the standard normal distribution (dashed), with its first derivative (red) and second derivative (green).

These characteristics ensure that the likelihood of deviations from the mean declines as we move away from the centre of the distribution and that this decline is the same on both sides. In practical terms, this tells us that we can reasonably neglect deviations from the mean, especially extreme ones. However, a Gaussian distribution does not actually deny the existence of extreme cases; it only assumes that they are extremely unlikely. The peculiarity of the normal distribution (and the reason for its success) is that it can “rule out” the importance of extreme cases thanks to the central limit theorem.

This theorem says that, under certain (fairly) common conditions, the sum of independent random variables tends to be normally distributed, even if the original population is not normal. We can easily understand this idea by throwing a die several times and summing up the results. If we throw one die once, results ranging from 1 to 6 are equally likely to occur (or uniformly distributed), with each result having a probability of 1/6. However, as we increase the number of throws (or of dice thrown at the same time) and sum up the resulting numbers, we will notice that central values are more likely than extreme values. If we throw two dice, the probability of getting numbers summing up to 7 is three times the combined probability of getting numbers summing up to either 2 or 12 (specifically, 1+6, 2+5, 3+4, 4+3, 5+2 and 6+1, versus 1+1 and 6+6). With just one additional die, an originally uniform distribution is already much less uniform. As we increase the number of dice, we eventually approximate a normal distribution, as we can see in Figure 1.8.[13]
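The convergence described above is easy to reproduce by simulation. The following sketch (the number of trials and the seed are arbitrary choices of mine) sums n fair dice repeatedly and shows how central sums come to dominate as n grows:

    import random
    from collections import Counter

    random.seed(42)  # for reproducibility

    def dice_sum_distribution(n_dice, trials=100_000):
        """Empirical distribution of the sum of n fair six-sided dice."""
        counts = Counter(
            sum(random.randint(1, 6) for _ in range(n_dice))
            for _ in range(trials)
        )
        return {s: c / trials for s, c in sorted(counts.items())}

    # With one die the distribution is roughly uniform (~1/6 each);
    # with more dice, central sums dominate and a bell shape emerges.
    for n in (1, 2, 5):
        dist = dice_sum_distribution(n)
        peak = max(dist, key=dist.get)
        print(f"{n} dice: most likely sum = {peak} (p = {dist[peak]:.3f})")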

In the case of complex phenomena, extreme events occur more frequently than a Gaussian distribution would predict and, most importantly, they carry more weight than one would expect. Besides, if we concentrate on the average of the observed values, we might be missing an important part of the story.[14] In complex phenomena, seemingly unlikely events are not that unlikely and can have dramatic repercussions. In general, one might be tempted to use historical observations to make predictions, assuming the existence of predetermined patterns which will eventually repeat themselves over time, in a cyclical fashion. This is not the case for complex systems, where outliers often have significant consequences. In relation to complex systems, scholars have sometimes spoken of “black swans”[15] to refer to those occurrences that are believed not to be possible until they actually occur. Besides, it can be argued that these events are the only ones that can seriously affect a system and have a long-term impact (such as sudden shocks in the financial markets).

I shall devote a few words to clarifying the difference between non-linearity and non-Gaussianity, as they can easily be mistaken for one another. Both ideas focus on the fact that apparently minor issues can have important consequences. However, non-linearity is about the magnitude of the impact of small events, regardless of their likelihood or frequency. Non-Gaussianity, conversely, reminds us that, in complex systems, one cannot rule out extreme events, if only because they are often the ones with the most significant impacts.

As I discuss later, many complex social and physical phenomena do not follow the assumptions of Gaussianity. Probability in complex systems often follows power-law distributions rather than Gaussian distributions.

[13] Cmglee, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Dice_sum_central_limit_theorem.svg.

[14] This idea can also be explained by a simple but quite telling joke. I shall leave any further considerations to the reader. The joke goes like this: “A physicist, a mathematician and an econometrician go out hunting, and come across a large deer. The physicist fires, but misses, by 50 centimetres to the left. The mathematician fires, but also misses, by 50 centimetres to the right. The econometrician doesn’t fire and shouts, triumphant: ‘We got it!’”

[15] The expression “black swan” is attributable to the Latin poet Juvenal, who wrote: “Rara avis in terris nigroque simillima cygno” (“A rare bird on Earth, very much like a black swan”, my translation). Black swans, though common in south-western and eastern Australia, were presumed not to exist by the Latin scholars who coined the phrase. For more on the idea of black swans, see Taleb (2007).

FIGURE (1.8) Comparison of probability density functions p(k) for the sum of n fair 6-sided dice, showing their convergence to a normal distribution with increasing n, in accordance with the central limit theorem.

It has been shown that power laws occur in several instances in both natural and social phenomena. Newman (2005) notes that power laws are impressively ubiquitous and mentions several phenomena that display them, such as city populations, the magnitude of earthquakes, the intensity of solar flares, the number of battle deaths in wars, the frequency of use of words in human languages, the number of times papers are cited, the number of hits on web pages, the number of species in biological taxa and people’s annual incomes. A power law is described by the following function:

f(x) ∝ a x^(−k)   (1.15)

where the symbol ∝ denotes direct proportionality. This formula simply states that the value of f(x) varies as x raised to the exponent −k, scaled by a factor a.[16] In other words, one value varies as a power of the other. For example, the volume of a cube varies as the third power of (or scales cubically with) the length of its side (e.g. if we double the length of the side, the volume will increase eight times). The property of power laws that makes them so interesting is so-called scale-invariance. Generally speaking, something that is scale-invariant does not change when the scales of its variables are multiplied by a common factor. In other words, it keeps its properties (or, more specifically, it is self-similar) at all levels of zooming.
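This self-similarity follows directly from the functional form: rescaling the input by a factor c only multiplies f by the constant c^(−k), leaving the shape unchanged. A minimal sketch, with arbitrary illustrative values for a, k and c:

    import math

    a, k = 2.0, 1.5  # illustrative parameters, not from the text
    f = lambda x: a * x ** (-k)

    c = 10.0  # rescale the input by a factor of 10
    for x in (1.0, 3.0, 7.0):
        # f(c*x) = c^(-k) * f(x): the power law is scale-invariant
        assert math.isclose(f(c * x), c ** (-k) * f(x))

    # The cube example: doubling the side multiplies the volume by 8
    side = 2.0
    assert math.isclose((2 * side) ** 3, 8 * side ** 3)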

Graphically, it is easy to notice the scale-invariant property of power laws by simply noting that, given a power law, we can take the logarithm of both sides and obtain the equation of a line on a log-log plot:

y = a x^(−k)   (1.16)

log(y) = log(a x^(−k))   (1.17)

log(y) = log(a) − k log(x)   (1.18)

Therefore, a power law with exponent −k can also be represented as a line with slope −k on a log-log plot. This means that, at any level of zooming on the function, the relationship between successive elements of the power-law distribution is unchanged. Let us look at a practical example. Figure 1.9 reports a plot of language families in decreasing order of the number of languages belonging to each family, and the same rank-frequency plot on a log-log scale to show linearity[17] (Wichmann, 2005) (not to be confused with our previous discussion of the non-linearity of complex systems).

[16] The minus sign of the exponent is not compulsory, but convenient given the negative slope of the functions that we are going to study.

FIGURE (1.9) Language family sizes display a power-law type of relation. (A) Language family sizes; (B) language family sizes (log-log plot).

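Because of the linear form of Equations 1.16–1.18, the exponent of a power law can be estimated by fitting a straight line to the logged data, which is essentially what a plot like Figure 1.9 suggests visually. The sketch below uses synthetic data, not the actual language-family counts:

    import numpy as np

    # Synthetic rank-size data following y = a * x^(-k), with mild noise
    rng = np.random.default_rng(0)
    a_true, k_true = 100.0, 1.2  # illustrative "true" parameters
    ranks = np.arange(1, 101)
    sizes = a_true * ranks ** (-k_true) * rng.lognormal(0.0, 0.05, ranks.size)

    # On a log-log scale the data are linear; the slope estimates -k
    slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)
    print(f"estimated exponent k = {-slope:.2f}")  # close to 1.2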

Interestingly, one of the first to observe the properties of power laws in rank-frequency relations was the American linguist and philologist George Zipf.[18] He was one of the pioneers of the field of quantitative linguistics and contributed significantly to other disciplines with his findings. In his renowned book “Selected Studies of the Principle of Relative Frequency in Language”, Zipf (1932) explains that the frequencies of words in the corpus of a text in any natural language, when ranked in decreasing order, display an inverse power relation with their rank.[19]

[17] Although it is not technically accurate to speak of a “rank-frequency” plot in this case, as the quantities measured are not frequencies but numbers of languages, it is customary to use the term for such cases as well. The plots were produced by simply sorting families in decreasing order of the number of languages in them, ranking them starting from the first (i.e. the largest family), and finally plotting the latter as a function of the former. For the log-log plot, the natural logarithm of the two variables was used.

[18] Before Zipf, the Italian economist Vilfredo Pareto had observed power laws in an attempt to describe wealth distributions in nineteenth-century Italy. Indeed, power-law distributions are also known as “Pareto distributions” or “Zipf distributions”, depending on how the variables are displayed on the axes (for more on this, see http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html).

He observed that the nth word recurs roughly 1/n times as often as the most frequent word (Manaris et al., 2006). This idea is captured by the following formula:

f(n) ∝ 1/n^a   (1.19)

This formula states that the probability of the word of rank n appearing in a text (or the frequency with which it recurs in a natural corpus) is inversely proportional to its rank raised to a certain exponent a ≈ 1. In other words, if we order the words of a text in decreasing order of frequency, we will observe that the most frequent word (n = 1) has a frequency proportional to 1, the second most frequent word (n = 2) has a frequency proportional to 1/2^a, the third (n = 3) has a frequency proportional to 1/3^a, and so on.
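The frequencies predicted by Equation 1.19 are straightforward to tabulate. In the sketch below, a = 1 and the truncation at rank 10 are illustrative choices of mine:

    # Relative frequencies predicted by Zipf's law, f(n) proportional to 1/n^a
    a = 1.0
    ranks = range(1, 11)
    weights = [1.0 / n ** a for n in ranks]
    total = sum(weights)

    for n, w in zip(ranks, weights):
        # w is also the frequency relative to the most frequent word (rank 1)
        print(f"rank {n}: {w:.3f} x the top word, share of text = {w / total:.3f}")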

Mandelbrot (1953) later proposed a revised version of Zipf’s law:

f(n) ∝ 1/(n+β)^a   (1.20)

where β is a factor that “shifts” the rank in order to better fit empirically observed frequencies. Piantadosi (2014) notes that the simple fact that words vary in frequency is a non-trivial property of language. Besides, it is unclear why this frequency distribution is relatively well approximated by such a simple mathematical relation, especially if we note that this law in no way accounts for intrinsic aspects of languages, such as the meanings of words and the rules of syntax. Piantadosi (2014) goes as far as to say that it is “unreasonable” that the intricacies characterizing language processes end up generating word frequency distributions that follow such simple statistical laws.
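To see what the shift β in Equation 1.20 does, one can compare the two predictions rank by rank; the values of a and β below are purely illustrative, not empirically fitted:

    a, beta = 1.0, 2.7  # illustrative parameters

    zipf = lambda n: 1.0 / n ** a                      # Equation 1.19
    zipf_mandelbrot = lambda n: 1.0 / (n + beta) ** a  # Equation 1.20

    # The shift mainly flattens the head of the distribution: low ranks
    # are dampened, while the tail is left nearly unchanged.
    for n in (1, 2, 5, 10, 100):
        print(n, round(zipf(n), 4), round(zipf_mandelbrot(n), 4))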

Most of the research work on this topic has focused on increasingly precise derivations of the law; this does not say much about the underlying cognitive processes leading to it. Numerous explanations for this relation have been proposed, but there is still much debate about whether any of them is on the right track.