Frequency patterns of semantic change: Corpus-based evidence of a near-critical dynamics in language change


HAL Id: halshs-01483599

https://halshs.archives-ouvertes.fr/halshs-01483599

Preprint submitted on 6 Mar 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Q Feltgen, Benjamin Fagard, Jean-Pierre Nadal

To cite this version:

Q Feltgen, Benjamin Fagard, Jean-Pierre Nadal. Frequency patterns of semantic change: Corpus-based evidence of a near-critical dynamics in language change. 2017. ⟨halshs-01483599⟩


Q. Feltgen¹, B. Fagard² and J.P. Nadal¹,³

¹ Laboratoire de Physique Statistique, École Normale Supérieure, PSL Research University; Université Paris Diderot, Sorbonne Paris-Cité; Sorbonne Universités, UPMC – Univ. Paris 06; CNRS; Paris, France.

² Laboratoire Langues, Textes, Traitements informatique, Cognition (LaTTiCe, UMR 8094 CNRS - ENS - Université Paris 3), École normale supérieure, Paris, France.

³ École des Hautes Études en Sciences Sociales, PSL Research University, CNRS, Centre d’Analyse et de Mathématique Sociales, Paris, France.

It is generally believed that, when a linguistic item acquires a new meaning, its overall frequency of use in the language rises with time with an S-shaped growth curve. Yet, this claim has only been supported by a limited number of case studies. In this paper, we provide the first corpus-based quantitative confirmation of the genericity of the S-curve in language change. Moreover, we uncover another generic pattern, a latency phase of variable duration preceding the S-growth, during which the frequency of use of the semantically expanding word remains low and more or less constant. We also propose a usage-based model of language change supported by cognitive considerations, which predicts that both phases, the latency and the fast S-growth, take place. The driving mechanism is a stochastic dynamics, a random walk in the space of frequency of use. The underlying deterministic dynamics highlights the role of a control parameter, the strength of the cognitive impetus governing the onset of change, which tunes the system in the vicinity of a saddle-node bifurcation. In the neighborhood of the critical point, the latency phase corresponds to the diffusion time over the critical region, and the S-growth to the fast convergence that follows. The durations of the two phases are computed as specific first passage times of the random walk process, leading to distributions that fit well those extracted from our dataset. We argue that our results are not specific to the studied corpus, but apply to semantic change in general.

Language can be approached through three different, complementary perspectives. Ultimately, it exists in the mind of language users, so that it is a cognitive entity, rooted in a neuro-psychological basis. But language exists only because people interact with each other: It emerges as a convention among a community of speakers, and answers to their communicative needs. Thirdly, language can be seen as something in itself: An autonomous, emergent entity, obeying its own inner logic. If it were not for this third Dasein of language, it would be less obvious to speak of language change as such.

The social and cognitive nature of language informs and constrains this inner consistency. Zipf’s law, for instance, may be seen as resulting from a trade-off between the ease of producing the utterance, and the ease of processing it [1]. It relies thus both on the cognitive grounding of the language, and on its communicative nature. Those two external facets of language, cognitive and sociological, are similarly expected to channel the regularities of linguistic change. Modeling attempts (see [2] for an overview) have explored both how socio-linguistic factors can shape the process of this change [3, 4] and how this change arises through language learning by new generations of users [5, 6]. Some models also consider mutations of language itself, without providing further details on the social or cognitive mechanisms of change [7]. In this paper, we propose to view language change as initiated by language use, which is the repeated call to one’s linguistic resources in order to express oneself or to make sense of linguistic productions of others. This approach is in line with exemplar models [8] and related works, such as the Utterance Selection Model [9] or the model proposed by Victorri [10], which describes an out-of-equilibrium shaping of semantic structure through repeated events of communication.

Leaving aside socio-linguistic factors, we focus on a cognitive approach to linguistic change, more precisely to semantic expansion. Semantic expansion occurs when a new meaning is gained by a word or a construction (we will henceforth refer more vaguely to a linguistic ‘form’, so as to remain as general as possible). For instance, way, in the construction way too, has come to serve as an intensifier (e.g. ‘The only other newspaper in the history of Neopia is the Ugga Ugg Times, which, of course, is way too prehistoric to read.’ [11]). The fact that polysemy is pervasive in any language [12] suggests that semantic expansion is a common process of language change and happens constantly throughout the history of a language. Grammaticalization [13], a process by which forms acquire a (more) grammatical status, like the example of way too above, and other interesting phenomena of language change [14, 15], fall within the scope of semantic expansion.

Semantic change is known to be associated with an increase of frequency of the form whose meaning expands. This increase is indeed expected: As the form comes to carry more meanings, it is used in a broader number of contexts, hence more often. This implies that any instance of semantic change should have its empirical counterpart in the frequency rise of the use of the form. This rise is furthermore believed to follow an S-curve [16, 17], yet such a claim, to our knowledge, has not been quantitatively grounded on more than a few chosen examples. Besides, it is not easily accounted for through theoretical modeling: In a sociolinguistic framework for instance, it requires either a very specific social structure, or the assumption that the new use is favored intrinsically [18]. Such a framework also suffers from what is known as the Threshold Problem, the fact that a novelty will fail to take over an entire community of speakers, because of the isolated status of an exceptional deviation [19].

In this paper, we provide a broad corpus-based investigation of the frequency patterns associated with a few hundred semantic expansions. It turns out that the S-curve pattern is corroborated, but must be completed by a preceding latency part, in which the frequency of the form does not significantly increase, even if the new meaning is already present in the language. To explain this surprising behavior, which seems to have escaped notice so far, we propose a usage-based model of the process of semantic expansion, implementing basic cognitive hypotheses regarding language use. By means of our model, we relate the micro-process of language use at the individual scale, to the observed macro-phenomenon of a recurring frequency pattern occurring in semantic expansion.

I. QUANTIFICATION OF CHANGES IN A LARGE CORPUS

We worked on the French corpus Frantext [20], to our knowledge the only textual database allowing for a reliable study covering several centuries (see Material and Methods and Appendix A). We studied changes in frequency of use for 400 forms which have undergone one or several semantic expansions, on a time range going from 1321 up to nowadays. We chose forms so as to focus on semantic expansions leading to a functional meaning, such as discursive, prepositional, or procedural meanings. Semantic expansions whose outcome remains in the lexical realm (as the one undergone by sentence, whose meaning evolved from ‘verdict, judgment’ to ‘meaningful string of words’) have been left out. Functional meanings indeed present several advantages: They are often accompanied by a change of syntagmatic context, which makes it possible to track the semantic expansion more accurately (e.g. way in way too + adj.); they are also less sensitive to socio-cultural and historical influences; finally they are less dependent on the specific content of a text, be it literary or academic.

The profiles of frequency of use extracted from the database are illustrated in Figure 1 for nine forms. We find that 286 cases display at least one sigmoidal increase of frequency in the course of their evolution, which makes up more than 70% of the total. We provide a small selection of the observed frequency patterns (Fig. 2a), whose associated logit transforms (Fig. 2b) follow a linear behavior, indicative of the sigmoidal nature of the growth (see Material and Methods). We thus find a robust statistical validation of the sigmoidal pattern, confirming the general claim made in the literature.

Furthermore, we find two major phenomena besides this sigmoidal pattern. The first one is that, in most cases, the final plateau towards which the frequency is expected to stabilize after its sigmoidal rise is not to be found: The frequency immediately starts to decrease after having reached a maximum (Fig. 1). However, such a decrease process is not symmetrical with the increase, in contrast with other cases of fashion-driven evolution in language, e.g. first names distribution [21]. Though this decrease may be, in a handful of cases, imputable to the disappearance of a form (e.g. après ce, replaced in Modern French by après quoi), in most cases it is more likely to be the sign of a narrowing of its uses.

The second feature is that the fast growth is very often preceded by a long latency of up to several centuries, during which the new form is used, but with a comparatively low and rather stable frequency (Fig. 2a). One should note that the latency times may be underestimated: If the average frequency is very low during the latency part, the word may not show up at all in the corpus, especially in decades for which the available texts are sparse. The pattern of frequency increase is thus better conceived of as a latency followed by a growth, as exemplified by de toute façon (Fig. 3). This form is best translated by anyway in English: the present meanings of the two terms are very close and, remarkably, despite quite different origins, the two have followed parallel paths of change.

To our knowledge, these two features, latency and absence of a stable plateau, have not been documented before, even though a number of specific cases of latency have been observed. For instance, it has been remarked in the case of just because that the fast increase is only one stage in the evolution [22]. In the following, we propose a model describing both the latency and the S-growth periods. We leave for future work the study of the decrease of frequency following the S-growth.

II. A COGNITIVE SCENARIO

To account for the specific frequency pattern evidenced by our data analysis, we propose a scenario focusing on cognitive aspects of language use, leaving all socio-linguistic effects back-grounded by making use of a representative agent, mean-field type, approach. We limit ourselves to the case of a competition between two linguistic variants, given that most cases of semantic expansion can be understood as such, even if the two competing variants cannot always be explicitly identified. Initially, in some concept or context of use $C_1$, one of the two variants, henceforth noted $Y$, is systematically chosen, so that it conventionally expresses this concept. The question we address is thus how a new variant, say $X$, can be used in this context and eventually evict the old variant $Y$.


FIG. 1. Frequency evolution on the whole time range (1321-2020) of nine different forms. Each blue bar shows the frequency associated with a decade. Frequencies have been multiplied by a factor of $10^5$ for easier reading.

A. Hypotheses

The main hypothesis we propose is that the new variant is almost never a brand new merging of phonemes whose meaning would pop out of nowhere. As Haspelmath highlights [23], a new variant is almost always a periphrastic construction, i.e., actual parts of language, put together in a new, meaningful way. Furthermore, such a construction, though it may be exapted to a new use, may have shown up from time to time in the course of the language history, in an entirely compositional way; this is the case for par ailleurs, which incidentally appears as early as the 14th century in our corpus, but arises as a construction in its own right during the first part of the 19th century only. In other words, the use of a linguistic form $X$ in a context $C_1$ may be entirely new, but the form $X$ was most probably already there in another context of use $C_0$, or equivalently, with another meaning.

We make use of the well-grounded idea [24] that there exist links between concepts due to the intrinsic polysemy of language: There are no isolated meanings, as each concept is interwoven with many others, in a complicated tapestry. These links between concepts are asymmetrical, and they can express both universal mappings between concepts [25, 26] and cultural ones (e.g. entrenched metaphors [27]). As the conceptual texture of language is a complex network of living relations rather than a collection of isolated and self-sufficient monads, semantic change is expected to happen as the natural course of language evolution and to occur repeatedly throughout its history, so that at any point in time, there are always several parts of language which are undergoing changes. The simplest layout accounting for this network structure in a competitive situation consists then in two sites, such that one is influencing the other through a cognitive connection of some sort.

B. Model formalism

We now provide details on the modeling of a competition between two variants $X$ and $Y$ for a given context of use, or concept, $C_1$, also considering the effect exerted by the related context or concept $C_0$ on this evolution.

• Each concept $C_i$, $i = 0, 1$, is represented by a set of exemplars of the different linguistic forms. We note $N_\mu^i(t)$ the number at time $t$ of encoded exemplars (or occurrences) of form $\mu \in \{X, Y\}$, in context $C_i$, in the memory of the representative agent.

FIG. 2. (a) A selection of frequency evolutions showing the latency period and the S-growth, separated by a red vertical line. (b) Logit transforms of the S-growth part of the preceding curves. Red dots correspond to data points and the green line to the linear fit of this set of points.

• The memory capacity of an individual being finite, the population of exemplars attached to each concept $C_i$ has a finite size $M_i$. For simplicity we assume that all memory sizes are equal ($M_0 = M_1 = M$). As we consider only two forms $X$ and $Y$, for each $i$ the relation $N_X^i(t) + N_Y^i(t) = M$ always holds: We can focus on one of the two forms, here $X$, and drop the form subscript, granted that all quantities refer to $X$.

• The absolute frequency $x^i_t$ of form $X$ at time $t$ in context $C_i$ (the fraction of ‘balls’ of type $X$ in the bag attached to $C_i$) is thus given by the ratio $N^i(t)/M$. In the initial situation, $X$ and $Y$ are assumed to be the established conventions for expressing $C_0$ and $C_1$, respectively, so that we start with $N^0(t = 0) = M$ and $N^1(t = 0) = 0$.

FIG. 3. Overall evolution of the frequency of use of de toute façon (main panel), with focus on the S-shape increase (right inner panel), whose logit transformation follows a linear fit (left inner panel). Preceding the S-growth, one observes a long period of very low frequency (up to 34 decades).

• Finally, $C_0$ exerts an influence on context $C_1$, but this influence is assumed to be unilateral. Consequently, the content of $C_0$ will not change in the course of the evolution and we can focus on $C_1$. An absence of explicit indication of context is thus to be understood as referring to $C_1$.

C. Dynamics

The dynamics of the system runs as follows. At each time $t$, one of the two linguistic forms is chosen to express concept $C_1$. The form $X$ is uttered with some probability $P(t)$, to be specified below, and $Y$ with probability $1 - P(t)$. In order to keep constant the memory size of the population of occurrences in $C_1$, a past occurrence is randomly chosen (with a uniform distribution) and the new occurrence takes its place. This dynamics is then repeated a large number of times. Note that this model focuses on a speaker perspective (for alternative variants, see Appendix B).

We want to make explicit how $P(t)$ depends on $x(t)$, the absolute frequency of $X$ in this context at time $t$. The simplest choice would be $P(t) = x(t)$. However, we want to take into account several facts, as explained below.

• As context $C_0$ exerts an influence on context $C_1$, denoting by $\gamma$ the strength of this influence, we assume the probability $P$ to depend instead on an effective frequency $f(t)$ (Fig. 4a),

$$f(t) = \frac{N^1(t) + \gamma N^0(t)}{M + \gamma M} = \frac{x(t) + \gamma}{1 + \gamma}. \qquad (1)$$

• We now specify the probability $P(f)$ to select $X$ at time $t$ as a function of $f = f(t)$. First, $P(f)$ must be nonlinear. Otherwise, the change occurs with certainty as soon as the effective frequency $f$ of the novelty is non-zero, that is, insofar as two meanings are related, the form expressing the former will also be recruited to express the latter. This change would also start in too abrupt a way, while sudden, instantaneous takeovers are not known to happen in language change.

Second, one should preserve the symmetry between the two forms, that is, $P(f) = 1 - P(1 - f)$, as well as verify $P(0) = 0$ and $P(1) = 1$. Note that this symmetry is stated in terms of the effective frequency $f$ instead of the actual frequency $x$, as production in one context always accounts for the contents of neighboring ones.

For the numerical simulations, we made the following specific choice which satisfies these constraints:

$$P(f) = \frac{1}{2}\left\{1 + \tanh\left(\beta\,\frac{f - (1 - f)}{\sqrt{f(1 - f)}}\right)\right\}, \qquad (2)$$

where $\beta$ is a parameter governing the non-linearity of the curve. Expressing $f$ in terms of $x$, the probability to choose $X$ is thus a function $P_\gamma(x)$ of the current absolute frequency $x$:

$$P_\gamma(x) = \frac{1}{2}\left\{1 + \tanh\left(\beta\,\frac{2x - 1 + \gamma}{\sqrt{(x + \gamma)(1 - x)}}\right)\right\}. \qquad (3)$$
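Equations (1) to (3) can be written down in a few lines of code, which also makes it easy to check that Eq. (3) is just Eq. (2) evaluated at the effective frequency of Eq. (1). The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the explicit handling of the endpoints (returning 0 or 1 where the formula would divide by zero) is an assumption based on the stated limits $P(0) = 0$ and $P(1) = 1$.

```python
import math


def effective_frequency(x, gamma):
    """Eq. (1): effective frequency f = (x + gamma) / (1 + gamma)."""
    return (x + gamma) / (1.0 + gamma)


def p_of_f(f, beta):
    """Eq. (2): probability of producing X given the effective frequency f."""
    if f <= 0.0:
        return 0.0  # assumed limit value, P(0) = 0
    if f >= 1.0:
        return 1.0  # assumed limit value, P(1) = 1
    arg = beta * (f - (1.0 - f)) / math.sqrt(f * (1.0 - f))
    return 0.5 * (1.0 + math.tanh(arg))


def p_gamma(x, gamma, beta):
    """Eq. (3): the same probability written directly in terms of x."""
    if x >= 1.0:
        return 1.0  # assumed limit value at the absorbing end
    if x + gamma <= 0.0:
        return 0.0  # only reached for x = 0 and gamma = 0
    arg = beta * (2.0 * x - 1.0 + gamma) / math.sqrt((x + gamma) * (1.0 - x))
    return 0.5 * (1.0 + math.tanh(arg))
```

Calling `p_gamma(x, gamma, beta)` and `p_of_f(effective_frequency(x, gamma), beta)` should agree up to rounding for any `x` in [0, 1], which is a quick way to catch sign or factor errors when transcribing the formulas.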

D. Analysis: Bifurcation and latency time

The dynamics outlined above (Fig. 4b) is equivalent to a random walk on the segment $[0, 1]$ with a reflecting boundary at 0 and an absorbing one at 1, and with steps of size $1/M$. The probability of going forward at site $x$ is equal to $(1 - x)P_\gamma(x)$, and the probability of going backward to $x(1 - P_\gamma(x))$.

For large $M$, a continuous, deterministic approximation of this random walk leads, after a rescaling of the time $Mt \to t$, to the first order differential equation for $x(t)$:

$$\dot{x} = P_\gamma(x) - x. \qquad (4)$$
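The step from the jump probabilities to Eq. (4) can be spelled out as a short worked computation. This is a sketch of the usual mean-drift argument; the symbol $\Delta x$ for the change of $x$ during one elementary step is introduced here for illustration only.

```latex
\begin{aligned}
\mathbb{E}[\Delta x \mid x]
  &= \frac{1}{M}\Big[(1 - x)\,P_\gamma(x) - x\,\big(1 - P_\gamma(x)\big)\Big] \\
  &= \frac{1}{M}\Big[P_\gamma(x) - x\,P_\gamma(x) - x + x\,P_\gamma(x)\Big] \\
  &= \frac{1}{M}\Big[P_\gamma(x) - x\Big].
\end{aligned}
```

The cross terms $x\,P_\gamma(x)$ cancel, and the remaining factor $1/M$ is absorbed by the rescaling of time, which leaves the drift $P_\gamma(x) - x$ of Eq. (4).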

This dynamics admits either one or three fixed points (Fig. 5a), $x = 1$ always being one. Below a threshold value $\gamma_c$, which depends on the non-linearity parameter $\beta$, a saddle-node bifurcation occurs and two other fixed points appear. The system, starting from $x = 0$, is stuck at the smallest stable fixed point. The transmission time, i.e. the time required for the system to go from 0 to 1, is therefore infinite (Fig. 5b). Above the threshold value $\gamma_c$, only the fixed point $x = 1$ remains, so that the new variant eventually takes over the context for which it is competing. Our model thus describes how the strengthening of a cognitive link can trigger a semantic expansion process.
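The threshold $\gamma_c$ can be located numerically as the value of $\gamma$ at which the minimum of the drift $P_\gamma(x) - x$ over the interior of $[0, 1]$ crosses zero. The sketch below is one possible way to do this, not the authors' procedure: it assumes a bisection on $\gamma$ over a fixed grid of $x$ values, and the function names, the grid size and the default bracket $[0, 1]$ are hypothetical choices.

```python
import math


def p_gamma(x, gamma, beta):
    """Eq. (3), with the endpoint values set to their assumed limits."""
    if x >= 1.0:
        return 1.0
    if x + gamma <= 0.0:
        return 0.0
    arg = beta * (2.0 * x - 1.0 + gamma) / math.sqrt((x + gamma) * (1.0 - x))
    return 0.5 * (1.0 + math.tanh(arg))


def min_drift(gamma, beta, n_grid=2000):
    """Smallest value of the drift P_gamma(x) - x on an interior grid of (0, 1)."""
    xs = ((k + 0.5) / n_grid for k in range(n_grid))
    return min(p_gamma(x, gamma, beta) - x for x in xs)


def critical_gamma(beta, lo=0.0, hi=1.0, tol=1e-8):
    """Bisection on gamma for the value at which the minimum drift changes sign."""
    if not (min_drift(lo, beta) < 0.0 < min_drift(hi, beta)):
        raise ValueError("the bracket [lo, hi] does not contain the threshold")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if min_drift(mid, beta) < 0.0:
            lo = mid  # interior fixed points still exist: below threshold
        else:
            hi = mid  # drift positive everywhere on the grid: above threshold
    return 0.5 * (lo + hi)
```

The bisection relies on the fact that $P_\gamma(x)$ increases with $\gamma$ at fixed $x$, so the minimum drift is monotonic in $\gamma$. The result is only as accurate as the grid in $x$ allows.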

Slightly above the transition, a stranglehold region appears where the speed almost vanishes. Accordingly, the time spent in this region diverges as the threshold is approached. The frequency of the new variant will stick to low values for a long time, in a way similar to the latent behavior evidenced by our dataset. This latency time in the process of change can thus be understood as a near-critical slowing down of the underlying dynamics.

FIG. 4. (a) Difference between absolute frequency $x$ and effective frequency $f$ in context $C_1$. Absolute frequency $x$ is given by the ratio of $X$ occurrences encoded in $C_1$. Effective frequency $f$ also takes into account the $M$ occurrences contained in the influential context $C_0$, with a weight $\gamma$ standing for the strength of this influence. (b) Schematic view of the process. At each iteration, either $X$ or $Y$ is chosen to be produced and thus encoded in memory, with respective probability $P_\gamma(x)$ and $1 - P_\gamma(x)$; the produced occurrence is here represented in the purple capsule. Another occurrence, already encoded in the memory, is uniformly chosen to be erased (red circle) so as to keep the population size constant. Hence the number of $X$ occurrences, $N_X$, either increases by 1 if $X$ is produced and $Y$ erased, decreases by 1 if $Y$ is produced and $X$ erased, or remains constant if the erased occurrence is the same as the one produced.

Beyond this deterministic approximation, there is no longer a clear-cut transition (Fig. 5b) and the above explanation needs to be refined. The deterministic speed can be understood as a drift velocity of the Brownian motion on the $[0, 1]$ segment, so that in the region where the speed vanishes, the system does not move on average. In this region of vanishing drift, the frequency fluctuates over a small set of values and does not evolve significantly over time. Once it escapes this region, the drift velocity drives the process again, and the replacement process takes off. Latency time can thus be understood as a first-passage time out of a trapping region.

III. NUMERICAL RESULTS

A. Model simulations

FIG. 5. (a) Speed $\dot{x}$ of the deterministic process at each site, for different values of $\beta$ and $\delta = (\gamma - \gamma_c)/\gamma_c$, the distance to threshold. Depending on the sign of $\delta$, there are either one or three fixed points. (b) Inverse transmission time (time required for the system to go from 0 to 1), for the deterministic process (blue dotted line), and for the averaged stochastic process (green line), as a function of the control parameter $\delta$. Deterministic transmission time diverges at the transition while averaged stochastic transmission time remains finite.

We ran numerical simulations of the process described above (Fig. 4b), with the following choice of parameters: $\beta = 0.808$, $\delta = 0.0$ and $M = 5000$, where $\delta = (\gamma - \gamma_c)/\gamma_c$ is the distance to the threshold. The specific value of $\beta$ corresponds to a maximization of $x_c$, the frequency value at which the system gets stuck. It reflects the assumption that the linguistic system should allow for synonymic variation in the situation where no replacement takes place. We chose $\delta = 0.0$ in order for the system to be purely diffusive in the vicinity of $x_c$. The choice of $M$ is arbitrary.
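The process itself is simple to simulate: at each step one occurrence is produced and one stored occurrence is erased. The following minimal sketch follows the production and erasure rule described in the Dynamics section; it is not the authors' code, the function names and the `max_steps` safeguard are hypothetical, and $\gamma$ is passed in directly rather than derived from $\delta$ and $\gamma_c$.

```python
import math
import random


def p_gamma(x, gamma, beta):
    """Eq. (3), with the endpoint values set to their assumed limits."""
    if x >= 1.0:
        return 1.0
    if x + gamma <= 0.0:
        return 0.0
    arg = beta * (2.0 * x - 1.0 + gamma) / math.sqrt((x + gamma) * (1.0 - x))
    return 0.5 * (1.0 + math.tanh(arg))


def simulate(M, gamma, beta, rng, max_steps=10_000_000):
    """Return the successive counts N(t) of X exemplars, starting from N = 0.

    The run stops when X fills the memory (N = M, the absorbing state)
    or after max_steps iterations, whichever comes first.
    """
    n = 0
    counts = [n]
    for _ in range(max_steps):
        if n == M:
            break
        x = n / M
        produce_x = rng.random() < p_gamma(x, gamma, beta)  # utterance
        erase_x = rng.random() < x  # uniformly chosen stored occurrence
        if produce_x and not erase_x:
            n += 1  # X produced, Y erased
        elif erase_x and not produce_x:
            n -= 1  # Y produced, X erased
        counts.append(n)
    return counts
```

A run well above threshold, for instance `simulate(50, 0.5, 0.808, random.Random(0))`, reaches `N = M` quickly; runs close to threshold with large `M` take far longer, which is the regime of interest in the paper and may call for a faster implementation than this plain loop.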

From the model simulations, data is extracted and analyzed in two parallel ways. On the one hand, simulations provide surrogate data: We can mimic the corpus data analysis and count how many tokens of the new variant are produced in a given timespan (set equal to $M$), to be compared with the total number of tokens produced in this timespan. We then extract ‘empirical’ latency and growth times (Fig. 6a), applying the same procedure as for the corpus data.

On the other hand, for each run we track the position of the walker, which is the frequency $x(t)$ achieved by the new variant at time $t$. This makes it possible to compute first passage times. We then alternatively compute analytical latency and growth times (‘analytical’ to distinguish them from the former ‘empirical’ times) as follows. Latency time is here defined as the difference between the first-passage times at the exit and the entrance of a ‘trap’ region (see Appendix C for additional details). Analytical growth time is defined as the remaining time of the process once this exit has been reached. Their distributions over 10,000 runs of the process are fitted with Inverse Gaussian distributions, which would be the expected distributions if the jump probabilities were homogeneous over the corresponding regions (an approximation then better suited for latency time than for growth time).
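The two ‘analytical’ durations defined above can be read off a single trajectory once the trap boundaries are known. The sketch below is illustrative only: the function names are hypothetical, the boundaries `trap_entrance` and `trap_exit` are plain inputs (their actual values are given in Appendix C and are not reproduced here), and the trajectory is assumed to end when the walker is absorbed at $x = 1$.

```python
def first_passage_time(trajectory, level):
    """Index of the first time step at which the trajectory reaches `level`."""
    for t, x in enumerate(trajectory):
        if x >= level:
            return t
    return None  # the level is never reached


def latency_and_growth(trajectory, trap_entrance, trap_exit):
    """Latency and growth times of one run, as defined in the text.

    Latency: first-passage time at the trap exit minus first-passage time
    at the trap entrance. Growth: time remaining after the exit is reached,
    assuming the trajectory stops at absorption.
    """
    t_in = first_passage_time(trajectory, trap_entrance)
    t_out = first_passage_time(trajectory, trap_exit)
    if t_in is None or t_out is None:
        return None
    latency = t_out - t_in
    growth = (len(trajectory) - 1) - t_out
    return latency, growth
```

For example, with the made-up trajectory `[0.0, 0.1, 0.05, 0.1, 0.2, 0.15, 0.3, 0.6, 1.0]` and boundaries 0.1 and 0.3, the entrance is first reached at step 1 and the exit at step 6, giving a latency of 5 steps and a growth of 2 steps.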

Figure 6b shows the remarkable agreement between the ‘empirical’ and ‘analytical’ approaches, together with the quality of the fits with the Inverse Gaussian distribution. Crucially, those two macroscopic phenomena, latency and growth, are thus to be understood as of the same nature, which explains why their statistical distribution must be of the same kind. Furthermore, the boundaries of the trap region leading to the best correspondence between first passage times and empirically determined latency and growth times are meaningful, as they correspond to the region where the uncertainty on the transmission time significantly decreases (Fig. 6c).

B. Confrontation with corpus data

Our model predicts that both latency and growth times should be governed by the same kind of statistics, the Inverse Gaussian being a suitable approximation of those. The Inverse Gaussian distribution is governed by two parameters, its mean $\mu$ and a parameter $\lambda$ given by the ratio $\mu^3/\sigma^2$, $\sigma^2$ being the variance. We fit the empirical histograms with an Inverse Gaussian distribution whose parameters are given by the empirical mean and variance of the relevant quantities. We find a good agreement for both the latency and the growth times (Fig. 7).
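The moment-based fit described above amounts to computing $\mu$ and $\lambda = \mu^3/\sigma^2$ from the data and plugging them into the Inverse Gaussian density. A minimal sketch follows; the function names are hypothetical, and the use of the population variance (dividing by $n$ rather than $n - 1$) is an assumption, since the text does not specify which estimator was used.

```python
import math


def inverse_gaussian_params(samples):
    """Moment estimates: mean mu and shape lambda = mu**3 / variance."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((t - mu) ** 2 for t in samples) / n  # population variance (assumed)
    lam = mu ** 3 / var
    return mu, lam


def inverse_gaussian_pdf(t, mu, lam):
    """Density of the Inverse Gaussian distribution with mean mu and shape lam."""
    if t <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * t ** 3)) * math.exp(
        -lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t)
    )
```

Given a list of durations in decades, `inverse_gaussian_params` returns the two parameters, and `inverse_gaussian_pdf` can then be evaluated at the histogram bin centers for comparison with the empirical frequencies.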

Although there are short growth times in the frequency patterns of the forms we studied, below six decades they are not described by enough data points to assess reliably the specificity of the sigmoid fit. On the histogram there is therefore no data for these growth times. This issue is further discussed in Appendix D. However, the distribution must decrease when growth time approaches 0 (notably an exponential fit is to be ruled out); otherwise, instantaneous changes would be far too numerous, so that language would be completely unstable. The decrease predicted by the Inverse Gaussian is realistic in this respect.

The main quantitative features extracted from the dataset are thus correctly mirrored by the behavior of our model. We confronted the model with the data on other quantities, such as the correlation between growth time and latency time. There again, the model proves to match appropriately quantitative aspects of semantic expansion processes (see Appendix E).

IV. DISCUSSION

Based on a corpus-based analysis of frequency of use, we have uncovered two robust stylized facts of semantic change: an S-curve of frequency growth, preceded by a latency period where the semantic change has already taken place while the frequency remains low. We have proposed a model predicting that these two features, albeit qualitatively quite different, are two aspects of one and the same phenomenon.

The hypotheses on which this model rests are grounded in claims from Cognitive Linguistics: Language is resilient to change (non-linearity of the $P$ function); language users have cognitive limitations; the semantic territory is organized as a network whose neighboring sites are asymmetrically influencing each other. The overall agreement with empirical data tends to suggest that language change may indeed be cognitively driven by semantic bridges of different kinds between the concepts of the mind, and constrained by the mnemonic limitations of this very same mind. We note that our model may however be given a different, purely socio-linguistic, interpretation: this, together with the limits of such a viewpoint, is discussed in Appendix B 5.

According to our model, the onset of change depends on the strength of the conceptual link between the source context and the target context: If the link is strong enough, that is, above a given threshold, it serves as a channel so that a form can ‘invade’ the target context and then oust the previously established form. In a sense, the sole existence of this cognitive mapping is already a semantic expansion of some sort, yet not necessarily translated into linguistic use. Latency is specifically understood as resulting from a near-critical behavior: If the link is barely strong enough for the change to take off, then the channel becomes extremely tight and the invasion process slows down drastically. These narrow channels are likely to be found between lexical and grammatical meanings [28, 29]. This would explain why the latency-growth pattern is much more prominent in the processes of grammaticalization, positing latency as a phenomenological hint of this latter category.

FIG. 6. (a) Time evolution of the frequency of produced occurrences (output of a single run). Growth part and latency part are shown respectively in blue and red. The logit transform (with linear fit) of the growth is shown in the inset. (b) Distribution of latency time (top) and growth time (bottom) over 10,000 processes, extracted from an empirical approach (blue wide histogram) and a first-passage time one (magenta thin histogram), with their respective Inverse Gaussian fits (in red: Empirical approach; in green: First-passage time approach). (c) Uncertainty on the transmission time given the position of the walker. The entrance and the exit of the trap are shown, respectively, by green and magenta lines. The trap corresponds to the region where the uncertainty drops from a high value to a low value.

Finally, we argue that our results, though grounded on instances of semantic expansion in French, apply to semantic expansion in general. The time period covered is long enough (700 years) to exclude the possibility that our results be ascribable to a specific historical, sociological, or cultural context. The French language itself has evolved, so that Middle French and contemporary French could be considered as two different languages, yet our analysis applies to both indistinctly. Besides, the latency-growth pattern is to be found in other languages; for instance, Google Ngram queries for constructions such as way too, save for, no matter what, yield qualitative frequency profiles consistent with our claims. Our model also tends to confirm the genericity of this pattern, as it relies on cognitive mechanisms whose universality has been well evidenced [30].

FIG. 7. Inverse Gaussian fit of the latency times (left) and the growth times (right) extracted from corpus data. Parameters are computed from the mean and the variance of the data. Data points are shown by a blue histogram, the Inverse Gaussian fit being represented as red dots. The discrepancy observed for six decades is discussed in Appendix D.

V. MATERIALS AND METHODS

We worked on the Frantext corpus [20], which in 2016 contained for the chosen time range 4674 texts and 232 million words. More details are given in Appendix A. It would have been tempting to make use of the large database Google Ngram, yet it was not deemed appropriate for our study, as we explain in Appendix F.

We studied changes in frequency of use for nearly 400 instances of semantic expansion processes in French, on a time range going from 1321 up to nowadays. See Ap- pendix G for a complete list of the studied forms.

A. Extracting patterns from corpus data

a. Measuring frequencies. We divided our corpus into 70 decades. Then, for each form, we recorded the number of occurrences per decade, dividing this number by the total number of occurrences in the database for that decade. The output number is called here the frequency of the occurrence for the decade, and is noted $x_i$ for decade $i$. In order to smooth the obtained data, we replaced $x_i$ by a moving average, that is, for $i \geq i_0 + 4$, $i_0$ being the first decade of our corpus:

$$x_i \leftarrow \frac{1}{5} \sum_{k=i-4}^{i} x_k .$$

b. Sigmoids. We looked for major increases of frequency. When such a major shift is encountered, we automatically (see below) identify frequencies $x_{\min}$ and $x_{\max}$, respectively at the beginning and the end of the increasing period. If we respectively note $i_{\text{start}}$ and $i_{\text{end}}$ the decades for which $x_{\min}$ and $x_{\max}$ are reached, then we define the duration $w$ of the increasing period as $w = i_{\text{end}} - i_{\text{start}} + 1$. To quantify the sigmoidal nature of this growth pattern, we apply the logit transformation to the frequency between $x_{\min}$ and $x_{\max}$:

$$y_i = \log \frac{x_i - x_{\min}}{x_{\max} - x_i} . \tag{5}$$

If the process follows a sigmoid $\tilde{x}_i$ of equation:

$$\tilde{x}_i = x_{\min} + \frac{x_{\max} - x_{\min}}{1 + e^{-h i - b}} , \tag{6}$$

then the logit transform of this sigmoid satisfies: $\tilde{y}_i = h i + b$. We thus fit the $y_i$'s given by (5) with a linear function, which gives the slope $h$ associated with it, the residual $r^2$ quantifying the quality of the fit. The boundaries $i_{\text{start}}$ and $i_{\text{end}}$ have been chosen so as to maximize $w$, with the constraint that the $r^2$ of the linear fit should be at least equal to a value depending on the number of data points.
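The smoothing and logit steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the paper: the function names are hypothetical, the first four decades are left unsmoothed, points at which the logit is undefined are skipped, and $r^2$ is taken to be the squared correlation coefficient of the linear fit.

```python
import math


def moving_average(freqs, window=5):
    """Trailing average x_i <- mean(x_{i-4}, ..., x_i), computed from raw values.

    Assumption: decades before i_0 + 4 are returned unchanged.
    """
    out = list(freqs)
    for i in range(window - 1, len(freqs)):
        out[i] = sum(freqs[i - window + 1:i + 1]) / window
    return out


def logit_fit(freqs, x_min, x_max):
    """Least-squares line through y_i = log((x_i - x_min) / (x_max - x_i)).

    Indices are positions in `freqs`. Points not strictly between x_min and
    x_max are skipped (assumption), since their logit is undefined.
    Returns (h, b, r2): slope, intercept, squared correlation coefficient.
    """
    pts = [(i, math.log((x - x_min) / (x_max - x)))
           for i, x in enumerate(freqs) if x_min < x < x_max]
    n = len(pts)
    mean_i = sum(i for i, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((i - mean_i) ** 2 for i, _ in pts)
    sxy = sum((i - mean_i) * (y - mean_y) for i, y in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    h = sxy / sxx
    b = mean_y - h * mean_i
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return h, b, r2
```

On a noiseless sigmoid built from chosen values of $h$ and $b$, `logit_fit` recovers those values up to rounding, which is the property the linear fit of the logit relies on.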

c. Latency period. In most cases (74% of sigmoidal growths), one observes that the fast increasing part is preceded by a phase during which the frequency remains constant or nearly constant. The duration of this part, denoted by $T_1$ in this paper, is identified automatically as follows. Starting from the decade $i_{\text{start}}$, previous decades $j$ are included in the latency period as long as they satisfy $|x_j - x_{\min}| < 0.15\,(x_{\max} - x_{\min})$ and $x_j > 0$, and cease to be included either as soon as the first condition is not satisfied, or if the second condition does not hold for a period longer than 5 decades. Then the start $i_{\text{lat}}$ of the latency period is defined as the lowest $j$ satisfying both conditions, so that $T_1$ is given by $T_1 = i_{\text{start}} - i_{\text{lat}}$.
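The backward scan that defines the latency period can be written as a short loop. The sketch below is one possible reading of the rule, not the authors' implementation: the function name and arguments are hypothetical, and a run of zero-frequency decades is tolerated as long as it does not exceed five consecutive decades.

```python
def latency_duration(freqs, i_start, x_min, x_max, tol=0.15, max_gap=5):
    """Return T_1 = i_start - i_lat by scanning decades before i_start.

    A decade j stays in the latency period while
    |x_j - x_min| < tol * (x_max - x_min). Decades with x_j == 0 are
    tolerated (assumption) for at most `max_gap` consecutive decades.
    """
    band = tol * (x_max - x_min)
    i_lat = i_start
    gap = 0  # current run of zero-frequency decades
    j = i_start - 1
    while j >= 0:
        if abs(freqs[j] - x_min) >= band:
            break  # first condition violated: latency stops here
        if freqs[j] > 0:
            i_lat = j  # lowest decade so far meeting both conditions
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                break  # form absent for too long
        j -= 1
    return i_start - i_lat
```

With this reading, zero-frequency decades can only be bridged when $x_{\min}$ itself lies within the tolerance band around zero, since otherwise the first condition already fails for them.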


ACKNOWLEDGMENTS

We thank B. Derrida for a useful discussion on random walks. QF acknowledges a fellowship from PSL Research University. BF is a CNRS member. JPN is senior researcher at CNRS and director of studies at the EHESS.

[1] Ramon Ferrer i Cancho and Ricard V Solé. Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences, 100(3):788–791, 2003.

[2] Quentin Feltgen, Benjamin Fagard, and Jean-Pierre Nadal. Modeling Language Change: The Pitfall of Grammaticalization, pages 49–72. Springer, 2017.

[3] Vittorio Loreto, Andrea Baronchelli, Animesh Mukherjee, Andrea Puglisi, and Francesca Tria. Statistical physics of language dynamics. Journal of Statistical Mechanics: Theory and Experiment, 2011(04):P04006, 2011.

[4] Jinyun Ke, Tao Gong, and William SY Wang. Language change and social networks. Communications in Computational Physics, 3(4):935–949, 2008.

[5] Martin A Nowak, Natalia L Komarova, and Partha Niyogi. Computational and evolutionary aspects of language. Nature, 417(6889):611–617, 2002.

[6] Thomas L Griffiths and Michael L Kalish. Language evolution by iterated learning with Bayesian agents. Cognitive Science, 31(3):441–480, 2007.

[7] Igor Yanovich. Genetic Drift Explains Sapir’s “drift” In Semantic Change. In S.G. Roberts, C. Cuskley, L. McCrohon, O. Barceló-Coblijn, and T. Verhoef, editors, The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11), pages 321–329, 2016.

[8] Janet B Pierrehumbert. Exemplar dynamics: Word frequency, lenition and contrast, volume 45, page 137. John Benjamins Publishing, 2001.

[9] Gareth J Baxter, Richard A Blythe, William Croft, and Alan J McKane. Utterance selection model of language change. Physical Review E, 73(4):046118, 2006.

[10] Bernard Victorri. The use of continuity in modeling semantic phenomena. In Fuchs, C. and Victorri, B., editors, Continuity in linguistic semantics, pages 241–251. Benjamins, 1994.

[11] Neopets Inc. The Neopian Times. http://www.neopets.com/ntimes/index.phtml?section=497377&issue=457, 2010.

[12] Sabine Ploux, Armelle Boussidan, and Hyungsuk Ji. The semantic atlas: an interactive model of lexical representation. In Proceedings of the seventh conference of International Language Resources and Evaluation, pages 1–5, 2010.

[13] Paul J Hopper and Elizabeth Closs Traugott. Grammaticalization. Cambridge University Press, 2003.

[14] Britt Erman and U-B Kotsinas. Pragmaticalization: the case of ba’ and you know. Studier i modern språkvetenskap, 10:76–93, 1993.

[15] Laurel J Brinton and Elizabeth Closs Traugott. Lexicalization and language change. Cambridge University Press, 2005.

[16] Anthony Kroch. Reflexes of grammar in patterns of language change. Language variation and change, 1(3):199–244, 1989.

[17] Jean Aitchison. Language change: progress or decay? Cambridge University Press, 2013.

[18] Richard A Blythe and William Croft. S-curves and the mechanisms of propagation in language change. Language, 88(2):269–304, 2012.

[19] Daniel Nettle. Using social impact theory to simulate language change. Lingua, 108(2):95–117, 1999.

[20] ATILF. FRANTEXT textual database, http://www.frantext.fr, October 2014.

[21] Baptiste Coulmont, Virginie Supervie, and Romulus Breban. The diffusion dynamics of choice: From durable goods markets to fashion first names. Complexity, 2015.

[22] Martin Hilpert and Stefan Th Gries. Assessing frequency changes in multistage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing, 24(4):385–401, 2009.

[23] Martin Haspelmath. Why is grammaticalization irreversible? Linguistics, 37(6):1043–1068, 1999.

[24] Richard Hudson. Language networks: the new Word Grammar. Oxford University Press, 2007.

[25] Bernd Heine. Cognitive foundations of grammar. Oxford University Press, 1997.

[26] Johannes Dellert. Using Causal Inference To Detect Directional Tendencies In Semantic Evolution. In S.G. Roberts, C. Cuskley, L. McCrohon, O. Barceló-Coblijn, and T. Verhoef, editors, The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11), pages 88–96, 2016.

[27] George Lakoff and Mark Johnson. Metaphors we live by. University of Chicago Press, 1980.

[28] Bernd Heine. On the role of context in grammaticalization, volume 49, pages 83–102. 2002.

[29] Gabriele Diewald. Context types in grammaticalization as constructions. Constructions, 1(9), 2006.

[30] Bernd Heine and Tania Kuteva. World lexicon of grammaticalization. Cambridge University Press, 2002.

[31] Christiane Marchello-Nizia. L’oral représenté: un accès construit à une face cachée des langues ‘mortes’, pages 247–264. Peter Lang, 2012.

[32] Haim Dubossarsky, Daphna Weinshall, and Eitan Grossman. Verbs change more than nouns: a bottom-up computational approach to semantic change. Lingue e linguaggio, 15(1):7–28, 2016.

[33] Sara Graça Da Silva and Jamshid J Tehrani. Comparative phylogenetic analyses uncover the ancient roots of Indo-European folktales. Royal Society open science, 3(1):150645, 2016.

[34] Sidney Redner. A guide to first-passage processes. Cambridge University Press, 2001.

[35] Bruno Gaume, Karine Duvignau, and Martine Vanhove. Semantic associations and confluences in paradigmatic networks. From Polysemy to Semantic Change: Towards a typology of lexical semantic associations, John Benjamins, pages 233–264, 2008.

[36] Quentin Michard and J-P Bouchaud. Theory of collective opinion shifts: from smooth trends to abrupt swings. The European Physical Journal B-Condensed Matter and Complex Systems, 47(1):151–159, 2005.

[37] Eitan Adam Pechenick, Christopher M Danforth, and Peter Sheridan Dodds. Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution. PloS one, 10(10):e0137041, 2015.


Appendix A: Textual data base Frantext

The data we collected for the present study come from the Frantext database [20], one of the most extensive databases available in French, accessible by subscription through the ATILF-CNRS laboratory. Frantext is an ever-expanding gathering of 4,746 texts to this day (8 December 2016), updated every year. This corpus presents various literary genres (epistolary, drama, poetry, essays, scientific books), but mainly novels, almost exclusively from French literature (with a few translated works). The publication years of the texts range from 950 to 2013. The allotment of the texts between the different time periods is however far from being homogeneous, and most of them belong to the twentieth century: Indeed, the number of texts by decade roughly follows an exponential increase (Fig. 8).

Frantext, while being much smaller than Google Ngram, provides much cleaner and more controlled results (see Appendix E). We decided to start from the decade 1321-1330, as from this date all decades are associated with at least seven texts. In our corpus, we retained most of the texts, with a few exceptions, e.g. when the date provided by Frantext was unsatisfactory (for instance, the text referred to as 6205, Le Canarien, pièces justificatives is dated ‘between 1327 and 1470’), or when we knew that the text had been written over too long a time period, as is the case for the text Chartes et documents de l’abbaye de Saint-Magloire (ref 8203), whose publication year (1330) is far from covering the time span during which the document was compiled. Most interestingly, Frantext also provides the surrounding text in which a token is to be found, so that it is possible to check whether the different occurrences make sense and truly correspond to the request.

Frantext is not flawless. Some parts of the scanned texts have been appended through posterior editing. This is clearly the case for the text A017, Chroniques de Morée, where some page notes from a contemporaneous edition of this medieval chronicle have been included, so that the request for ‘dans’ may return an occurrence such as ‘Erreur dans la numérotation de l’édition’ (‘error in the edition numbering’). Some decades are also strongly unbalanced in the available texts. For instance, among the 2.7 million words of decade 1551-1560, more than one third come from the works of a single author, Jean Calvin (references E198, B022, R849 to R852). Another bias comes from the fact that drama pieces, up to the end of the Modern Era, were making use of represented orality [31] much more than literary texts, so that many new constructions appear in them before spreading among the other texts. This would not be a problem if the proportion of dramas were more or less constant across the decades, which is not the case. This problem vanishes in more recent times, when represented orality also appears frequently in novels, while drama itself becomes more sophisticated and shifts further away from daily language.

Frantext is not only a database. It also comes with built-in text-mining algorithms which allow one to submit very refined queries to the database. Such queries can make use of booleans and a given number of blank words. For instance, the query (à|a) &q(1,2) (insçu|insu|insceu) (&q(1,2) is a blank slot for any one or two words) will retrieve occurrences such as à l’insu, à leur insu, but also à son propre insu. This kind of flexible request is especially relevant when one is looking for specific constructions with a filling slot, as the corresponding possibilities cannot be exhaustively predicted. We studied for instance the construction d’une voix + ADJ. If we cannot list all adjectives, we can rule out all the parasite occurrences with an elaborate request such as ^(tous|receus) d’une voix ^(que|qui|qu’|et|ensemble|trestous|de|d’|vous|le|la|les|par|pour|dont|-|.|;|,|:), where ^ and | respectively stand for the booleans ‘not’ and ‘or’. Such a request makes it possible to capture unexpected adjectival constructs such as toute changée, si peu effroyée or extraordinairement rauque et rouillée, while discarding all spurious occurrences. Frantext also allows for special requests, for instance if one wishes to encompass several orthographic variations in a single query: souventes?f* captures all possible variants of souventesfois, such as souventeffoiz, souvente fois, souventez fois, souventefoys, etc. This kind of elaboration proves to be all the more useful in the first stages of the evolution, where a functional construction has not yet become entrenched into an idiomatic form and can still be found in a high diversity of variants.
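For readers without access to Frantext, the idea of a query with alternatives and a slot of one or two blank words can be imitated with an ordinary regular expression. The sketch below uses Python's `re` module and is only a rough analogue: it is not Frantext's query language or tokenizer, and the names `WORD`, `INSU` and `find_insu` are hypothetical. In particular, it assumes that an elided article such as l’ counts as one word.

```python
import re

# One "blank" word: a run of characters without spaces or apostrophes,
# closed either by whitespace or by an elision apostrophe, so that
# the l' of "l'insu" is counted as a word of its own (assumption).
WORD = r"[^\s'’]+(?:['’]|\s+)"

# Analogue of: (à|a) &q(1,2) (insçu|insu|insceu)
INSU = re.compile(
    r"\b(?:à|a)\s+(?:" + WORD + r"){1,2}(?:insçu|insu|insceu)\b",
    re.IGNORECASE,
)


def find_insu(text):
    """Return every substring of `text` matched by the pattern."""
    return [m.group(0) for m in INSU.finditer(text)]
```

As with the Frantext query, the pattern accepts any one or two intervening words, so unexpected fillers are retrieved without having to be listed in advance.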

Once a request is submitted to the database, Frantext returns a data file whose contents may vary according to the needs of the user. Depending on the options one chooses, the file displays, for each text, the text reference, the publication year, and the total number of occurrences of the query in that text. Alongside this automated procedure, we can also look through all individual occurrences in their context, as a sanity check. This was used frequently to help refine our queries. Unfortunately, it was impossible to ask Frantext for a file providing the statistics of the corpus itself, listing the number of occurrences per text reference. We extracted this information from an HTML page which does display it (Corpus de travail > Visualiser). The data file provided by Frantext was then directly processed by our own algorithm to compute average frequencies for each decade.

A note on French

We acknowledge that we restricted ourselves to instances of semantic expansions in French, a choice which may appear to restrict the scope of our findings. As we argue in the main text, we believe this is not the case.

In the following, we stress, 1 - the necessity to conduct the analysis on a long timescale (i.e. long enough so that we can consider the language to have changed during that period, just as contemporary French has drifted sufficiently away from Middle French (XIVth century) so that, without specific training, the latter is only partially intelligible to speakers of the former), 2 - that few corpora are as efficient as Frantext to achieve such a goal.

FIG. 8. Number of occurrences per decade, in millions, in the Frantext database. Exponential fit is shown by a red line.

Given the issues addressed in this paper, it appears important to consider instances taken from a large time period (seven centuries in our case). Indeed, a frequently asked question is whether or not recent technological advances (radio, TV, the Internet) have had an influence on the way language changes. Sociologically, this influence is obvious: Languages tend to homogenize over greater geographical areas and dialects have constantly declined throughout the twentieth century. Yet, the pattern of change of an established language is something entirely different. Our statistical survey shows that the pattern of change is the same, no matter in which century it may happen. It is furthermore consistent with recent findings establishing that the rate of change did not increase in the most recent decades [32]. It also agrees with our claim that the pattern we exhibit is cognitively driven by memory retrieval and conceptual organization, two cognitive mechanisms that the most recent technological evolutions could not have significantly altered.

Alas, finding appropriate corpora covering a long time period in a given language is not obvious. As discussed in this SI, section S5, Google Ngram cannot be used for texts earlier than the nineteenth century, since the scanning procedure does not lead to reliable digital data. For the English language, the reputed British National Corpus restricts itself to the twentieth century. The Helsinki Corpus spans a time period suited for our purposes, but the texts are too sparse (450 in total) for the corpus to be fit for a statistical survey. The CORDE corpus, in Spanish, spans several centuries (XIIIth to XXth), and gathers an impressive amount of data as well (250 M words), but it covers different variants of Spanish (Argentinian, Colombian, Castilian, etc.) which cannot be blended together when it comes to investigating semantic expansions (note that CORDE dutifully offers to treat them apart, but then the database is not extensive enough for each of the variants separately). The querying system also suffers from serious limitations, and it is not possible to submit complex queries as is the case with Frantext. This latter database is therefore truly remarkable in many aspects and has to be considered an exception. We thus leave to further studies the case of other languages.

A last remark is in order: We deliberately do not provide any translation of the studied forms (SI, Table S1), however obscure they may appear to the reader. Indeed, these forms have all undergone a semantic expansion, so that a translation would be most misleading, as it would concern only one among several meanings adopted by the form. The only satisfying way of glossing the items we studied would have been to find forms which not only have the same meaning, but have also undergone (at least roughly) the same meaning shifts, as in the case of anyway and de toute façon for the later stages of their respective semantic evolutions. Obviously, this would have been possible only for a handful of cases, and we chose to leave the items without translation.

Appendix B: Model variants

The model we propose in the main body of the paper describes a mechanism associated with language production: It is solely based on a speaker perspective. Yet, language change may not come only from innovation in producing language, but also in understanding it. Actually, these two aspects cannot be separated: If an innovation is possible in a speaker perspective, it must also be accessible from a hearer perspective. Be it a speaker or a hearer, a language user relies on the same cognitive entity. It seems thus necessary to consider model variants where the novelty can come from this complementary perspective, as well as from a combination of the two.

1. Hearer variant

Let us consider the same situation as for the speaker model: There are two meanings, $C_0$ and $C_1$, to each of which is attached a pool of memories of linguistic tokens. Initially, $C_0$ is populated by X tokens only, while $C_1$ is populated by Y tokens only. Just as context $C_1$ is fed by the memory of $C_0$ when it came to express $C_1$, if a linguistic occurrence yields meaning $C_0$, it can elicit meaning $C_1$ as well. Occurrences of X thus have a chance to populate context $C_1$, so that we will note $x$ the proportion of X tokens in $C_1$, just as we did in the speaker-based model. If we ascribe to the inference $C_0 \Rightarrow C_1$ a probability equal to $\gamma$, then we can describe the dynamics as follows:

1. Either $C_0$ or $C_1$ is chosen to be expressed, with equal probabilities.

2. If $C_0$ has been chosen, X is produced. If $C_1$ has been chosen, X is produced with probability $P_0(x)$, otherwise Y is produced. $P_0(x)$ is the same function as $P_\gamma(x)$, except that $\gamma$ is now set to 0 (there is no such thing as an effective frequency in this framework).

3. The produced occurrence is recorded in the chosen context. If $C_0$ has been chosen, an additional occurrence of the same kind as the previous one is recorded in $C_1$ with probability $\gamma$ ($C_0$ has elicited the meaning $C_1$).

4. A past occurrence is deleted whenever needed, so as to keep both memory sizes constant.
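The four steps above translate directly into a one-step update of the number of X tokens stored in $C_1$. The following sketch is illustrative only: the function names are hypothetical, the deleted token is assumed to be drawn uniformly from the existing memory, and the form chosen for $P_0$ (a tanh-shaped function with a parameter `beta`) is an assumption standing in for the production function defined in the main text.

```python
import math
import random


def p0(x, beta=0.8):
    """Assumed stand-in for P_0(x); any function from [0, 1] to [0, 1] works."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + math.tanh(beta * (2.0 * x - 1.0) / math.sqrt(x * (1.0 - x))))


def hearer_step(n, M, gamma, rng, P0=p0):
    """One update of n, the number of X tokens among the M memories of C_1."""
    x = n / M
    if rng.random() < 0.5:
        # Steps 1-2: C_0 is expressed, so X is produced.
        if rng.random() >= gamma:
            return n  # step 3: no inference C_0 => C_1, memory of C_1 untouched
        produced_x = True
    else:
        # Steps 1-2: C_1 is expressed, X is produced with probability P_0(x).
        produced_x = rng.random() < P0(x)
    # Step 4: one old token of C_1 is deleted; it is an X with probability x.
    deleted_x = rng.random() < x
    return n + int(produced_x) - int(deleted_x)
```

Iterating `hearer_step` with a seeded `random.Random` instance yields one trajectory of the random walk; with `gamma = 0` and no X token in $C_1$ initially, the walk never leaves `n = 0`, since nothing can introduce X into the target context.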

These dynamics correspond once more to a random walk where the jump probabilities, forward and backward, respectively $R^H(x)$ and $L^H(x)$ (where $H$ stands for ‘hearer’), are given by:

$$\begin{cases}
R^H(x) = \dfrac{1}{2}\left[\gamma + P_0(x)\right](1 - x) \\[2mm]
L^H(x) = \dfrac{1}{2}\left(1 - P_0(x)\right)x
\end{cases} , \tag{B1}$$

to be compared with the jump probabilities in the speaker perspective (respectively $R^S(x)$ and $L^S(x)$ for the forward and backward jump probabilities):

$$\begin{cases}
R^S(x) = P_\gamma(x)(1 - x) \\[1mm]
L^S(x) = \left(1 - P_\gamma(x)\right)x
\end{cases} . \tag{B2}$$

These modified jump probabilities lead to a new expression for the drift velocity:

$$\dot{x} = \frac{1}{2}\left[P_0(x) - x + \gamma(1 - x)\right] . \tag{B3}$$

A change of variable $y = (1 + \gamma)x - \gamma$ leads to the same equation as equation 5 of the main paper, with a slightly different timescale accounting for the fact that two contexts are now being called:

$$\frac{2}{1 + \gamma}\,\dot{y} = P_0\!\left(\frac{y + \gamma}{1 + \gamma}\right) - y . \tag{B4}$$

Indeed, $P_0\!\left(\frac{y + \gamma}{1 + \gamma}\right)$ is exactly $P_\gamma(y)$, so that the fixed point in the hearer perspective $x_c^H$ will be given, as a function of the fixed point $x_c^S$ of the speaker perspective, as:

$$x_c^H = \frac{x_c^S + \gamma}{1 + \gamma} , \tag{B5}$$

which is higher than $x_c^S$. This means that, in the hearer perspective, the latency frequency will also be higher.

However, it does not entail that the change will be more or less likely to happen, since what triggers the change is the fact that $\gamma$ is equal to $\gamma_c$ or higher, and this parameter $\gamma_c$ remains the same throughout the perspective shift.

2. Combined model

We can now combine the Speaker and Hearer perspectives, by taking into account the effective frequency $f$ instead of the actual frequency $x$ in step 2 of the dynamics outlined in the previous subsection. Then, in the above formulae, all $P_0(x)$ become $P_\gamma(x)$ (or equivalently, $P_0(f)$). The velocity is now set to:

$$\dot{x} = \frac{1}{2}\left[P_\gamma(x) - x + \gamma(1 - x)\right] . \tag{B6}$$

Setting $X = (x + \gamma)/(1 + \gamma)$, we get:

$$2(1 + \gamma)\,\dot{X} = P_0(X) - X + (1 - X)\,\gamma(2 + \gamma) . \tag{B7}$$

We can now define a renormalized parameter $\tilde{\gamma} = \gamma(2 + \gamma)$ to make this velocity similar to the one given by (B3).

Setting $Y = (1 + \tilde{\gamma})X - \tilde{\gamma}$, we finally get:

$$2\,\frac{1 + \gamma}{1 + \tilde{\gamma}}\,\dot{Y} = P_{\tilde{\gamma}}(Y) - Y . \tag{B8}$$

This implies that $(Y_c, \tilde{\gamma}_c) = (x_c^S, \gamma_c^S)$, so that the critical point $(x_c^T, \gamma_c^T)$ in this combined perspective is equal to:

$$(x_c^T, \gamma_c^T) = \left( \frac{x_c^S + \gamma_c^T}{1 + \gamma_c^T} ,\; \sqrt{1 + \gamma_c^S} - 1 \right) . \tag{B9}$$

In this case $\gamma_c^T$ is lower than its hearer and speaker perspective counterparts. It entails that the change would happen more easily. $x_c^T$ is somewhere in between $x_c^S$ and $x_c^H$.
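The relation between the critical points of the combined and speaker perspectives can be worked out numerically by undoing the two substitutions $X = (x + \gamma)/(1 + \gamma)$ and $Y = (1 + \tilde{\gamma})X - \tilde{\gamma}$ in turn. The sketch below does exactly that; the function names are hypothetical and the numerical values used to exercise it are arbitrary illustrations, not values fitted in the paper.

```python
import math


def gamma_tilde(gamma):
    """Renormalized parameter: gamma_tilde = gamma * (2 + gamma)."""
    return gamma * (2.0 + gamma)


def combined_critical_gamma(gamma_s):
    """Solve gamma_tilde(gamma_T) = gamma_S for gamma_T >= 0."""
    return math.sqrt(1.0 + gamma_s) - 1.0


def combined_critical_x(x_s, gamma_s):
    """Map the speaker critical frequency back to the combined model."""
    g = combined_critical_gamma(gamma_s)
    gt = gamma_tilde(g)               # equals gamma_s up to rounding
    big_x = (x_s + gt) / (1.0 + gt)   # invert Y = (1 + gt) X - gt at Y = x_s
    return (1.0 + g) * big_x - g      # invert X = (x + g) / (1 + g)
```

For instance, with the arbitrary values `x_s = 0.2` and `gamma_s = 0.3`, the combined critical strength is below 0.3 and the combined critical frequency falls between 0.2 and the hearer value $(0.2 + 0.3)/1.3$.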

3. Summary

All three variants of the model give rise to the same picture of sigmoidal growth preceded by a period of latency. The data do not allow us to discriminate between these three possibilities. Yet, the hypothesis that the change is driven by both hearer and speaker mechanisms is the most probable, as all language users adopt the role of hearer and speaker alternately. An appealing research perspective would be to devise a quantitative criterion so as to see which of the three mechanisms best accounts for real language data. One could also investigate which features of language change the speaker and hearer perspectives are respectively able to account for independently, and whether some features need the conjunction of both to appear. Obviously, all those questions hinge upon available data and the finding of relevant observable quantities to look at.

4. Interpretations of the cognitive strength γ

In the proposed model, we make the assumption that all memory sizes are equal in the speaker perspective, and that all meanings $C_i$ are expressed with equal probability in the hearer perspective. Here we consider the alternative that the links in the network are not weighted: They are either 1 or 0. The asymmetric structure between the two contexts $C_0$ and $C_1$ is however maintained.

a. Heterogeneous memory sizes

Now let us assume different memory sizes for the two concepts, denoting by $m$ and $M$ the memory sizes of $C_0$ and $C_1$, respectively. Then the effective frequency of X in $C_1$ is given by:

$$f = \frac{N + m}{M + m} = \frac{x + m/M}{1 + m/M} . \tag{B10}$$

By defining $\gamma$ as the ratio of memories $m/M$, we recover the same effective frequency as before.
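As a quick arithmetic check of (B10), the effective frequency can be computed either from raw token counts or from the ratio of memory sizes. In the sketch below the function names are hypothetical, and `n_x` is read as the number of X tokens currently stored in $C_1$, so that $x$ equals `n_x / M`; the numbers in the usage lines are arbitrary.

```python
def effective_frequency(n_x, M, m):
    """f = (N + m) / (M + m): X tokens of C_1 plus the whole memory of C_0."""
    return (n_x + m) / (M + m)


def effective_frequency_from_ratio(x, gamma):
    """f = (x + gamma) / (1 + gamma), with gamma read as m / M."""
    return (x + gamma) / (1.0 + gamma)


# Arbitrary illustration: 30 X tokens in a memory of 200, fed by a memory of 50.
f_counts = effective_frequency(30, 200, 50)
f_ratio = effective_frequency_from_ratio(30 / 200, 50 / 200)
```

Both calls give the same value, which is the sense in which $\gamma$ can be read as a ratio of memory sizes.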

This means that the strength γ of the cognitive link can be interpreted as a ratio between memory sizes. If all sites were connected to each other, the occurrences expressing the contexts whose associated memory is the greatest would spread all over the network. However, not all sites lead to all others: There are pathways in the conceptual organization, which constrain possible semantic changes and allow for low-memory contexts to invade higher-memory ones.

The main difference brought forth by this interpretation is that it allows for γ’s greater than one. In general, there would be no critical behavior and thus no latency, except if the conquering occurrence type comes from a very low memory context. This would suggest that, as grammaticalizations are well-characterized by the latency-growth pattern with sigmoidal increase, lexical meanings are allocated a much smaller memory than grammatical ones. However, it would also be the case within the lexicon, when a word goes from a concrete meaning to an abstract one.

It is not clear why functional and abstract meanings should be allocated a greater memory than concrete meanings. There could be for instance some advantage in making the more abstract and structural part of the conceptual realm more stable in its linguistic expression than other parts of speech, especially because it serves to constrain the processing of utterances and provide structure to the flow of speech. Were it the case, then we could understand the strong asymmetry evidenced by grammaticalization, namely the fact that lexical forms are recruited to express grammatical meanings overwhelmingly more frequently than the reverse. Indeed, if the links were from the stable (i.e. supported by a large memory size) to the unstable parts of the language, then all those links would be associated with a very high γ parameter, so that all parts of language would soon come to be expressed by the grammatical forms. This would right away lead to a complete communicative failure. There would thus be an obvious advantage in preventing the links from grammatical concepts to lexical ones, hence in the unidirectionality exhibited by grammaticalization.

b. Different probabilities of use

We now introduce different calling probabilities for $C_0$ and $C_1$ in the hearer perspective. Let us say that the probability to call $C_0$ is $\alpha$. Here again $\gamma$ is set to 1 (i.e. $C_0$ automatically entails $C_1$). The jump probabilities thus become:

$$R^H(x) = \left[\alpha + (1 - \alpha)P_0(x)\right](1 - x) \tag{B11}$$

and:

$$L^H(x) = (1 - \alpha)\left(1 - P_0(x)\right)x . \tag{B12}$$

We can factorize $R^H(x)$ by $1 - \alpha$. Then we recover the same computation as before, with the ratio of calling probabilities $\alpha/(1 - \alpha)$ playing the role of $\gamma$. Furthermore, if we set the call probability to be proportional to memory size, then we recover the same $\gamma$ as in the preceding subsection. This assumption seems natural, since greater memory sizes would help stabilize the linguistic expressions of widely used meanings.
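The factorization by $1 - \alpha$ can be made explicit with a few lines of code. The sketch below compares the jump probabilities (B11) and (B12) with those of (B1) evaluated at $\gamma = \alpha/(1 - \alpha)$: the two pairs differ only by the common factor $2(1 - \alpha)$, which rescales time but not the shape of the dynamics. The function names are hypothetical, and `P0` is passed in by the caller so that no particular production function is assumed.

```python
def jumps_alpha(x, alpha, P0):
    """(B11)-(B12): forward and backward jumps with calling probability alpha."""
    forward = (alpha + (1.0 - alpha) * P0(x)) * (1.0 - x)
    backward = (1.0 - alpha) * (1.0 - P0(x)) * x
    return forward, backward


def jumps_gamma(x, gamma, P0):
    """(B1): forward and backward jumps of the hearer variant."""
    forward = 0.5 * (gamma + P0(x)) * (1.0 - x)
    backward = 0.5 * (1.0 - P0(x)) * x
    return forward, backward


def gamma_from_alpha(alpha):
    """Ratio of calling probabilities playing the role of gamma."""
    return alpha / (1.0 - alpha)


def alpha_from_memories(m, M):
    """Calling probability of C_0 if calls are proportional to memory size."""
    return m / (m + M)
```

With calling probabilities proportional to memory sizes, `gamma_from_alpha(alpha_from_memories(m, M))` reduces to `m / M`, the ratio of memories discussed in the preceding subsection.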

In such a case, the near-criticality associated with the latency-growth pattern is recovered only if the links in the conceptual network are from the seldom called contexts to the often called contexts (so as to ensure low enough values of γ). This seems a natural assumption for grammaticalization phenomena, since functional meanings are much more frequently called than lexical ones. Such an assumption remains of course to be carefully investigated.

These two interpretations of the cognitive link point in the same direction: In short, the links of the conceptual network would be distributed so as to prevent highly frequent forms from invading the less frequent ones, i.e., to ensure linguistic diversity. The asymmetry evidenced by grammaticalization would thus be a consequence of the fact that the highly pervasive functional forms must be kept away from the lexical, referential, more context-specific forms. This puzzling unidirectionality could thus have been selected as a cognitive structure able to guarantee a wide spectrum of possibilities in linguistic expression.

5. Sociolinguistic interpretation

We can give our model a completely different interpretation, taking a sociolinguistic viewpoint. Instead of sites $C_0$ and $C_1$, one considers two separate communities of speakers, $C_0$ and $C_1$. Different tokens now represent different individuals, who make binary choices between either variant X or variant Y. The different community sizes, $m$ and $M$, are then the analogues of the different memory sizes. The fact that $C_0$ influences $C_1$ unilaterally may be understood as the fact that community $C_0$ has some prestige compared to $C_1$, so that $C_1$ members listen to $C_0$ members while the reverse does not hold. Similarly, different call frequencies may represent different representations in society, with people from prestige communities being given media visibility to the exclusion of the other communities. With this purely sociolinguistic interpretation, the model formalism thus remains exactly the same. Note that this point of view is akin to the one defended in [18].

In this interpretation, however, the model does not explain why the prestige community $C_0$ adopted X in the first place; nor does it explain the regularities in semantic change. Another point on which this interpretation is weaker is the timescale. Linguistic change can be very slow, taking up to several centuries, as shown in our corpus study. Is it reasonable to presume that the social structure holds and remains the same throughout centuries? By contrast, some aspects of conceptual structure happen to be extremely stable, as they are both deeply constitutive of a culture, e.g. through entrenched metaphors [27], and due to the generic cognitive features of the mind (expressing time relations through spatial ones [25], for instance). As it happens, metaphors prove to be very stable, even if the reasons for this stability are still unclear. The astonishing persistence of myth schemata through the ages [33] is another hint of the remarkable resilience of human cultural features.

Appendix C: Boundaries of the trap region

The analytical definitions, used to compute the latency and growth times in the model, are based on first passage times. In this section we outline the procedure to compute these times.

1. Analytical computation of mean first passage times

Let us note $T_{n \to m}$ the first passage time at site $m$, starting at site $n$, $0 \leq n, m \leq M$. This is a random variable for which one can write down a recursion equation for its generating function:

$$\left\langle e^{\lambda T_{n \to m}} \right\rangle = R_n \left\langle e^{\lambda (T_{n+1 \to m} + 1)} \right\rangle + L_n \left\langle e^{\lambda (T_{n-1 \to m} + 1)} \right\rangle + (1 - L_n - R_n) \left\langle e^{\lambda (T_{n \to m} + 1)} \right\rangle , \tag{C1}$$

where $R_n$ and $L_n$ are, respectively, the forward and backward jump probabilities, and $\langle \cdot \rangle$ denotes the average. We recall that $n = 0$ is a reflecting boundary ($L_0 = 0$, $R_0 > 0$), and $n = M$ an absorbing boundary ($R_M = L_M = 0$). We have $T_{n \to n} = 0$, and for the left boundary condition, that is for $n = 0$:

$$\left\langle e^{\lambda T_{0 \to m}} \right\rangle = R_0 \left\langle e^{\lambda (T_{1 \to m} + 1)} \right\rangle + (1 - R_0) \left\langle e^{\lambda (T_{0 \to m} + 1)} \right\rangle . \tag{C2}$$

The first and second derivatives of this equation (C1) with respect to $\lambda$, at $\lambda = 0$, lead to recurrence relations for the first and second moments of $T_{n \to m}$