Semantic Changes in Apparent Time


HAL Id: halshs-00802034

https://halshs.archives-ouvertes.fr/halshs-00802034

Submitted on 18 Mar 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Jean-Philippe Magué

To cite this version:

Jean-Philippe Magué. Semantic Changes in Apparent Time. 32nd Annual Meeting of the Berkeley Linguistics Society, 2006, Berkeley, United States. ⟨halshs-00802034⟩

JEAN-PHILIPPE MAGUÉ

Université Lumière Lyon 2/University of Chicago

0. Introduction

Semantic changes have been studied scientifically for more than 150 years (Nerlich 1992). Throughout this history, successive generations of scholars have adopted at least three different theoretical frameworks (Magué 2005). Chronologically, the first trend focused on identifying the different kinds of semantic change a lexeme can undergo. This taxonomist trend culminated with Ullmann (1962). The second trend adopts a typologist point of view and is characterized both by the advocacy of cross-linguistic studies and by a focus on semantic fields rather than isolated lexemes. A typical work in this trend is Viberg (1983). Finally, a cognitivist trend has emerged more recently, which aims at explaining the cognitive mechanisms that underlie semantic change (e.g., Sweetser 1990).

Despite the great variety of theoretical approaches that the study of semantic change has gone through, the methodologies used have, surprisingly, remained the same: only completed semantic changes are studied, either through the analysis of their synchronic manifestations, i.e., polysemy or sets of cognates, or through the analysis of the development of a new meaning from corpus evidence. This is all the more surprising given that the study of phonological change has undergone a methodological revolution (which has entailed theoretical breakthroughs) during the last 40 years, with the emergence of Labovian variationist sociolinguistics (Labov 1963, 2001).

Sociolinguistics studies the correlation between linguistic variation and socioeconomic factors. Among those factors, the age of the speaker is of particular interest. Under the Apparent Time Hypothesis (Bailey et al. 1991), which holds that speakers acquire their idiolect mainly during a critical period in childhood, a correlation between age and linguistic variation is the synchronic manifestation of a change in progress. Most of the ongoing linguistic changes observed in this way are phonological (Labov 1963) and a few are morphosyntactic (Parrott 2002) but, to our knowledge, semantic changes have been left aside by variationist sociolinguistics. A possible explanation for this state of affairs lies in the methodological difficulty of measuring semantic variation precisely and objectively enough. While phonetic variation is directly observable in speakers’ productions, since sounds are precisely the public part of linguistic communication, meanings are mental entities and are not directly made public during communication. The experimenter who wishes to study semantic variation thus faces the double challenge of obtaining an objective representation of private mental meaning and of measuring semantic variation from this representation.

The goal of this paper is to present a method that meets this double challenge. The method is based on the work of Romney et al. (2000) in quantitative anthropology, which addresses inter-cultural differences in the representation of various cultural domains. Its main idea is to apply statistical treatment to semantic similarity judgments made by speakers between words belonging to the same semantic field.

1. Materials and Method

The semantic field we analyzed was built from the French word maison ‘house’ and 20 of its synonyms,¹ chosen for the large variation in their frequency between the first and second halves of the 20th century (Table 1). Subjects were given a questionnaire presenting pairs of words, each followed by a 10 cm-long axis. Each of the 210 possible pairs was presented once to each subject, and the order of the words within the pairs was counterbalanced between subjects. The order of the pairs was randomized for each subject. Subjects had to judge the semantic similarity of each pair of words by placing a mark on the corresponding axis: at the left extremity for unrelated words, at the right extremity for perfect synonyms, and at intermediate positions for intermediate semantic similarities. The positions of the marks were measured and scaled to lie between 0 and 1. The answers of each subject i were then represented by a $21 \times 21$ symmetric matrix $A_i$. A few subjects skipped some of the pairs. To deal with the missing values, all subsequent analyses were performed on the matrices $M_i = (\mathrm{corr}(A_i) + 1)/2$, where $\mathrm{corr}(A_i)$ is the correlation matrix of $A_i$. This operation also had the effect of removing noise from the data, since in the matrix $M_i$ the similarity between two words is given by the similarity of the judgment patterns of those two words against all the others.
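The transformation from raw judgments to the rescaled correlation matrix can be sketched as follows. This is only an illustration under assumptions that the text does not state: missing judgments are coded as NaN, each correlation is computed over the rows available for both words (pairwise deletion), and the diagonal of the judgment matrix is set to 1. The function name and the toy data are hypothetical.

```python
import numpy as np

def similarity_from_judgments(A):
    """Return M = (corr(A) + 1) / 2 for one subject's judgment matrix A.

    Assumption: skipped pairs are NaN and each correlation uses only the
    rows where both columns have a value (pairwise deletion).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.ones((n, n))  # a word correlates perfectly with itself
    for j in range(n):
        for k in range(j + 1, n):
            ok = ~np.isnan(A[:, j]) & ~np.isnan(A[:, k])
            r = np.corrcoef(A[ok, j], A[ok, k])[0, 1]
            M[j, k] = M[k, j] = (r + 1) / 2  # map [-1, 1] onto [0, 1]
    return M

# Toy example with 6 words instead of 21 and one skipped pair.
rng = np.random.default_rng(0)
raw = rng.random((6, 6))
A = (raw + raw.T) / 2            # symmetric judgments in [0, 1]
np.fill_diagonal(A, 1.0)         # assumed self-similarity
A[1, 4] = A[4, 1] = np.nan       # the subject skipped this pair
M = similarity_from_judgments(A)
```

Because every entry of the result depends on two whole columns of judgments, a single skipped pair no longer leaves a hole in the matrix.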

The experiment was performed on two groups of native French speakers, differing only in their mean age. The younger group was composed of 47 subjects (36 females, i.e., 76.6%) with a mean age of 21 years (± 1.4, min = 17, max = 26). The older group was composed of 16 subjects (11 females, i.e., 68.8%) with a mean age of 56 years (± 3.1, min = 49, max = 63). While the two groups differed in terms of age (t(61) = 33.98, $p < 10^{-15}$), they were matched with respect to gender ($\chi^2(1, N = 63) = 0.388$, p > .5). In order to match the groups in their level of competence in French, subjects were asked how many pages written in French they read per day. In the younger group, subjects read on average 39.1 pages, while in the older group subjects read 25.2 pages (t(44) = 0.91, p > .35).
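The gender-matching statistic can be checked directly from the counts given above. The sketch below computes a Pearson chi-square without continuity correction; that choice is an assumption, made because it agrees with the reported value. The function name is hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: younger group, older group; columns: females, males (counts from the text).
gender_counts = [[36, 47 - 36], [11, 16 - 11]]
chi2 = chi_square_2x2(gender_counts)
print(round(chi2, 3))  # 0.388
```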

(1) The 21 words used in the experiment, their frequencies during the first and second halves of the 20th century, and the variation of their frequency between the two periods. Frequencies were obtained from the online dictionary Le Trésor de la Langue Française Informatisé.²

| Word | Frequency 1900-1950 (per million) | Frequency 1950-1999 (per million) | Frequency variation (%) |
|---|---|---|---|
| case ‘hut’ | 982 | 1,352 | +38 |
| chalet ‘chalet’ | 353 | 1,106 | +213 |
| château ‘castle’ | 10,743 | 5,802 | -46 |
| chaumière ‘thatched cottage’ | 1,236 | 314 | -75 |
| clinique ‘private clinic’ | 417 | 1,038 | +149 |
| construction ‘construction’ | 3,442 | 4,811 | +40 |
| entreprise ‘enterprise’ | 3,032 | 6,104 | +101 |
| établissement ‘establishment’ | 1,930 | 2,999 | +55 |
| firme ‘firm’ | 56 | 3,441 | +6,045 |
| habitation ‘dwelling’ | 1,081 | 659 | -39 |
| hôpital ‘hospital’ | 2,707 | 3,922 | +45 |
| immeuble ‘building’ | 784 | 1,135 | +45 |
| intérieur ‘interior’ | 15,514 | 24,662 | +59 |
| logement ‘accommodation’ | 1,456 | 995 | -32 |
| logis ‘home’ | 3,279 | 1,582 | -52 |
| maison ‘house’ | 63,774 | 50,652 | -21 |
| manoir ‘manor’ | 459 | 285 | -38 |
| masure ‘hovel’ | 749 | 399 | -47 |
| propriété ‘property’ | 8,326 | 4,764 | -43 |
| réduit ‘cubbyhole’ | 381 | 531 | +39 |
| résidence ‘residence’ | 424 | 574 | +35 |

1 Obtained from the online synonyms dictionary http://dico.isc.cnrs.fr
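The frequency variation column appears to be the relative change between the two periods, rounded to the nearest percent. The text does not state the formula, so this is an inference from the tabulated values. A short sketch with a hypothetical function name, applied to a few rows of the table:

```python
def frequency_variation(freq_first_half, freq_second_half):
    """Relative change from the first to the second period, in percent (rounded)."""
    return round(100 * (freq_second_half - freq_first_half) / freq_first_half)

# (frequency 1900-1950, frequency 1950-1999), per million, copied from the table.
frequencies = {
    "chalet": (353, 1106),
    "château": (10743, 5802),
    "firme": (56, 3441),
    "maison": (63774, 50652),
}
variation = {word: frequency_variation(*pair) for word, pair in frequencies.items()}
print(variation)  # {'chalet': 213, 'château': -46, 'firme': 6045, 'maison': -21}
```

For maison the frequency falls from 63,774 to 50,652 per million, so the variation is a decrease of about 21%.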

2. Results

2.1. Direct Analyses

When all the judgments are considered together, the older group judged the words significantly, albeit very slightly, more similar than the younger group did: their average scores were 0.58 and 0.56, respectively (t(27781) = 4.64, $p < 10^{-5}$). Considering each pair of words individually, five pairs showed this difference: chalet / chaumière, château / clinique, château / entreprise, château / établissement, and château / hôpital (all p’s < 0.01). Conversely, château / intérieur and immeuble / réduit were judged more similar by the younger group than by the older one (p’s < 0.01).

2.2. Principal Components Analysis

Following Romney et al. (2000), we performed a principal component analysis (PCA) in order to obtain, for each subject, a semantic space in which words are represented by points and in which the semantic similarity between two words is given by the geometric distance between the points representing them. The percentage of variance explained by each component is given in Figure 2. Given those percentages (the first and second components explain 55% and 17% of the variance, respectively), only the first two components will be considered in the rest of the paper. Before focusing on inter-generational differences in the semantic spaces, it is worth considering the common semantic space.

2 http://atilf.atilf.fr
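A minimal sketch of how a two-dimensional semantic space could be derived from one similarity matrix is given below. It is an illustration only: it treats the rows of the matrix as observations and its columns as variables, and it does not address how axes were made comparable across subjects, which the text does not specify. The function name and the toy data are hypothetical.

```python
import numpy as np

def semantic_space(M, n_components=2):
    """PCA of a similarity matrix: word coordinates and explained-variance ratios."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=0)                   # center each column
    cov = centered.T @ centered / (M.shape[0] - 1)  # covariance between columns
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    coords = centered @ eigvecs[:, :n_components]   # one row of coordinates per word
    return coords, explained

# Toy similarity matrix built from 8 random points (not the experimental data).
rng = np.random.default_rng(1)
points = rng.normal(size=(8, 3))
dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
M_toy = 1 - dist / dist.max()
coords, explained = semantic_space(M_toy)
```

The array `explained` plays the role of the percentages plotted in Figure 2, and the distances between rows of `coords` stand in for semantic dissimilarity.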

(2) Percentage of variance explained by the components. The first two together explain 72% of the total variance.

2.2.1. Common Semantic Space

In the common semantic space, the position of each word is its average position across all the subjects, regardless of their group (Figure 3). Along the first component, words spread from entreprise, firme, hôpital, clinique to logis, logement, habitation. The first component thus discriminates words according to the habitability of their referents. Along the second component, words spread from château, propriété, manoir to masure, case, réduit, making this component an axis of the quality of the lodging.

In addition to these axes of habitability and quality, this semantic field is organized by an ellipse (Figure 4, solid line) along which most of the inter-individual variability spreads (Figure 4, dotted lines). As we shall see in the next section, semantic changes occur along this ellipse too.

(3) Common Semantic Space. The first component describes the habitability of the referent of the words, and the second one the quality of lodging.

(4) Elliptical organization of the semantic field (solid line) and confidence ellipses (dotted lines) at σ/2 with their main axis. Those axes tend to be tangent to the large ellipse structuring the semantic field, indicating that the inter-individual variability spreads along it.

2.2.2. Inter-Generational Differences

A preliminary way to identify inter-generational differences is to compare the mean position of the words between the two groups (Figure 5). For most of the words, the change occurs along the ellipse. This change is statistically significant (Hotelling $T^2$ test) for four of the words: château ($T^2(2) = 18.88$, p < .005), clinique ($T^2(2) = 7.45$, p < .05), entreprise ($T^2(2) = 7.00$, p < .05), and immeuble ($T^2(2) = 6.69$, p < .05). The change for hôpital approaches significance ($T^2(2) = 5.88$, p = .063).
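For readers unfamiliar with the test, a two-sample Hotelling $T^2$ statistic for two-dimensional positions can be sketched as below. This is the textbook pooled-covariance form, not necessarily the exact variant used in the study; the conversion to an F statistic is the standard one, and the text does not say which reference distribution underlies the reported p-values. The function name and the toy points are hypothetical.

```python
def hotelling_t2_2d(group_a, group_b):
    """Two-sample Hotelling T^2 for lists of (x, y) points, pooled covariance."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    s_a, s_b = scatter(group_a, m_a), scatter(group_b, m_b)
    dof = n_a + n_b - 2
    sxx, sxy, syy = ((s_a[k] + s_b[k]) / dof for k in range(3))  # pooled covariance
    det = sxx * syy - sxy ** 2
    dx, dy = m_a[0] - m_b[0], m_a[1] - m_b[1]
    # d' S^{-1} d, using the closed-form inverse of a 2 x 2 matrix
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n_a * n_b / (n_a + n_b) * quad
    f_stat = (n_a + n_b - 3) / (2 * dof) * t2  # follows F(2, n_a + n_b - 3) under H0
    return t2, f_stat

# Toy positions of one word for two small groups (not the experimental data).
younger = [(0, 0), (1, 0), (0, 1), (1, 1)]
older = [(2, 2), (3, 2), (2, 3), (3, 3)]
t2, f_stat = hotelling_t2_2d(younger, older)
print(t2, f_stat)
```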

(5) Local inter-generational differences. The mean position of the words is represented for each of the two groups. This position is statistically different for château, clinique, entreprise, and immeuble.

An alternative to the search for local differences in the semantic spaces of the subjects is to look for inter-group differences in the global organization of the semantic field. To address this issue, Romney et al. (2000) proposed that the semantic space of each subject be represented by the shape of the configuration of the points, which is fully encoded in the set of the 210 distances between all pairs of points. This set of distances is given for each subject s by a $21 \times 21$ matrix $D^s = (D^s_{ij})$, where $D^s_{ij}$ is the distance between the points corresponding to the words i and j in her semantic space. Since this matrix is symmetric, each subject s can be represented by the vector $d^s$ built from the values above the diagonal of $D^s$, i.e., by a point in a 210-dimensional subject space. In order to extract information from the distribution of the subjects in this space, its dimensionality can be reduced by performing a second PCA (Figure 6). From the results of this PCA, it is then possible to quantify the impact of socioeconomic factors on semantic variation by identifying the component associated with each factor.

Figure 6 gives the percentage of variance explained by the first 50 components after the PCA is performed on the subject space. The second component discriminates the two groups of subjects (t(61) = 2.02, p < 0.05). The 13% of the variance that this component accounts for can thus be attributed to the age of the speakers. We observed that gender and the number of pages of French read per day have little influence on semantic variation. Subjects are discriminated according to their gender by the 17th component (t(61) = 1.95, p = 0.06), which explains 1.23% of the variance, and the number of pages read per day is correlated with the 42nd component (r(61) = 0.45, p < 0.005), which explains 0.14% of the variance.
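The construction of the subject space can be sketched as follows, on randomly generated configurations rather than the experimental data. The PCA is computed here through a singular value decomposition of the centered subject-by-distance matrix, and the group comparison uses a pooled-variance t statistic; both are assumptions about implementation details that the text leaves open. All function names are hypothetical.

```python
import numpy as np

def subject_vector(coords):
    """Flatten one subject's configuration into its pairwise distances."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    upper = np.triu_indices(coords.shape[0], k=1)  # entries above the diagonal
    return dist[upper]

def subject_space_pca(subject_coords):
    """PCA over subjects: component scores and explained-variance ratios."""
    X = np.vstack([subject_vector(c) for c in subject_coords])
    X = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U * s                       # one row of component scores per subject
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dof = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / dof
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return t, dof

# Toy data: 10 subjects, each with 21 words placed in a 2-D semantic space.
rng = np.random.default_rng(2)
toy_subjects = [rng.normal(size=(21, 2)) for _ in range(10)]
scores, explained = subject_space_pca(toy_subjects)
# Compare the first 5 and last 5 toy subjects on the second component.
t_value, dof = pooled_t(scores[:5, 1], scores[5:, 1])
```

With 21 words, `subject_vector` returns the 210 distances mentioned above, and each column of `scores` can be tested against a factor such as age group, gender, or reading habits.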

(6) Percentage of variance explained by the first 50 components after a PCA on the subject space.

3. Discussion

The method we have presented in the previous sections allows us to measure semantic variation among speakers. We show a correlation between this variation and the age of the speakers which is, under the apparent time hypothesis, the synchronic manifestation of a change in progress. While most of the literature on semantic change deals with changes of a metaphorical or metonymic nature, the change we observe is rather a change in the internal relationships between the lexemes of the semantic field of lodging in French. This change is more closely related to the change described by Trier (1931), who studied (from corpora) the reorganization of the semantic field of knowledge in Middle High German during the 13th century. In 1200, this semantic field was organized around three lexemes: Kunst ‘courtly, chivalric attainments’, List ‘non-courtly attainments’ and Wîsheit ‘human wisdom in all its respects, theological and mundane’ (English glosses are from Traugott & Dasher (2002)). This organization reflected the feudal structure of German society at the time. One century later, the society was no longer feudal and the semantic field of knowledge had been reorganized accordingly: List had moved out of the field and acquired its modern meaning ‘cunning, trick’, while Wizzen had moved into the field. Nevertheless, it was not a mere substitution. Wîsheit had become specialized in religious knowledge and Kunst in artistic knowledge, while Wizzen covered technical knowledge. Figure 7 is a schematic representation of the change.

(7) Reorganization of the semantic field of knowledge in Middle High German between 1200 and 1300 (from Lehrer, 1985).

[Figure: in 1200 the field comprises Wîsheit, Kunst, and List; in 1300 it comprises Wîsheit, Kunst, and Wizzen.]

While Trier (1931) analyzed a completed semantic change from corpora, the method we propose makes it possible to observe such a change within a semantic field while it is taking place.

This method allows us to quantify the influence of age on semantic variation at 13%. This indicates that factors other than speakers’ age also determine this variation. The other data we gathered about the subjects (i.e., gender and the number of pages read per day) seem to account for only 1.23% and 0.14% of the variation, respectively. But given the high number of components (210), random noise can be expected to produce similar results. Thus, we cannot conclude that either factor influences semantic variation. Moreover, many components, and thus a large part of the variation, remain uninterpreted, in particular the first and third components, which account for 25% and 11% of the variation, respectively.

The correlation between semantic variation and age may have an origin other than ongoing semantic change. The apparent time hypothesis has never been verified for semantics, and thus we cannot exclude the possibility that speakers modify their structure of the semantic field as they get older, such that the younger group would have in 30 years the structure observed today for the older group. Nevertheless, it is not clear why such an age-grading phenomenon would occur. Moreover, the large frequency variations of the lexemes used in our study (Table 1) are clues of a change. It thus seems more likely that the semantic variation observed in our study reflects a change in progress rather than an age-grading phenomenon. Ultimately, this would have to be confirmed by real-time studies.

4. Conclusion

Variationist sociolinguistics investigates the relationships between linguistic variation and the many factors that structure a population, and relates this linguistic variation to ongoing linguistic change. Yet sociolinguistics has so far focused almost exclusively on sound changes. In particular, semantic change and sociolinguistics have remained two disconnected domains. Postulating that one possible reason is the difficulty of observing and measuring semantic variation in the population, this paper has introduced a way to fill this methodological gap. We have applied this method to the semantic field of lodging in French and shown that the variation in the semantic representations of this field mirrors an ongoing internal reorganization, i.e., a semantic change. This study opens new perspectives for the study of semantic change, which can now be studied through the prism of variationist sociolinguistics and thus benefit from its whole theoretical framework.

References

Bailey, Guy, Tom Wikle, Jan Tillery and Lori Sand. 1991. The Apparent Time Construct. Language Variation and Change 3: 241–264.

Labov, William. 1963. The Social Motivation of a Sound Change. Word 19: 273–309.

Labov, William. 2001. Principles of Linguistic Change. Volume II: Social Factors. Oxford: Blackwell.

Lehrer, Adrienne. 1985. The Influence of Semantic Fields on Semantic Change. In: Fisiak, J. (ed), Historical semantics. Historical word-formation. Berlin: Mouton. 283–296.

Magué, Jean-Philippe. 2005. Changements Sémantiques et Cognition: Différentes Méthodes pour Différentes Échelles Temporelles. Ph.D. diss., Université Lyon 2.

Nerlich, Brigitte. 1992. Semantic Theories in Europe 1830–1930. From Etymology to Contextuality. Amsterdam: John Benjamins Publishing Company.

Parrott, Jeffrey. 2002. Dialect Death and Morpho-Syntactic Change: Smith Island Weak Expletive it. In Papers from NWAV 30, eds. D.E. Johnson & T. Sanchez. Philadelphia: U. Penn Working Papers in Linguistics.

Romney, A. K., C. C. Moore, W. H. Batchelder and T. L. Hsia. 2000. Statistical Methods for Characterizing Similarities and Differences Between Semantic Structures. Proceedings of the National Academy of Sciences 97(1): 518–523.

Sweetser, Eve. 1990. From Etymology to Pragmatics: Metaphorical and Cultural Aspects of Semantic Structure. Cambridge: Cambridge University Press.

Traugott, Elizabeth, and Richard Dasher. 2002. Regularity in Semantic Change. Cambridge: Cambridge University Press.

Trier, Jost. 1931. Der Deutsche Wortschatz im Sinnbezirk des Verstandes. Heidelberg: Winter.

Ullmann, Stephen. 1962. Semantics. An Introduction to the Science of Meaning. Oxford: Blackwell.

Viberg, Åke. 1983. The Verbs of Perception: A Typological Study. Linguistics 21: 123–162.

Jean-Philippe Magué
University of Chicago
Department of Linguistics
1010 E 59th Street, Chicago, IL 60637
mague@uchicago.edu
