Copyright © by the paper’s authors. Copying permitted for private and academic purposes.

In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Modelling semantic transparency in English compound nouns

Melanie J. Bell
Anglia Ruskin University, Cambridge, U.K.

Martin Schäfer
Friedrich Schiller University, Jena, Germany


1 Introduction

Semantic transparency is known to play an important role in the storage and processing of complex words (e.g. Marslen-Wilson et al. 1994), and human raters of transparency achieve high levels of agreement (e.g. Frisson et al. 2008, Munro et al. 2010). In the case of noun-noun compounds, overall transparency is largely determined by the transparency of the individual constituents. For example, Reddy et al. (2011) showed that the perceived transparency of a compound is highly correlated with both the sum and the product of the perceived transparencies of its constituents. Furthermore, many psycholinguistic studies find significant effects for semantic transparency using a four-way distinction based on perceived constituent transparency: transparent-transparent (e.g. carwash), transparent-opaque (e.g. jailbird), opaque-transparent (e.g. strawberry) and opaque-opaque (e.g. hogwash) (Libben et al. 2003). Bell and Schäfer (2013) modelled the transparency of individual compound constituents and showed that shifted word senses reduce perceived transparency, while certain semantic relations between constituents increase it. However, this finding is problematic in at least two ways. Firstly, it is not clear whether there is a solid basis for establishing whether a specific word sense is shifted or not. For example, card in credit card is clearly shifted if viewed etymologically, but may not synchronically be perceived as shifted due to its frequent use. Secondly, work on conceptual combination by Gagné and collaborators has shown that relational information in compounds is accessed via the concepts associated with individual modifiers and heads, rather than independently of them (e.g. Spalding et al. 2010 for an overview). This leads to the hypothesis that it is not whether a specific word sense is etymologically shifted, nor whether a specific semantic relation is used per se, that makes a compound constituent more or less transparent; rather, it is the degree of expectedness of a particular word sense and a particular relation for a given constituent. In this paper, we provide evidence in support of this hypothesis: the more expected the word sense and relation for a constituent, the more transparent it is perceived to be.

2 Method

We used the publicly available dataset described in Reddy et al. (2011), which gives human transparency ratings for a set of 90 compound types and their constituents (N1 and N2), and comprises a total of 7717 ratings. To model the expectedness of word senses and semantic relations for a given compound constituent, we used the constituent families of the compounds, which we extracted in a two-step process. We took all strings of exactly two nouns that follow an article in the British National Corpus and that also occur four times or more in the USENET corpus (Shaoul and Westbury 2010). From this set, we extracted the positional constituent families for all constituent nouns in the Reddy et al. dataset, giving a total of 4553 compounds for the N1 families and 9226 for the N2 families. Each of these compound types was coded for the semantic relation between the constituents (after Levi 1978), and for the WordNet sense of the constituent under consideration (Princeton 2010). We then calculated the proportion of compound types in each constituent family with each semantic relation (relation proportion), and each WordNet sense of the constituent in question (synset proportion). We take these two measures to reflect the expectedness of the respective relations and WordNet senses of the constituents: if a relation or sense occurs in a high proportion of the constituent family, it is more expected. These variables were used, along with other quantitative measures, as predictors in ordinary least squares regression models of perceived constituent transparency. The final model for the transparency of N1 is given in Table 1.
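The two proportion measures can be illustrated with a minimal sketch. The family below is hypothetical: the compounds, the assignment of relation codes (labels in the style of Levi 1978) and the placeholder sense labels `sense_a` and `sense_b` are invented for illustration and are not taken from the coded data. The helper name `proportions` is likewise an assumption.

```python
from collections import Counter


def proportions(family, key):
    """Share of compound types in a constituent family for each value of `key`."""
    counts = Counter(entry[key] for entry in family)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical N1 family for "credit". The relation codes use Levi-style
# labels, but their assignment here, like the sense labels, is invented.
credit_family = [
    {"compound": "credit card", "relation": "FOR", "sense": "sense_a"},
    {"compound": "credit limit", "relation": "FOR", "sense": "sense_a"},
    {"compound": "credit union", "relation": "FOR", "sense": "sense_b"},
    {"compound": "credit report", "relation": "ABOUT", "sense": "sense_b"},
]

# Relation proportion and synset proportion, computed over compound types
relation_proportion = proportions(credit_family, "relation")
synset_proportion = proportions(credit_family, "sense")

print(relation_proportion)
print(synset_proportion)
```

In this toy family, a compound whose N1 occurs with the FOR relation would receive a relation proportion of 0.75, and one with the ABOUT relation a proportion of 0.25. The counts are over types, not tokens, in line with the description above.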



3 Results

All predictors in our model enter into significant interactions, and these are shown graphically in Figure 1, where the contour lines on the plots represent perceived transparency of the first constituent (N1). The first plot shows an interaction between relation proportion and overall (log) family size: for small families, relation proportion plays little role, whereas for larger families, in accordance with our hypothesis, the transparency of N1 increases with the proportion of the corresponding relation in the family. The second plot shows the interaction between the synset proportion and the total number of a constituent’s senses (as listed in WordNet): only if there is a sufficient number of different senses in the family is their proportion a reliable predictor of semantic transparency. There is also a small but significant interaction between the log frequency of a constituent and the proportion of the constituent family (in terms of tokens) represented by the compound in question: this shows that transparency increases with frequency, but only in the lower frequency ranges does the proportion in the family play a role.
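One way to read these interactions is to compute, from the coefficients in Table 1, how the slope of each proportion measure changes with its moderator. The following is a sketch under stated assumptions: the coefficient values are copied from Table 1, while the dictionary keys, the helper name `slope` and the moderator values are illustrative choices, not part of the original analysis.

```python
# Main-effect and interaction coefficients copied from Table 1 (N1 model).
# The key names are hypothetical labels chosen for this sketch.
COEF = {
    "relation_prop": -0.2187,
    "relation_prop:log_family_size": 0.3311,
    "synset_prop": -0.2426,
    "synset_prop:log_synset_count": 0.6855,
    "compound_prop": 3.0130,
    "compound_prop:log_freq": -0.2804,
}


def slope(main, interaction, moderator_value):
    """Change in predicted transparency per unit of `main` at a given moderator value."""
    return COEF[main] + COEF[interaction] * moderator_value


# Illustrative moderator values (assumed, not taken from the data)
for size in (2, 4, 6):
    s = slope("relation_prop", "relation_prop:log_family_size", size)
    print(f"log family size {size}: relation proportion slope {s:.4f}")

for freq in (6, 8, 10):
    s = slope("compound_prop", "compound_prop:log_freq", freq)
    print(f"log frequency {freq}: compound proportion slope {s:.4f}")
```

Under these assumptions, the slope of relation proportion rises from about 0.44 at a log family size of 2 to about 1.77 at 6, while the slope of compound proportion falls from about 1.33 at a log frequency of 6 to about 0.21 at 10. This matches the pattern described above.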

4 Conclusion

Overall, the model provides clear evidence for our hypothesis. N1 is rated as most transparent when it is a frequent word, with a large family, occurring with its preferred semantic relation and most frequent sense, and with few other senses to compete. We interpret the results as indicating that compound constituents are perceived as more transparent when they are more expected (both generally and with a specific sense) and when they occur in their most expected semantic environments. In information theory, the less expected an event, the greater its information content: in so far as perceived transparency is a reflection of expectedness, it can therefore also be seen as the inverse of informativity.
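The information-theoretic point can be made concrete with Shannon's surprisal, -log2(p). The sketch below treats a proportion in the constituent family as an estimate of the probability p. That mapping is an assumption made for illustration, as are the function name and the example proportions.

```python
import math


def surprisal(p):
    """Shannon information content, in bits, of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log2(p)


# Hypothetical relation proportions for two constituents (not from the dataset)
expected_relation = 0.75
unexpected_relation = 0.25

print(surprisal(expected_relation))
print(surprisal(unexpected_relation))
```

The less expected relation carries more information (2 bits against roughly 0.42 bits), which is the sense in which perceived transparency can be seen as the inverse of informativity.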


This work was made possible by three short visit grants from the European Science Foundation through NETWORDS - The European Network on Word Structure (grants 4677, 6520 and 7027), for which the authors are extremely grateful.

Predictor                                           Coef     S.E.       t  Pr(>|t|)
Intercept                                        -4.6413   0.6593   -7.04   <0.0001
relation proportion in N1 family                 -0.2187   0.6013   -0.36    0.7161
log family size of N1                            -0.0189   0.0931   -0.20    0.8395
synset proportion in N1 family                   -0.2426   0.6152   -0.39    0.6934
log synset count of N1                           -0.7939   0.2469   -3.22    0.0013
compound proportion in N1 family (token-based)    3.0130   0.6788    4.44   <0.0001
log frequency of N1                               0.8728   0.0569   15.34   <0.0001
relation proportion * log family size             0.3311   0.1305    2.54    0.0113
synset proportion * log synset count              0.6855   0.3161    2.17    0.0303
compound proportion * log frequency N1           -0.2804   0.0816   -3.44    0.0006

Table 1: Final model for the transparency of N1, adjusted R² = 0.334

Figure 1. Interaction plots for N1 transparency. Three panels: relation proportion in N1 family by log family size of N1; synset proportion in N1 family by log synset count of N1; compound proportion in N1 family (token-based) by log frequency of N1.



References

Bell, Melanie J. and Martin Schäfer. 2013. Semantic transparency: challenges for distributional semantics. In Aurelie Herbelot, Roberto Zamparelli and Gemma Boleda (eds.), Proceedings of the IWCS 2013 workshop: Towards a formal distributional semantics, 1–10. Potsdam: Association for Computational Linguistics.

Frisson, Steven, Elizabeth Niswander-Klement and Alexander Pollatsek. 2008. The role of semantic transparency in the processing of English compound words. British Journal of Psychology 99(1), 87–107.

Levi, Judith N. 1978. The syntax and semantics of complex nominals. New York: Academic Press.

Marslen-Wilson, William, Lorraine K. Tyler, Rachelle Waksler and Lianne Older. 1994. Morphology and meaning in the English mental lexicon. Psychological Review 101(1), 3–33.

Munro, Robert, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen and Harry Tily. 2010. Crowdsourcing and language studies: the new generation of linguistic data. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, 122–130. Association for Computational Linguistics.

Princeton University. 2010. WordNet.

Reddy, Siva, Diana McCarthy and Suresh Manandhar. 2011. An empirical study on compositionality in compound nouns. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand.

Shaoul, Cyrus and Chris Westbury. 2010. An anonymized multi-billion word USENET corpus (2005-2010). http://www.psych.ualberta.ca/~westburylab/downloads/usenet.download.html

Spalding, Thomas L., Christina L. Gagné, Allison C. Mullaly and Hongbo Ji. 2010. Relation-based interpretation of noun-noun phrases: A new theoretical approach. Linguistische Berichte Sonderheft 17, 283–315.

Wurm, Lee H. 1997. Auditory processing of prefixed English words is both continuous and decompositional. Journal of Memory and Language 37, 438–461.


