• Aucun résultat trouvé

Computer-assisted approaches to semantic maps. A qualitative approach to large-scale lexical datasets

N/A
N/A
Protected

Academic year: 2021

Partager "Computer-assisted approaches to semantic maps. A qualitative approach to large-scale lexical datasets"

Copied!
89
0
0

Texte intégral

(1)

Thanasis Georgakopoulos & Stéphane Polis

(National Research University, Higher School of Economics, Moscow &

University of Liège / F.R.S.-FNRS)

Computer-assisted approaches to semantic maps

(2)

Outline of the talk

Ø (Classical) semantic maps o Basic principles

(3)

Outline of the talk

Ø (Classical) semantic maps o Basic principles

o Inferring semantic maps from co-expression matrices

Ø Tool: pros and cons

o Can we do better? (case-study: indefinites) o Unsolvable inferences (case-study: perception)

(4)

Outline of the talk

Ø (Classical) semantic maps o Basic principles

o Inferring semantic maps from co-expression matrices Ø Tool: pros and cons

o Can we do better? (case-study: indefinites) o Unsolvable inferences (case-study: perception)

Ø Datasets: what do we need? o Size (case-study: emotions)

(5)

Outline of the talk

Ø (Classical) semantic maps o Basic principles

o Inferring semantic maps from co-expression matrices Ø Tool: pros and cons

o Can we do better? (case-study: indefinites) o Unsolvable inferences (case-study: perception) Ø Datasets: what do we need?

o Size (case-study: emotion)

o Structure (case-study: allative markers)

Ø Method: can we open the black box? o Lattices (case-study: indefinites)

(6)

Outline of the talk

Ø (Classical) semantic maps o Basic principles

o Inferring semantic maps from co-expression matrices Ø Tool: pros and cons

o Can we do better? (case-study: indefinites) o Unsolvable inferences (case-study: perception) Ø Datasets: what do we need?

o Size (case-study: emotion) o Structure (case-study: allative) Ø Method: can we open the black box?

o Lattices (case-study: indefinites)

Ø Conclusions

(7)
(8)

Inferring semantic maps

• ‘A semantic map is a geometrical representation of functions (…) that are linked by connecting lines and thus constitute a network’

(Haspelmath 2003)

• A semantic map is a method for visually representing cross-linguistic regularity in semantic structure based on patterns of co-expression

(Georgakopoulos & Polis 2018)

FIGURE 1. A semantic map of typical dative functions /

(9)

Inferring semantic maps

• ‘A semantic map is a geometrical representation of functions (…) that are linked by connecting lines and thus constitute a network’

(Haspelmath 2003)

FIGURE 1. A semantic map of typical dative functions /

the boundaries of English to (based on Haspelmath 2003: 213)

Connectivity hypothesis

• A semantic map is a method for visually representing cross-linguistic regularity in semantic structure based on patterns of co-expression

(10)

Inferring semantic maps

• ‘A semantic map is a geometrical representation of functions (…) that are linked by connecting lines and thus constitute a network’

(Haspelmath 2003)

FIGURE 1. A semantic map of typical dative functions /

the boundaries of English to (based on Haspelmath 2003: 213)

Connectivity hypothesis

Economy principle

• A semantic map is a method for visually representing cross-linguistic regularity in semantic structure based on patterns of co-expression

(11)

Inferring semantic maps

“ideally (…) it should be possible to generate

semantic maps automatically on the basis of a

given set of data”

(12)

Inferring semantic maps

Limitation of the (classical) semantic map method: practically, it is

impossible to handle large-scale crosslinguistic datasets manually

“not mathematically well-defined or computationally

tractable, making it impossible to use with large and highly

variable crosslinguistic datasets”

(13)

Inferring semantic maps

o Dimensionality reduction

• Points = meanings (or contexts) • Proximity = similarity between

meanings (or contexts)

FIGURE 2. MDS analysis of Haspelmath’s (1997) data

on indefinite pronouns

(14)

Inferring semantic maps

o Dimensionality reduction

• Points = meanings (or contexts) • Proximity = similarity between

meanings (or contexts)

FIGURE 2. MDS analysis of Haspelmath’s (1997) data

on indefinite pronouns

(Croft & Poole 2008: 15)

1. Specific known

Somebody called you, guess who 2. Specific unknown:

(15)

Graphs vs feature projections

(16)

Graphs vs feature projections

(17)

Regier, Khetarpal, and Majid showed that the semantic map inference

problem is “formally identical to another problem that superficially

appears unrelated: inferring a social network from outbreaks of disease

in a population”

(Regier et al. 2013: 91)

(18)

• What’s the idea?

• Consider a group of social agents (represented by the nodes of a potential graph)

(19)

• What’s the idea?

• If one observes the same disease for five of these agents (technically called a constraint on the nodes of the graph)

(20)

• What’s the idea?

• One can postulate that all the agents met, so that all the nodes of the graph are connected (10 edges between the 5 nodes)

(21)

• What’s the idea?

• This is neither a very likely, nor a very economic explanation

(22)

• What’s the idea?

• But this is precisely what a colexification network does

Inferring semantic maps

(23)

• What’s the idea?

• But this is precisely what a colexification network does

Inferring semantic maps

(24)

• What’s the idea?

• The goal would be to find a more economical solution and to have all the social agents connected with as few edges as possible, but still accounting for all the observations

(25)

• What’s the idea?

• The goal would be to find a more economical solution and to have all the social agents connected with as few edges as possible, but still accounting for all the observations

(26)

• How does it transfer to semantic maps?

Inferring semantic maps

(27)

• How does it transfer to semantic maps?

• Nodes are meanings

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5

(28)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(29)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

• One connects the nodes economically based on these constraints (economy principle)

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(30)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

• One connects the nodes economically based on these constraints, starting with the edge(s) that accounts for the most frequent constraint(s)

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(31)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

• One connects the nodes economically based on these constraints, starting with the edge(s) that accounts for the most frequent constraint(s), and then going down the scale

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(32)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

• One connects the nodes economically based on these constraints, starting with the edge(s) that accounts for the most frequent constraint(s), and then going down the scale

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(33)

• How does it transfer to semantic maps?

• Nodes are meanings

• Constraints are patterns of co-expression (connectivity hypothesis)

• One connects the nodes economically based on these constraints, starting with the edge(s) that accounts for the most frequent constraint(s), and then going down the scale

Inferring semantic maps

Meaning 1 Meaning 2 Meaning 5 Meaning 4 Meaning 3 Meaning 1 2 3 4 5 Polysemic item A √ √ Polysemic item B √ √ √ Polysemic item C √ √ √

(34)

• Regier et al. (2013) observed that the approximations produced by this algorithm (Angluin et al. 2010) are of high quality

• Tested on the crosslinguistic data of Haspelmath (1997) and Levinson et al. (2003)

Inferring semantic maps

Figure. Haspelmath’s (1997: 4) original semantic map of the indefinite pronouns functions

(35)

Inferring semantic maps

INPUT

(36)

Inferring semantic maps

INPUT

(lexical matrix)

ALGORITHM

(37)

Inferring semantic maps

INPUT (lexical matrix) ALGORITHM (python script) RESULT (semantic map)

(38)

Tool:

pros & cons

(39)

Tool: pros and cons

RESULT

(40)

Tool: pros and cons

RESULT

(semantic map)

FIGURE 4a. A simple semantic map of person marking

(Cysouw 2007: 231)

FIGURE 4b. A weighted semantic map of person marking

(41)

• Generate the map with a modified version of the algorithm of

Regier et al. (2013)

• PRINCIPLE: for each edge that is being added between two meanings

of the map, we know the number of constraints that it satisfy

(max_score in the #main loop), which can be used directly as weight for the edge.

G.add_edge(*max_edge, weight=max_score)

Tool: pros and cons

(42)

Tool: pros and cons

Towards weighted semantic maps

Automatically plotted semantic maps: non-weighted vs. weighted (data from Haspelmath 1997)

The graph is visualized in Gephi®with the Force Atlas

(43)

Tool: pros and cons

Towards weighted semantic maps

Automatically plotted semantic maps: non-weighted vs. weighted (data from Haspelmath 1997)

The graph is visualized in Gephi®with the Force Atlas

(44)

Tool: pros and cons

Towards weighted semantic maps

Automatically plotted semantic maps: non-weighted vs. weighted (data from Haspelmath 1997)

The graph is visualized in Gephi®with the Force Atlas

algorithm and modularity analysis (Lambiotte et al. 2009)

(45)

Tool: pros and cons

Unsolvable inferences

Meaning 1 2 3

Polysemic item A √ √ √

(46)

Tool: pros and cons

Unsolvable inferences

Meaning 1 Meaning 2 Meaning 3

Meaning 1 2 3

Polysemic item A √ √ √

Polysemic item B √ √ √

Meaning 2 Meaning 3 Meaning 1

(47)

Tool: pros and cons

Unsolvable inferences

Meaning 1 Meaning 2 Meaning 3

Meaning 1

Wood Tree2 Forest3

Polysemic item A √ √ √

Polysemic item B √ √ √

Meaning 2 Meaning 3 Meaning 1

Meaning 3 Forest Meaning 1 Wood Meaning 2 Tree

(48)

Tool: pros and cons

Unsolvable inferences

Meaning 1 Meaning 2 Meaning 3

Meaning 1

Wood Tree2 Forest3

Polysemic item A √ √ √

Polysemic item B √ √ √

Meaning 2 Meaning 3 Meaning 1

Meaning 3 Forest Meaning 1 Wood Meaning 2 Tree

Ø Qualitative semantic analysis Ø More typological data

(49)

Tool: pros and cons

(50)

Tool: pros and cons

Unsolvable inferences

smell hear

(51)

Tool: pros and cons

Unsolvable inferences

smell hear 11 listen 3

(52)

Tool: pros and cons

Unsolvable inferences

smell hear 11 listen 3 3

? ?

3

(53)

Tool: pros and cons

Unsolvable inferences

smell hear 11 listen 3 3 3 see

?

?

?

?

(54)

Tool: pros and cons

Unsolvable inferences

smell hear 11 listen 3 3 3 see

?

?

?

?

?

? ?

Ø More typological data ⇒ more constraints

(55)

Tool: pros and cons

Unsolvable inferences

(56)

Datasets:

what do we need?

(57)

Datasets: what do we need?

Size

Cf. Joshua Conrad Jackson, Joseph Watts, Teague Henry, Johann-Mattis List, Robert Forkel, Simon Greenhill, Russell Gray, Kristen Lindquist, Variability and Universality in Human Emotion Across 1156 Languages,

(58)

Datasets: what do we need?

Size

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

AMAZING ANGRY ASHAMED ASTONISHED BAD BEAUTIFUL BORING BRAVE CLEVER CONTEMPTIBLE CORRECT (RIGHT) CRUEL CUNNING DEAR DILIGENT DREADFUL EVIL EXACT FAITHFUL GENTLE GLOOMY GOOD GREEDY HAPPY HONEST IMPORTANT INSOLENT KEEN KIND OR POLITE LOVELY MERRY PASSIONATE PROUD RUDE SAD SHY SORROWFUL SURPRISED TRUE UGLY UNPLEASANT VULGAR WRONG Concepticon (https://concepticon.clld.org)

43

(59)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

AMAZING ANGRY ASHAMED ASTONISHED BAD BEAUTIFUL BORING BRAVE CLEVER CONTEMPTIBLE CORRECT (RIGHT) CRUEL CUNNING DEAR DILIGENT DREADFUL EVIL EXACT FAITHFUL GENTLE GLOOMY GOOD GREEDY HAPPY HONEST IMPORTANT INSOLENT KEEN KIND OR POLITE LOVELY MERRY PASSIONATE PROUD RUDE SAD SHY SORROWFUL SURPRISED TRUE UGLY UNPLEASANT VULGAR WRONG Concepticon (https://concepticon.clld.org) ANGRY ASHAMED BAD BEAUTIFUL BRAVE CLEVER CORRECT (RIGHT) CUNNING DEAR DILIGENT EVIL FAITHFUL GENTLE GOOD GREEDY HAPPY HONEST MERRY PROUD SAD SHY SURPRISED TRUE UGLY WRONG

43

25 (attested)

Clics2(https://clics.clld.org)

Datasets: what do we need?

(60)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

AMAZING ANGRY ASHAMED ASTONISHED BAD BEAUTIFUL BORING BRAVE CLEVER CONTEMPTIBLE CORRECT (RIGHT) CRUEL CUNNING DEAR DILIGENT DREADFUL EVIL EXACT FAITHFUL GENTLE GLOOMY GOOD GREEDY HAPPY HONEST IMPORTANT INSOLENT KEEN KIND OR POLITE LOVELY MERRY PASSIONATE PROUD RUDE SAD SHY SORROWFUL SURPRISED TRUE UGLY UNPLEASANT VULGAR WRONG Concepticon (https://concepticon.clld.org) ANGRY ASHAMED BAD BEAUTIFUL BRAVE CLEVER CORRECT (RIGHT) CUNNING DEAR DILIGENT EVIL FAITHFUL GENTLE GOOD GREEDY HAPPY HONEST MERRY PROUD SAD SHY SURPRISED TRUE UGLY WRONG

43

25 (attested)

ANGRY BAD BEAUTIFUL BRAVE CLEVER CORRECT (RIGHT) DEAR DILIGENT EVIL FAITHFUL GENTLE GOOD HAPPY MERRY PROUD SAD SURPRISED TRUE UGLY WRONG Clics2(https://clics.clld.org)

20 (colexified)

Clics2(https://clics.clld.org)

Datasets: what do we need?

(61)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

ANGRY BAD BEAUTIFUL BRAVE CLEVER CORRECT (RIGHT) DEAR DILIGENT EVIL FAITHFUL GENTLE GOOD HAPPY MERRY PROUD SAD SURPRISED TRUE UGLY WRONG

20 (colexified)

Clics2(https://clics.clld.org) 8415, 96% 366, 4% Lexifications vs. colexifications Lexifications Colexifications

Datasets: what do we need?

(62)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

ANGRY BAD BEAUTIFUL BRAVE CLEVER CORRECT (RIGHT) DEAR DILIGENT EVIL FAITHFUL GENTLE GOOD HAPPY MERRY PROUD SAD SURPRISED TRUE UGLY WRONG

20 (colexified)

Clics2(https://clics.clld.org) 8415, 96% 366, 4% Lexifications vs. colexifications Lexifications Colexifications 20 meanings 366 constraints 31 edges

Datasets: what do we need?

(63)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

Datasets: what do we need?

(64)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

Datasets: what do we need?

(65)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

/sʊu͡ɔhtɑs/ in Lule Sami (Uralic) [traurig] and [lusag]

(northeuralex.org)

?Neutral valence and high arousal?

Datasets: what do we need?

(66)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

Datasets: what do we need?

Size

/sʊu͡ɔhtɑs/ in Lule Sami (Uralic) [traurig] and [lusag]

(northeuralex.org)

(67)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

Carapana (Tucanoan ; Amazonia)

Datasets: what do we need?

Size

(68)

Emotions (properties) in CLICS

2

(colexifications in 1220 languages)

Semantic map based on colexification patterns attested in more than 1 language variety

Datasets: what do we need?

(69)

Allative markers (based on Rice & Kabata 2007)

Datasets: what do we need?

(70)

Allative markers (based on Rice & Kabata 2007)

Datasets: what do we need?

Structure

Ø 34 meanings Ø 33 edges

(71)

Allative markers (based on Rice & Kabata 2007)

Datasets: what do we need?

(72)

Allative markers (based on Rice & Kabata 2007)

Datasets: what do we need?

Structure

Data should be structured around several

(73)

Method:

can we open the blackbox?

(74)

Formal Concept Lattices (hierarchical graphs)

Method: can we open the black-box?

(75)
(76)

Method: can we open the black-box?

FCA analysis of Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping

(77)

Method: can we open the black-box?

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

FCA analysis of Haspelmath’s (1997) data

(78)

Method: can we open the black-box?

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

FCA analysis of Haspelmath’s (1997) data

(79)

Method: can we open the black-box?

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically

(80)

Method: can we open the black-box?

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically

(81)

Method: can we open the black-box?

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically

(82)

Method: can we open the black-box?

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically

(83)

Method: can we open the black-box?

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically u But, less ‘reader-friendly’

(especially with many meanings = attributes)

(84)

Semantic maps

Figure 3. FCA analysis of

Haspelmath’s (1997) data

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically u But, less ‘reader-friendly’

(especially with many meanings = attributes)

(85)

Semantic maps

FCA solves the problem of form/meaning mapping, since it shows:

ü How forms map onto meanings

ü Which concepts are lexicalized and which are not

ü Implication sets can be computed automatically u But, less ‘reader-friendly’

(especially with many meanings = attributes)

u Complementarity between the two approaches

FCA analysis of time-related lexemes (588 objects = words; 221 attributes = meanings)

FCA analysis of Haspelmath’s (1997) data

(86)

Conclusions:

co-expression vs. semantic similarity

(87)

Conclusions

Ø Co-expression ⇎ semantic similarity (e.g., Malchukov 2010) o No relation o Homonyms o Symmetrical relations o Auto-antonyms (Klégr 2013) o Hierarchical relations o Auto-hyponyms o Auto-meronyms o Etc.

(88)

Conclusions

(89)

Conclusions

Thanks!

[email protected]

Figure

FIGURE 1. A semantic map of  typical dative functions /  the boundaries of  English to (based on Haspelmath 2003: 213)
FIGURE 1. A semantic map of  typical dative functions /  the boundaries of  English to (based on Haspelmath 2003: 213)
FIGURE 1. A semantic map of  typical dative functions /  the boundaries of  English to (based on Haspelmath 2003: 213)
FIGURE 2. MDS analysis of  Haspelmath’s (1997) data  on indefinite pronouns
+7

Références

Documents relatifs

Being given a text and several questions about it, competing systems had to choose the correct answer among five candidates to each question, showing in this way their level

In addition to the ex- tensive and consistent experimental comparison of six different statistical methods on MEDIA as well as on two newly col- lected corpora in different

• an analytical model to study the performance of all replication scenarios against silent errors, namely, duplication, triplication, or more for process and group replications;

The modified Seq2Seq model with emotion components, where X denotes the input, Affect(X) and GloVe(X) are the affective and semantic representations of X respectively, Y is

the standard PACO algorithm applied on the whole ADI stack (left), PACO applied to the subset of the ADI stack obtained by removing the eight frames with the largest

Therefore we target a “composer- centered” machine learning approach [11] allowing users of computer-assisted composition systems to implement experimental cycles

An important result of pictorial semiotics, which seems to have been discovered independently by the Greimas school and by Groupe µ, is the division of the

In the context of Angry Birds, a single case should enclose a problem description part, with a representation of a game scene, covering the slingshot and all objects and pigs, and