Conclusions and perspectives - Université François Rabelais Tours

My dissertation ends with general conclusions and perspectives (chapter 7). The latter include:

• Improving and extending existing NLP resources and tools, such as Multiflex, Nerf and the NKJP corpus.

• Integrating fine-grained linguistic resources into Linked Open Data, and bringing NLP, notably the achievements of named entity recognition, closer to the Semantic Web, in the context of named entity disambiguation.

• Syntactic parsing of multi-word units, with the challenges defined within the COST action PARSEME (COST: http://www.cost.eu/, funded by the European Science Foundation; PARSEME: http://www.parseme.eu, http://www.cost.eu/domains_actions/ict/Actions/IC1207).

• Identifying multi-word units in treebanks, modeled as the correction of a tree (a syntactic subtree extracted from the corpus) with respect to a tree language (the set of syntactic subtrees representing a multi-word unit and its variants).

• A taxonomy of algorithms for correcting a tree with respect to a tree language, together with their implementation and experimental evaluation in a common framework.

Chapter 2

Composition and Variation – an Introduction

A large part of this thesis addresses types of linguistic units that result from the composition of linguistic items and whose inherent property is linguistic (orthographic, morphological, syntactic and semantic) variability.

Composing (or combining) linguistic items yields larger linguistic items (usually containing several words) whose central property is to be or not to be compositional. Let us briefly refer to some works in the philosophy and mathematics of language that address the compositionality principle. According to Pagin & Westerståhl (2001a), compositionality is a key notion in linguistics, philosophy of language, logic, and computer science, but there are divergent views about its exact formulation, methodological status, and empirical significance.

Many seminal contributions to this notion are attributed to Frege (Janssen, 2001), even if his idea of contextuality (a word has no meaning in isolation, but only in the context of a sentence) seems to contradict his views on compositionality (we construct the sense of a sentence from the senses of its parts). As stressed by Kracht (2007), compositionality had not been thoroughly studied until the early 2000s. The generally admitted definition, after Partee et al. (1990), is that a compound expression is compositional if its meaning is a function of the meanings of its parts and of the syntactic rule by which they are combined. Kracht points out that this definition is superficial in that (surface) expressions and their parts are usually ambiguous, and that a meaning can only be assigned to their analyses. Consequently, compositionality is primarily a property of a grammar, and a language is compositional if it has a compositional grammar. Kracht also mentions that, in the literature, one analysis is often considered superior to another on the grounds that it is compositional. He argues, though, that proving compositionality is hard because there is no agreed standard for the boundary between syntax and semantics.
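The generally admitted definition above is often written as a homomorphism condition. The following is only a sketch of that reading; the symbols are notation introduced here for illustration and are not taken from the cited works: \mu is the meaning function, \sigma is a syntactic rule combining the parts e_1, ..., e_n, and r_\sigma is the semantic operation assumed to correspond to \sigma.

```latex
% Compositionality as a homomorphism (illustrative notation):
% the meaning of a compound depends only on the meanings of its
% parts and on the syntactic rule that combines them.
\mu\bigl(\sigma(e_1, \ldots, e_n)\bigr) \;=\; r_\sigma\bigl(\mu(e_1), \ldots, \mu(e_n)\bigr)
```

Under this reading, Kracht's remark amounts to saying that \mu should be defined on analyses of expressions rather than on ambiguous surface strings.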

Baggio et al. (2012) recall and refine the following reasons for promoting compositionality in linguistic analyses: (i) productivity (there are infinitely many sentences in any natural language, but the brain has only finite storage capacity), (ii) systematicity (the ability to understand certain utterances is connected to the ability to understand certain others), (iii) methodology (compositionality underlies the method for semantic calculus), (iv) modularity (information encapsulation at the level of the description of linguistic structure). They also argue that compositionality may imply a very large number of rules dedicated to particular word combinations. It is thus a matter of balance between storage and computation: compositionality can often be rescued by increasing the demand on (brain) storage, whereas it must be abandoned under realistic constraints on storage.

It appears, however, that the compositionality of natural language is far from evident (or proven). The arguments against compositionality, as summarized by Pagin & Westerståhl (2001b), include its vacuity, triviality, and superfluity, as well as the fact, of major interest for this thesis, that certain constructions are counterexamples which make the compositionality principle false. These problematic cases comprise belief sentences and quotations (both challenge the principle of substitutability of synonyms) as well as idioms. For instance, the meaning of the idiom to kick the bucket (i.e. to die) cannot be obtained by the same process as the one used to interpret the syntactically similar expression to fetch the bucket. The authors argue, however, that there are ways to incorporate idioms while preserving compositionality, in that different compositionality rules apply to idioms than to “regular” phrases.

Table 2.1: Sample emotion predicate classes in Emologus

Class            Example               Valency modification function
Conserving       aider ’help’          ∀x Val(pred(x)) = Val(x)
Inverting        casser ’break’        ∀x Val(pred(x)) = −Val(x)
Positive shift   mignon ’cute’         ∀x Val(pred(x)) = min(Val(x) + 1, 2)
Negative shift   énervé ’stressed’     ∀x Val(pred(x)) = max(Val(x) − 1, −2)
Minimum          embrasser ’kiss’      ∀x,y Val(pred(x, y)) = min(Val(x), Val(y))
Multiplicative   avoir ’have’          ∀x,y Val(pred(x, y)) = Val(x) × Val(y)
Positive         câliner ’cuddle’      ∀x,y Val(pred(x, y)) = 2
Negative         dégoûter ’disgust’    ∀x,y Val(pred(x, y)) = −2
...              ...                   ...

In this thesis, I deal notably with Multi-Word Expressions (MWEs), which form a larger class than idioms but which are frequently defined on the premise of their non-compositionality or atypical compositionality.

2.1 Compositionality of Emotion Expression

The hypothesis of linguistic compositionality, provided that it can be experimentally supported, is convenient for modeling and computation since it prevents a combinatorial explosion of lexicalized cases. As an example, let us consider the problem of emotion expression in linguistic utterances and its automatic detection and characterization.

In Tallec et al. (2009) and Tallec et al. (2010b), we present the EmotiRob project, which aims at building a prototype of an emotional companion robot for weakened children. One of its projected features is the facial expression of simulated emotions in reaction to an interaction with a child.

Contrary to many other approaches to emotion detection, we assumed that the polarity, also called valency (negative/positive/neutral), and the intensity (moderate/strong) of an emotion conveyed by an utterance can be deduced from its propositional content, rather than from prosody only.

We validated this hypothesis within Emologus, a spoken language understanding system, which proceeds in three steps: (i) chunking, (ii) building semantic relations between chunks (roughly, dependency parsing), (iii) contextual interpretation. The vocabulary of this prototype system is restricted to about 1,000 words from a corpus of child-invented tales collected in a primary school.

We assumed that emotion calculation is compositional: (i) basic lexical items have an atomic emotional value in the interval [−2; 2], (ii) predicates can modify the emotional values of their arguments. Atomic emotional values were provided by psycholinguistic studies of children aged 5 to 7. Emotion functions of predicates were determined by 5 adult annotators. Table 2.1 shows sample unary and binary predicate classes and their corresponding valency modification functions.

Given the atomic emotional values of lexical words and the emotion predicate classes, the emotion associated with an utterance is calculated compositionally. Consider the sentence in example (2.1). As a result of parsing in Emologus, the formula in example (2.2) is produced. The words cochon ’pig’ and ami ’friend’ have atomic emotional values 0 and 1, respectively. The unary predicate petit ’little’ belongs to the positive shift class, i.e. composed with its emotionally neutral argument, it yields the emotional value 1. The binary predicate avoir ’have’ multiplies the emotional values of un petit cochon ’a small piglet’ and amis ’friends’, which results in value 1. Finally, the unary operator pas ’not’ inverts the value of its argument. In the end, the emotional value of the whole sentence is −1.

(2.1) Il était une fois un petit cochon qui n’avait pas d’amis.

Once upon a time, there was a little piglet who had no friends.

(2.2) (narrative (neg (to have [(subject: (pig [(size: little)])), (object: (friends))])))
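The calculation for example (2.1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Emologus implementation: the nested-tuple encoding of the parse, the dictionary names and the function names are all hypothetical, and the positive and negative shifts are assumed to move the value by 1 while keeping it within [−2; 2], in line with the worked example above.

```python
# Illustrative sketch only: data structures and names are assumptions,
# not taken from Emologus.

# Atomic emotional values of lexical items (values from the worked example).
ATOMIC = {"cochon": 0, "ami": 1}

# Valency modification functions for some predicate classes.
def conserving(x):
    return x

def inverting(x):
    return -x

def positive_shift(x):
    # Assumed reading: add 1, capped at the upper bound 2.
    return min(x + 1, 2)

def negative_shift(x):
    # Assumed reading: subtract 1, capped at the lower bound -2.
    return max(x - 1, -2)

def minimum(x, y):
    return min(x, y)

def multiplicative(x, y):
    return x * y

# Hypothetical assignment of predicates to classes, following the text.
PREDICATES = {
    "petit": positive_shift,   # unary, positive shift
    "avoir": multiplicative,   # binary, multiplicative
    "pas": inverting,          # unary, inverts its argument
}

def valency(node):
    """Compute the emotional value of a parse node bottom-up."""
    if isinstance(node, str):
        return ATOMIC[node]
    predicate, *arguments = node
    return PREDICATES[predicate](*(valency(a) for a in arguments))

# Simplified encoding of formula (2.2): not(have(little(pig), friend)).
sentence = ("pas", ("avoir", ("petit", "cochon"), "ami"))
print(valency(sentence))  # -1
```

The recursion mirrors the bottom-up reading of formula (2.2): each predicate sees only the values of its arguments, never the words themselves.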

In-domain evaluation (Tallec et al., 2010a) has shown that Emologus reaches 90% accuracy in detecting the emotional value of an utterance. It significantly outperforms the baseline bag-of-words approach, which consists roughly in summing up the elementary emotional values of the words appearing in a given sentence, and which obtains 68.8% accuracy on the same corpus. An error analysis shows that Emologus never assigns an emotional value whose valency is opposite to the expected one. Note, however, that the sub-language studied in EmotiRob is restricted to a domain with almost no language resources, and with a relatively small vocabulary containing few compounds and multi-word expressions. A large-scale validation would be needed in order to study the influence of such non-compositional phenomena on the performance of compositional emotion detection.
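For comparison, the bag-of-words baseline described above can be sketched as a plain sum over word values. This is only an assumed reading of "summing up the elementary emotional values": the function name, the tokenization and the choice to give unknown words the value 0 are hypothetical, and the actual baseline may differ.

```python
# Illustrative sketch only: the tokenization and the lexicon handling are
# assumptions, not the baseline actually evaluated.

def bag_of_words_valency(words, lexicon):
    """Sum the atomic emotional values of the words, ignoring structure."""
    # Words missing from the lexicon are assumed to contribute 0.
    return sum(lexicon.get(word, 0) for word in words)

# Atomic values from the worked example (cochon: 0, ami/amis: 1).
lexicon = {"cochon": 0, "amis": 1}
words = ["il", "était", "une", "fois", "un", "petit", "cochon",
         "qui", "n'", "avait", "pas", "d'", "amis"]

print(bag_of_words_valency(words, lexicon))  # 1
```

Under these assumptions the sum is 1, whereas the compositional analysis of example (2.1) yields −1: without structure, the negation has nothing to apply to.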

A validation of an approach similar to ours, in the related domain of attitude (affect, judgment and appreciation) detection in adults, is presented by Neviarouskaya et al. (2010). Here, attitude detection operates on: (i) affect categories (anger, guilt, joy, etc.), (ii) polarity (positive, negative, neutral), (iii) intensity (between 0 and 1), and (iv) confidence level. A core lexicon of attitude-conveying terms (unfriendly, desire, etc.) is annotated with affect category, polarity and intensity. A closed list of modifiers and functional words (slightly, hardly, never, without, increase, etc.) is assigned attitude modification operators, similarly to predicates in our approach. Modal operators (arguably) are assigned the related confidence values. Verbs are classified with respect to their influence on the attitude conveyed by a sentence (e.g. to defend belongs to the ’preservation’ class). Finally, the compositional attitude calculus is based on rules of polarity reversal, aggregation, propagation, domination, neutralization, and intensification, at various grammatical levels (similar to our valency modification rules). These rules are applied to the output of dependency parsing. An evaluation on a manually annotated corpus of 1000 sentences shows an overall top-level accuracy (when only polarity is accounted for) of 0.879. These results, comparable to the performance of Emologus, confirm that a compositional rule-based calculus of emotion/attitude can yield relatively reliable results. Interestingly enough, some studies show that even semantically opaque linguistic units such as Multi-Word Expressions (MWEs), to which the majority of this thesis is dedicated, show a relatively high degree of compositionality with respect to their emotional profile (Klebanov et al., 2013).
