Developing a French FrameNet: Methodology and First results


Marie Candito⋆, Pascal Amsili, Lucie Barque‡⋆, Farah Benamara, Gaël de Chalendar♯, Marianne Djemaa⋆, Pauline Haas‡▽, Richard Huyghe, Yvette Yannick Mathieu, Philippe Muller, Benoît Sagot⋆, Laure Vieu△

⋆ Alpage (Univ. Paris Diderot / INRIA), • LLF (Univ. Paris Diderot / CNRS), ‡ Univ. Paris 13, IRIT, Toulouse University, △ IRIT, CNRS, ♯ CEA LIST, ▽ Lattice (Univ. Paris 3, CNRS, ENS), ◦ CLILLAC-ARP (Univ. Paris Diderot)


Background and Motivation

ASFALDA project (ANR-CORD-023)

A 3-year ANR project (started in September 2012)

−→ build FrameNet-annotated resources for French

−→ build a FrameNet-based analyzer

Motivations

Why FrameNet?

−→ semantically oriented generalizations, initially developed for English

−→ proved to be portable across languages (Boas et al., 2005)

Why French?

−→ French lacks large-scale semantically annotated resources

Annotation strategy

Focus on a set of domains

We build on the English FrameNet

Frame definition, frame/lemma associations and corpus annotations are very time-consuming tasks

Same coverage not reachable −→ we focus on a set of domains

−→ work domain-by-domain

(although domains do overlap)

−→ enforce coherence and exhaustivity within a domain

Notional domains

Linguistic communication

e.g. dire (say), prétendre (claim) ...

pervasive in our corpora

representation challenge: quotations introduced by non-communicative verbs (Sagot and Danlos, 2010)

Example: “It would be wrong to neglect ... India”, the prefect corrected.

potential application: information extraction (who said what to whom, and when?)

Cognitive positions

e.g. penser (think), savoir (know), s’apercevoir (realize) ...

varying degree of certainty on a belief

potentially useful for fact extraction

Judgment/Evaluation

e.g. critique (critique), apprécier (appreciate), beau (beautiful) ...

potential application: opinion mining

“Cross-Cutting Domains”

Temporal relations / Spatial relations / Causality

pervasive semantic phenomena

at the interface between lexical semantics and discourse semantics

evoked by verbs, deverbals, but also prepositions, conjunctions, adverbs

captured by frames and frame-to-frame relations

Current status

Frames and lexicon for our target domains are now fully defined.

Work on frames and lexicon was performed jointly and iteratively:

starting from English frames

using a preliminary French FrameNet lexicon automatically projected from English (Padó, 2007; Mouton et al., 2010)

Frames

Understanding the exact semantic scope of a frame is often difficult

−→ we decided to:

create expert teams for each domain

systematically make explicit the distinctive characteristics of frames

merge frames when the distinction seemed too subtle, or not justified for French

| Notional domain | Nb. of frames | Total nb. of modified frames | Merges | Splits | Modifications of roles only | Other modifications |
|---|---|---|---|---|---|---|
| Commercial transaction | 11 | 0 | 0 | 0 | 0 | 0 |
| Verbal communication | 34 | 3 | 2 | 1 | 0 | 1 |
| Judgement/Evaluation | 16 | 0 | 0 | 0 | 0 | 0 |
| Cognitive positions | 19 | 4 | 2 | 1 | 0 | 1 |
| Temporal relations | 4 | 2 | 1 | 0 | 0 | 1 |
| Spatial relations | 20 | 14 | 2 | 0 | 12 | 0 |
| Causality | 8 | 5 | 1 | 0 | 4 | 0 |
| Total | 106 | 28 | 8 | 2 | 16 | 3 |

(NB: domains overlap, so the total number of frames is lower than the sum of the per-domain counts.)

Table: Current number of (lexicalized) frames in the French FrameNet substructure, for each domain, with the number of frames modified with respect to the English FrameNet (in total, and broken down by type of difference).
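As a quick sanity check on the table, the per-domain figures can be summed and compared with the Total row. The sketch below only uses the numbers reported in the table; the variable and function names are ours, not part of the project.

```python
# Figures copied from the table:
# (frames, modified frames, merges, splits, role-only modifications, other modifications)
ROWS = {
    "Commercial transaction": (11, 0, 0, 0, 0, 0),
    "Verbal communication": (34, 3, 2, 1, 0, 1),
    "Judgement/Evaluation": (16, 0, 0, 0, 0, 0),
    "Cognitive positions": (19, 4, 2, 1, 0, 1),
    "Temporal relations": (4, 2, 1, 0, 0, 1),
    "Spatial relations": (20, 14, 2, 0, 12, 0),
    "Causality": (8, 5, 1, 0, 4, 0),
}
TOTAL = (106, 28, 8, 2, 16, 3)


def column_sums(rows):
    """Sum each column over all domain rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


sums = column_sums(ROWS)
# Frame memberships counted more than once because domains overlap.
extra_memberships = sums[0] - TOTAL[0]

print(sums)               # per-column sums over the seven domains
print(extra_memberships)  # surplus of the frame column over the reported total
```

Under this reading, the frame column sums to 112 against a reported total of 106, which matches the note that domains overlap, while the five modification columns sum exactly to the Total row.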

Lexicon

Initial scenario: double annotation by external annotators + adjudication

but the task proved to be very difficult!

Actual scenario:

double annotation by experts + adjudication

additional phase of polysemy checking

and at each stage, frequent back-and-forths between lexicon development and frame definitions / remodeling
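The double-annotation-plus-adjudication workflow can be pictured as follows. This is a minimal sketch under our own assumptions: each annotator's work is modelled as a mapping from lemma to a set of frame names, the function name `compare_annotations` is hypothetical, and the frame names in the toy data are illustrative only, not taken from the project lexicon.

```python
def compare_annotations(ann_a, ann_b):
    """Split lemma -> frame-set annotations into agreed and disputed entries.

    Disputed entries are the ones an adjudicator would have to resolve.
    """
    agreed, disputed = {}, {}
    for lemma in sorted(set(ann_a) | set(ann_b)):
        frames_a = ann_a.get(lemma, frozenset())
        frames_b = ann_b.get(lemma, frozenset())
        if frames_a == frames_b:
            agreed[lemma] = frames_a
        else:
            disputed[lemma] = (frames_a, frames_b)
    return agreed, disputed


# Toy data: lemmas come from the examples above, frame names are illustrative.
annotator_a = {
    "dire": frozenset({"Statement"}),
    "penser": frozenset({"Opinion"}),
    "savoir": frozenset({"Awareness"}),
}
annotator_b = {
    "dire": frozenset({"Statement"}),
    "penser": frozenset({"Opinion", "Cogitation"}),
    "savoir": frozenset({"Awareness"}),
}

agreed, disputed = compare_annotations(annotator_a, annotator_b)
print(sorted(agreed))    # lemmas on which both annotators chose the same frames
print(sorted(disputed))  # lemmas sent to adjudication
```

In this toy case the polysemous lemma is the one that ends up in adjudication, which is the kind of case the additional polysemy-checking phase targets.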

Resulting lexicon: 1638 frame/lemma pairs, for 1359 distinct lemmas

(797 verbs, 574 nouns, 270 adjectives, 119 prepositions, 41 conjunctions)
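One way to picture the lexicon is as a list of frame/lemma pairs indexed by lemma, so that polysemous lemmas (those associated with more than one frame) are easy to find. The sketch below is illustrative: the toy pairs, the `.v` suffix convention and the frame names are assumptions, and only the two totals (1638 pairs, 1359 lemmas) come from the text.

```python
from collections import defaultdict


def frames_by_lemma(pairs):
    """Index (frame, lemma) pairs by lemma."""
    index = defaultdict(set)
    for frame, lemma in pairs:
        index[lemma].add(frame)
    return dict(index)


# Toy pairs; frame names are illustrative, not the project's actual entries.
TOY_PAIRS = [
    ("Statement", "dire.v"),
    ("Opinion", "penser.v"),
    ("Cogitation", "penser.v"),
    ("Awareness", "savoir.v"),
]

index = frames_by_lemma(TOY_PAIRS)
polysemous = sorted(lemma for lemma, frames in index.items() if len(frames) > 1)

# Average number of frames per lemma, from the reported totals.
avg_frames_per_lemma = 1638 / 1359

print(polysemous)
print(round(avg_frames_per_lemma, 2))
```

With the reported totals, a lemma is associated with about 1.2 frames on average.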

Next Steps

Semantic annotations on treebanks

Annotation of frames and semantic roles on top of syntactic dependency trees

Using the Salto graphical interface (Burchardt et al., 2006)

Using two treebanks converted to dependencies:

The French Treebank (Abeillé and Barrier, 2004): 18k sentences, news

The Sequoia Treebank (Candito and Seddah, 2012): 3k sentences, medical, Europarl, news

−→ usable to study the syntax-semantics interface

Temporal and location modifiers are treated using time- and location-related frames instead of peripheral semantic roles

The General Secretary said during a press conference that he thought “the situation will unlock” before the deadline of February 15

Figure: Example of target annotations on dependency trees, with the Salto tool
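The idea of anchoring frames and roles on dependency trees can be sketched as follows. This is not the Salto data model: the `FrameInstance` class, the helper functions, the shortened example sentence and the frame and role names are all assumptions made for illustration. Each role points to a head token, and its text span is recovered as the subtree that the head dominates. In line with the treatment of temporal modifiers described above, "during a conference" is not a peripheral role of "said" but the target of a separate time-related frame.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInstance:
    frame: str                                 # frame name (illustrative)
    target: int                                # index of the frame-evoking token
    roles: dict = field(default_factory=dict)  # role name -> head token index


def subtree(heads, root):
    """Return the sorted indices of `root` and every token it dominates."""
    children = {}
    for token, head in enumerate(heads):
        children.setdefault(head, []).append(token)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return sorted(found)


def role_text(tokens, heads, instance, role):
    """Surface string of a role filler, read off the dependency subtree."""
    return " ".join(tokens[i] for i in subtree(heads, instance.roles[role]))


# Shortened version of the example sentence; HEADS[i] is the head of token i (-1 = root).
TOKENS = ["The", "Secretary", "said", "during", "a", "conference"]
HEADS = [1, 2, -1, 2, 5, 3]

statement = FrameInstance("Statement", target=2, roles={"Speaker": 1})
temporal = FrameInstance("Temporal_collocation", target=3, roles={"Landmark_event": 5})

print(role_text(TOKENS, HEADS, statement, "Speaker"))
print(role_text(TOKENS, HEADS, temporal, "Landmark_event"))
```

Storing only the head token keeps the semantic layer independent of span boundaries, which is one reason dependency trees are convenient for studying the syntax-semantics interface.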
