Verbal Multiword Expression Identification:

Do We Need a Sledgehammer to Crack a Nut?

Caroline Pasquer (1), Agata Savary (1), Carlos Ramisch (2), Jean-Yves Antoine (1)

(1) University of Tours, LIFAT, France; (2) Aix Marseille University, LIS, France

Multiword expressions (MWEs)

• Definition: word combinations with idiosyncratic properties (lexical, syntactic, semantic, etc.)
  Example: cut corners ‘do an incomplete job’

• MWE identification challenges:
  - Variability
      Companies were cutting corners to save costs.
      The field would look uneven if corners were cut.
  - Literal and coincidental occurrences
      Start with cutting one corner of the disinfectant bag.
      If you cut along the lines, you get two acute corners.

• Our idea: a simple method to identify verbal MWEs (VMWEs) previously seen in training data

PARSEME Shared Task

• Corpora:
  - Unified typology for VMWEs in 19 languages:
      verbal idioms (VID: cut corners),
      light-verb constructions (LVC.full: take a walk),
      inherently reflexive verbs (IRV: help oneself)
  - VMWE + morpho-syntactic annotation
  - train/dev/test split
  - Seen VMWEs: 74–94% in dev; 50–92% in test

• Shared task on VMWE identification (edition 1.1):
  - Systems are given VMWE-annotated train/dev corpora
  - Given a test corpus with morphosyntactic annotation, they predict VMWEs
  - Best cross-language macro-average F1 < 0.6
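The proportion of seen VMWEs reported above can be illustrated with a minimal sketch. It assumes that a VMWE counts as "seen" when its multiset of lemmas is annotated at least once in train; the function names and the toy data are hypothetical, not taken from the shared task tooling.

```python
from collections import Counter

def lemma_multiset(lemmas):
    """Represent a VMWE as an order-insensitive, hashable multiset of lemmas."""
    return frozenset(Counter(lemmas).items())

def seen_ratio(train_vmwes, eval_vmwes):
    """Share of VMWE occurrences in eval whose lemma multiset occurs in train."""
    seen_types = {lemma_multiset(v) for v in train_vmwes}
    if not eval_vmwes:
        return 0.0
    hits = sum(lemma_multiset(v) in seen_types for v in eval_vmwes)
    return hits / len(eval_vmwes)

# Toy data (hypothetical): each VMWE occurrence is a list of lemmas.
train = [["cut", "corner"], ["take", "walk"]]
dev = [["corner", "cut"], ["take", "walk"], ["help", "oneself"], ["cut", "corner"]]

print(seen_ratio(train, dev))  # 0.75: three of the four dev occurrences are seen
```

Using a multiset rather than a sequence makes "corners were cut" and "cutting corners" map to the same type, which is what the variability examples require.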

MWEs’ Nature as a Guiding Principle

• MWEs are regular at the level of tokens, but idiosyncratic at the level of types
• Strong lexicalization: the combination of lexemes (not their senses) makes a MWE
  - cut corners vs. trim corners, cut edges
• Critical difficulty of unseen data
• Room for improvement on seen MWEs
• Literal readings are extremely rare in practice

Hypotheses

H1 Search for co-occurrences of precise lexemes (+POS) annotated as VMWEs in train

H2 Allow only those morphosyntactic variants which were previously seen in train

H3 Do not rely heavily on automatic POS tagging and syntactic parsing, which may be noisy

H4 Consider syntactic coherence to eliminate coincidental occurrences
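H1 can be sketched as a lookup of seen (lemma, POS) combinations in a new sentence. This is a minimal illustration under stated assumptions, not the authors' implementation: `find_candidates`, the token representation and the lemmatized toy sentences are hypothetical.

```python
from itertools import product

def find_candidates(sentence, seen_types):
    """Return index tuples of tokens that together realize a seen VMWE type.

    sentence:   list of (lemma, pos) pairs, one per token
    seen_types: iterable of tuples of (lemma, pos) pairs annotated as VMWEs in train
    """
    candidates = set()
    for vmwe in seen_types:
        # For each component, list the token positions that match lemma and POS.
        slots = [[i for i, tok in enumerate(sentence) if tok == comp] for comp in vmwe]
        for combo in product(*slots):
            if len(set(combo)) == len(combo):  # each token is used at most once
                candidates.add(tuple(sorted(combo)))
    return sorted(candidates)

# Assumed to be seen in train: faire (verb) + lumière (noun)
seen = [(("faire", "verb"), ("lumière", "noun"))]

# "La lumière est faite sur ce drame."
s14 = [("le", "det"), ("lumière", "noun"), ("être", "aux"), ("faire", "verb"),
       ("sur", "adp"), ("ce", "det"), ("drame", "noun")]
# "Sa lumière la fait briller." (here faire is tagged as an auxiliary)
s7 = [("son", "det"), ("lumière", "noun"), ("le", "pron"), ("faire", "aux"),
      ("briller", "verb")]

print(find_candidates(s14, seen))  # [(1, 3)]
print(find_candidates(s7, seen))   # []
```

Matching on POS as well as lemma already discards the second sentence; the remaining hypotheses restrict the surviving candidates further.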

The Seen2020 system

1. Extract previously seen lemma multisets (H1) from dev (training) or test (testing)

2. Filter by applying a combination of:

[f1] Components should be disambiguated by their POS (H1, H3)
[f2] Components should appear in specific orders (H2)
[f3] Components and inserted POS should appear in specific orders (H2)
[f4] Components should not be too far (H4, H3)
[f5] Closer components are preferred over distant ones (H4, H3)
[f6] Components should be syntactically connected (H4)
[f7] Nominal components should appear with a seen inflection (H2)
[f8] Nested VMWEs should be annotated as in train

3. Select the optimal filter combination (among 2^8 = 256 possibilities) using dev

Results

Macro-average F1:

System       Seen    Seen+unseen
SHOMA        0.81    0.64
TRAVERSAL    0.77    0.59
TRAPACC_S    0.73    0.57
VarIDE       0.61    0.49
Seen2020     0.83    0.67

Seen F1 on test of Seen2020 vs. 4 shared task systems
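Step 3 of the Seen2020 procedure, the exhaustive search over on/off filter combinations on dev, can be sketched as follows. This is an illustration only: the `Candidate` fields, the two toy filters and the gap threshold are hypothetical, and the real system uses eight filters rather than two.

```python
from collections import namedtuple
from itertools import product

def f1_score(predicted, gold):
    """F1 between a set of predicted candidates and a set of gold VMWEs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def best_filter_combination(candidates, gold, filters):
    """Try every on/off setting of the filters; return (best F1, switches)."""
    best_score, best_switches = -1.0, None
    for switches in product([False, True], repeat=len(filters)):
        active = [f for f, on in zip(filters, switches) if on]
        kept = {c for c in candidates if all(f(c) for f in active)}
        score = f1_score(kept, gold)
        if score > best_score:
            best_score, best_switches = score, switches
    return best_score, best_switches

# Hypothetical candidate features: distance between components, syntactic link.
Candidate = namedtuple("Candidate", "ident gap connected")
cands = [Candidate("a", 1, True), Candidate("b", 5, True), Candidate("c", 1, False)]
gold = {cands[0]}
filters = [lambda c: c.gap <= 2,   # in the spirit of f4 (threshold is an assumption)
           lambda c: c.connected]  # in the spirit of f6

print(best_filter_combination(cands, gold, filters))  # (1.0, (True, True))
```

With eight filters the loop visits 2^8 = 256 settings, so the whole model is described by eight booleans.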

Conclusions

• Seen2020’s focus on (variants of) seen data boosts overall performance
• Seen2020 outperforms the best (neural) systems on seen and seen+unseen data
• Seen2020 is very simple: 8 morphosyntactic on/off filters (vs. millions of real numbers)
• Seen2020’s results are interpretable:
  - identify filters responsible for errors → incremental development and customization
  - filters reflect known, pervasive and generic linguistic properties of VMWEs
• Drawbacks: sensitivity to scarcity and quality of VMWE annotations
• Perspectives: new languages, out-of-domain generalizability

Focus on filters: example corpus

Train:

1. La lumière sera faite sur ce drame.
   ‘The light will be done on this drama’ → ‘Light will be shed on this drama.’
2. La porte a résolument été fermée aux initiatives.
   ‘The door was firmly closed on initiatives.’
3. Il ferme la porte à une loi.
   ‘He closes the door on a law.’
4. Dorothée [...] prit la fuite
   ‘Dorothée took escape’ → ‘Dorothée absconded.’
5. Le fossé entre les riches et les pauvres se creusait.
   ‘The gap between the rich and the poor widened.’

Test:

6. L’enquête n’a pas fait de lumière sur les causes du sinistre.
   ‘The inquiry shed no light on the disaster’s causes.’
7. Sa lumière la fait briller.
   ‘Its light makes it shine.’
8. Celui-ci met en lumière le constat fait plus haut.
   ‘This one puts into light the observation made above.’
9. La lumière a enfin été faite sur ce drame.
   ‘The light was finally shed on this drama.’
10. Faire une partie de la lumière sur le projet de réforme.
    ‘To shed part of the light on this reform project.’
11. la lumière nocturne, qui n’a pas fait l’objet [...] que la lumière diurne.
    ‘nocturnal light, which has not been subject to [...] as diurnal light.’
12. Une enquête a été ouverte pour faire les lumières sur ce drame.
    ‘An inquiry was launched to shed the lights on this drama.’
13. La lumière sur son rôle dans cette affaire doit être faite.
    ‘The light on its role in this case must be shed.’
14. La lumière est faite sur ce drame.
    ‘The light is shed on this drama.’
15. La lumière est faite de diode LED.
    ‘The light is made of LED diodes.’

Focus on filters: selected candidates

Ex.  | Candidate (token/POS) | Marks among f1 f2 f3 f4 f5 f6 f7b f8
7    | lumière/noun la/pron fait/aux | X X X X X X
8    | met en lumière/noun le/det constat/noun fait/verb | X X X X X
9    | La/det lumière/noun a/aux enfin/adv été/aux faite/verb | X X X X X X X X
10   | Faire/verb une/det partie/noun de/adp la/det lumière/noun | X X X X X X
11-a | La/det lumière/noun fait/verb | X X X X X X
11-b | [...] lumière/noun fait/verb l’/det [...] | X X X X
11-c | [...] fait/verb [...] la/det lumière/noun [...] | X X X X X
12   | [...] faire/verb les/det lumières/noun.plur [...] | X X X X X X X
13   | La/det lumière/noun sur/adp son/det rôle/noun dans/adp cette/det affaire/noun doit/verb être/aux faite/verb | X X X X X X
14   | La/det lumière/noun est/aux faite/verb | X X X X X X X X
15   | La/det lumière/noun est/aux faite/verb | X X X X X X X X
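A single filter can be sketched on the example corpus. The code below illustrates f7 (nominal components should appear with a seen inflection) under the assumption that "seen inflection" can be approximated by the surface form of the noun inside train VMWEs; the function names and the (form, lemma, POS) triples are hypothetical.

```python
def seen_noun_forms(train_vmwes):
    """Map each noun lemma to the surface forms it took inside train VMWEs."""
    forms = {}
    for vmwe in train_vmwes:
        for form, lemma, pos in vmwe:
            if pos == "noun":
                forms.setdefault(lemma, set()).add(form.lower())
    return forms

def passes_f7(candidate, forms):
    """True if every nominal component has a form already seen in train."""
    return all(form.lower() in forms.get(lemma, set())
               for form, lemma, pos in candidate if pos == "noun")

# Train examples 1 and 2, reduced to their VMWE components (form, lemma, POS).
train_vmwes = [
    [("lumière", "lumière", "noun"), ("faite", "faire", "verb")],
    [("porte", "porte", "noun"), ("fermée", "fermer", "verb")],
]
forms = seen_noun_forms(train_vmwes)

ex12 = [("faire", "faire", "verb"), ("lumières", "lumière", "noun")]
ex14 = [("lumière", "lumière", "noun"), ("faite", "faire", "verb")]

print(passes_f7(ex12, forms))  # False: the plural "lumières" was never seen
print(passes_f7(ex14, forms))  # True
```

Because each filter is a plain predicate over one candidate, an error can be traced back to the specific filter that accepted or rejected it.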

COLING 2020, Barcelona (online)
