VarIDE at PARSEME Shared Task 2018

HAL Id: hal-01866366

https://hal.archives-ouvertes.fr/hal-01866366

Submitted on 3 Sep 2018

Caroline Pasquer, Agata Savary, Jean-Yves Antoine, Carlos Ramisch

To cite this version:

Caroline Pasquer, Agata Savary, Jean-Yves Antoine, Carlos Ramisch. VarIDE at PARSEME Shared Task 2018. COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Aug 2018, Santa Fe, United States. ⟨hal-01866366⟩

Caroline Pasquer, Agata Savary, Jean-Yves Antoine: first.last@univ-tours.fr

Carlos Ramisch, University of Aix-Marseille, France: first.last@lif.univ-mrs.fr

Verbal MWEs (VMWEs) Variant IDEntification, based on previous work [1]

In the examples, a bracketed tag is the annotation attached to the preceding word.

(a) Rooms have [Pres.3rd.pl] a personalized look with curtains
(b) Please have [Imp] a personalized look today
(c) You always [advmod] have [2nd] a personalized look by using [advcl] this color
(d) Applicants have a personalized look at their resume.
(e) She has [Pres.3rd.sg] a personalized look
(f) I have not a personalized look
(k) Please give a look at the book

T1: Rooms have [Pres.3rd.pl] a personalized look with curtains.
T2: I have [Pres.1st.sg] a personalized look.

Seen-in-train

Variants = Identical-to-train + Variant-of-train

(a)(b)(c), (e)(f): variant or non-variant?

(d)(k): VMWEs ≠ have look

VarIDE uses the VMWE morphosyntactic variability profile to identify variants:

(T1) vs. (T2) → allowed personal inflection → (e) = variant

VarIDE system for 19 languages

④ Candidate filtering

NLTK naive Bayes classifier

Label prediction: VMWEs / non-VMWEs

Same lemmas as a MWE in TRAIN

Relevant POS (Bridge ≠ bridge)

Allowed POS sequence

Optional Filter20: ≤ 20 insertions
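The lemma-matching and insertion criteria above can be sketched as follows. This is a minimal illustration, not the VarIDE code: the function name `extract_candidates`, the order-insensitive matching, and the definition of an insertion as any token lying between the outermost components are assumptions made for the example. The POS checks are left out.

```python
from itertools import product


def extract_candidates(sentence_lemmas, vmwe_lemmas, max_insertions=20):
    """Return sorted index tuples whose lemmas match `vmwe_lemmas` in any order.

    `max_insertions` mimics the optional Filter20 threshold; pass None to disable it.
    """
    # For each component lemma of the VMWE, list the positions where it occurs.
    positions = [
        [i for i, lemma in enumerate(sentence_lemmas) if lemma == target]
        for target in vmwe_lemmas
    ]
    candidates = set()
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token cannot fill two component slots
        span = max(combo) - min(combo) + 1
        insertions = span - len(combo)  # tokens inside the span that are not components
        if max_insertions is None or insertions <= max_insertions:
            candidates.add(tuple(sorted(combo)))
    return sorted(candidates)


# Lemmatized version of "The bridges I built between cultures."
lemmas = ["the", "bridge", "I", "build", "between", "culture"]
print(extract_candidates(lemmas, ("build", "bridge")))  # [(1, 3)]
```

Setting `max_insertions=0` would keep only contiguous candidates, which shows how a threshold trades recall for precision.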

① VMWE extraction

Automatic language-dependent features

ABSolute and RELative (same VMWE) features

(1) ABS_morph_VERB_Tense=Past

(2) ABS_morph_VERB_Tense=Pres

(1) vs. (2) → REL_morph_VERB_Tense=false

Duplicated POS → syntactic dependencies used

(5) root vs. xcomp

ABS_morph_VERB_Mood = -1

ABS_morph_VERB_root_Mood = Imp

(6) obj. vs obj.

ABS_morph_VERB_Mood = -1

ABS_morph_VERB_obj_Mood = -1
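The ABSolute and RELative features above can be illustrated with a short sketch. The function names, the token representation (dicts with `upos` and `feats`), and the choice to compare a candidate against a single reference occurrence are assumptions for the example. The `-1` value for a missing feature follows the poster's notation.

```python
def abs_features(tokens):
    """Build ABS_morph_<POS>_<Feature> = value for every morphological feature."""
    features = {}
    for token in tokens:
        for name, value in token["feats"].items():
            features[f"ABS_morph_{token['upos']}_{name}"] = value
    return features


def rel_features(candidate_abs, reference_abs):
    """Build REL_* features: True when candidate and reference share the value."""
    relative = {}
    for key in set(candidate_abs) | set(reference_abs):
        # A feature absent on one side is treated as -1, as on the poster.
        same = candidate_abs.get(key, -1) == reference_abs.get(key, -1)
        relative[key.replace("ABS_", "REL_", 1)] = same
    return relative


# (1) "The bridges I built between cultures." vs. (2) "We build a bridge between countries."
occ1 = [{"upos": "VERB", "feats": {"Tense": "Past"}}, {"upos": "NOUN", "feats": {}}]
occ2 = [{"upos": "VERB", "feats": {"Tense": "Pres"}}, {"upos": "NOUN", "feats": {}}]

abs1, abs2 = abs_features(occ1), abs_features(occ2)
print(abs1)                      # {'ABS_morph_VERB_Tense': 'Past'}
print(rel_features(abs1, abs2))  # {'REL_morph_VERB_Tense': False}
```

The duplicated-POS case, where the dependency label is added to the feature name (as in `ABS_morph_VERB_root_Mood`), is not covered by this sketch.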

③ Morphosyntactic Feature extraction in TRAIN and TEST

I built bridges. Construction workers all work very hard to build safe bridges. We must build bridges across our differences. They were building roads next to the bridge

# text = I built bridges.
1 I I PRON ... *
2 built build VERB ... 1:VID
3 bridges bridge NOUN ... 1
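Reading VMWE annotations from rows like those above might look as follows. This is a sketch under a stated simplification: each row is assumed to carry only five whitespace-separated fields (ID, FORM, LEMMA, UPOS, MWE tag), whereas the excerpt elides the intermediate columns. The function name `read_vmwes` is hypothetical.

```python
def read_vmwes(lines):
    """Group component lemmas by VMWE id from simplified rows: ID FORM LEMMA UPOS MWE."""
    vmwes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines such as "# text = ..."
        _token_id, _form, lemma, _upos, mwe = line.split()
        if mwe == "*":
            continue  # token does not belong to any VMWE
        for tag in mwe.split(";"):  # a token may belong to several VMWEs
            mwe_id, _, category = tag.partition(":")
            entry = vmwes.setdefault(mwe_id, {"category": None, "lemmas": []})
            if category:
                entry["category"] = category  # given on the first component only
            entry["lemmas"].append(lemma)
    return vmwes


rows = [
    "# text = I built bridges.",
    "1 I I PRON *",
    "2 built build VERB 1:VID",
    "3 bridges bridge NOUN 1",
]
print(read_vmwes(rows))  # {'1': {'category': 'VID', 'lemmas': ['build', 'bridge']}}
```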

(1) The bridges I built between cultures.
(2) We build a bridge between countries.
(3) The bridges built in 2018.
(4) Tower Bridge was built in 1886.
(5) Let [root] me know [xcomp] if you come.
(6) You cannot drink [obj] and drive [obj]

⑤ Candidate categorization
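The poster names NLTK's naive Bayes classifier for predicting the VMWE / non-VMWE label. To stay dependency-free, the sketch below swaps in a small hand-written categorical naive Bayes with add-one smoothing. It is not NLTK's implementation, and the class name, the `insertions` feature, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Categorical naive Bayes over feature dicts, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, featuresets, labels):
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)            # feature -> values seen in training
        for feats, label in zip(featuresets, labels):
            for name, value in feats.items():
                self.value_counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in feats.items():
                if name not in self.values:
                    continue  # feature never seen in training: ignored
                seen = self.value_counts[(label, name)]
                n_values = len(self.values[name]) + 1  # one extra slot for unseen values
                score += math.log(
                    (seen[value] + self.alpha)
                    / (sum(seen.values()) + self.alpha * n_values)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ({"REL_morph_VERB_Tense": True, "insertions": 0}, "VMWE"),
    ({"REL_morph_VERB_Tense": True, "insertions": 1}, "VMWE"),
    ({"REL_morph_VERB_Tense": False, "insertions": 5}, "non-VMWE"),
    ({"REL_morph_VERB_Tense": False, "insertions": 4}, "non-VMWE"),
]
model = TinyNaiveBayes().fit([f for f, _ in train], [y for _, y in train])
print(model.predict({"REL_morph_VERB_Tense": True, "insertions": 0}))  # VMWE
```

With NLTK installed, the same feature dictionaries could be passed as `(features, label)` pairs to `nltk.NaiveBayesClassifier.train`.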

Code on GitLab

② Candidate extraction

Results

• Recall for Extraction ② and Filtering ④ (when R > 0.8)

• Problems & remedies

Feature extraction ③ sensitive to spurious/unbalanced training data

Non-UD tagsets ⇒ adjustments when possible

Missing (TR), inconsistent (FR) lemmatization ⇒ handled for FR

Results vs. Filtering ④: ↑F1 in PT, ↓F1 in PL ⇒ Tagset size matters?

Conclusions and Perspectives

global F1 = 0.46

F1 < 0.2 (6 lang.)

F1(Discontinuity) = 0.37

Detailed error analysis ⇒ Similar tendency per language family?

Other classifiers: linear SVM, multilayer perceptron

Correlation ABSolute/RELative features vs. linguistic justifications?

References, acknowledgements

[1] Pasquer et al., If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions, COLING 2018.

Work funded by the French PARSEME-FR grant
