• Aucun résultat trouvé

Idiom Token Classification with Distributed Semantics

N/A
N/A
Protected

Academic year: 2022

Partager "Idiom Token Classification with Distributed Semantics"

Copied!
1
0
0

Texte intégral

(1)

Idiom Token Classification with Distributed Semantics

Giancarlo D. Salton, Robert J. Ross, and John D. Kelleher

Applied Intelligence Research Centre School of Computing Dublin Institute of Technology

Ireland

Summary

Idiom token classification is the task of deciding for a set of potentially idiomatic phrases whether each occurrence of a phrase is a literal or idiomatic usage of the phrase. Identifying such usages of phrases is important for Natural language Pro- cessing (NLP) systems. For example, in Statistical Machine Translation (SMT) it has been shown that translations of sentences containing idioms receive lower scores than translations of sentences that do not contain idioms in SMT [2].

In this paper [3], we present an approach to idiom token classification based on features of distributed representations. We explore the capability of Skip- Thought Vectors [1] to encode features regarding the meaning of a sentence and show they are predictive with respect to idiom token classification.

We demonstrate that classifiers using these representations have competitive performance compared with the state of the art in idiom token classification.

The current state of the art system uses specific features for each idiomatic phrase extracted from long discourse contexts. Importantly, however, our models use only the sentence containing the target phrase as input and are thus less dependent on a potentially inaccurate or incomplete model of discourse context.

We further demonstrate the feasibility of using these representations to train a competitive general idiom token classifier, i.e., a classifier that can take any idiomatic phrase as input. The ability to create a general idiom token classi- fier using these representations contrasts with previous work which focused on creating a separate models for each potentially idiomatic phrase.

References

1. Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S.: Skip-thought vectors. In: Advances in Neural Information Processing Systems 28. pp. 3276–3284 (2015)

2. Salton, G.D., Ross, R.J., Kelleher, J.D.: An Empirical Study of the Impact of Idioms on Phrase Based Statistical Machine Translation of English to Brazilian-Portuguese.

In: Third Workshop on Hybrid Approaches to Translation (HyTra). pp. 36–41 (2014) 3. Salton, G.D., Ross, R.J., Kelleher, J.D.: Idiom token classification using sentential distributed semantics. Proceedings of the 54th Annual Meeting on Association for Computational Linguistics pp. 194–204 (2016)

Références

Documents relatifs

forms that differ on capitalization or inflection (conjugation or plural when the canonical form of verbs and nouns is searched). The “Sources” section allows users to refine

In this work, we have studied the problem of identifying translations of idiomatic expressions in both English and French, using a brand new version of the bilin- gual

2) Synchronization method and messages for Tree Merging Process: ‘Rule3’ in Figure 1 represents the spanning tree construction scenario (merging trees pro- cess). DA-GRS

However, despite several recommendations for the use of token reinforcement within physical education and the documented effectiveness of token systems applied in

The leading tone is accentuated in the third and the last phrase of the following song as the highest tone in the melodic course.. The high tonic at the end of the third phrase has no

It is therefore likely that more priming will be obtained for literal and idiomatic targets displayed at the offset than at the penultimate position of the idiom, since by the end

In terms of present-day linguistics, the inner form of a lexical unit (word or idiom) can be defined as a kind of semantic para- digmatic relations between the target lexeme and

In this section, we will give some properties of these Green functions that we will need later. The proof of the following lemma is based on elementary potential theory on the