
Using Hand-Written Rewrite Rules to Induce Underlying Morphology



Michael A. Tepper
University of Washington

Abstract

Allomorphic variation, or form variation among morphemes with the same referential meaning, is often cited as a stumbling block to unsupervised morphological induction. To address this problem head-on, we present a hybrid approach that uses a small amount of linguistic knowledge, in the form of orthographic rewrite rules, to help refine an existing segmentation. Our goal is to learn when surface morphs (units of the segmentation) should really be counted together as the same underlying morpheme. To do this, we customize the Morfessor algorithm and model developed by Mathias Creutz and Krista Lagus, adding segmentation analyses generated by orthographic rewrite rules along with a statistical framework to predict when those analyses should be used as underlying morphemes. An initial segmentation produced by Morfessor Categories-MAP 0.9.2 is used as input. To suggest underlying morphemes, a set of language-specific orthographic rules is currently needed. Though we are not officially a part of the Challenge competition, for English and Turkish we report contest F-measures of 62.22% and 54.83%, which amount to 2% and 48% improvements, respectively, over the top unsupervised entrants for those languages.

Keywords

Morphological induction, Allomorphic variation, Knowledge-lite, Word segmentation
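
As an illustration of the idea in the abstract, the following minimal Python sketch shows how a few hand-written orthographic rewrite rules could map surface morphs in an existing segmentation onto shared underlying morphemes. It is not the paper's rule set or its model: the rules (Turkish plural -lar/-ler and locative -da/-de/-ta/-te), the archiphoneme labels `lAr` and `DA`, and the function names are assumptions chosen for the example, and the statistical step that decides when a proposed analysis should actually be used is omitted.

```python
import re

# Hypothetical rewrite rules (illustrative only, not the paper's rules).
# Each rule maps a family of surface suffix allomorphs to one underlying form.
RULES = [
    (re.compile(r"l[ae]r"), "lAr"),   # Turkish plural: -lar / -ler
    (re.compile(r"[dt][ae]"), "DA"),  # Turkish locative: -da / -de / -ta / -te
]


def underlying(morph):
    """Return the proposed underlying form of a suffix morph, or the morph itself."""
    for pattern, target in RULES:
        if pattern.fullmatch(morph):
            return target
    return morph


def normalize(segmentation):
    """Rewrite every non-initial morph; the first morph is assumed to be the stem."""
    if not segmentation:
        return []
    return [segmentation[0]] + [underlying(m) for m in segmentation[1:]]


def group_allomorphs(segmentations):
    """Collect the surface morphs that were merged into each underlying form."""
    groups = {}
    for seg in segmentations:
        for surface, deep in zip(seg[1:], normalize(seg)[1:]):
            if surface != deep:
                groups.setdefault(deep, set()).add(surface)
    return groups
```

For example, `normalize(["ev", "ler"])` and `normalize(["kitap", "lar"])` both end in `lAr`, so the two surface plurals are counted as one morpheme. Leaving the first morph untouched is a simplification that avoids rewriting short stems that happen to look like suffixes.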
