Machine Translation for Swiss German Dialects


Poster

Reference

SCHERRER, Yves

SCHERRER, Yves. Machine Translation for Swiss German Dialects. In: Ecole doctorale CUSO en Linguistique, Leysin, March 2009.

Available at:

http://archive-ouverte.unige.ch/unige:22815

Disclaimer: layout of this document may differ from the published version.

Yves Scherrer, LATL, Université de Genève

Research questions

- How to adapt standard NLP methods to process dialect data?
- How to present dialectological research in a more interactive and dynamic way?

System architecture

Inputs: StdG text; target dialect coordinates

1. StdG Parser: Fips (or other tools), providing
   - Morphological features
   - Lexical features
   - POS tags
   - Constituent structure
   Result: StdG analysis (XML document)

2. Transfer: georeferenced transfer rules for
   - Syntax
   - Lexicon
   - Phonetics
   Result: SwG analysis (modified XML document)

3. SwG Generation: build inflected forms from roots

Outputs: SwG text; SwG treebank

The project

Machine translation from Standard German (StdG) to any Swiss German (SwG) dialect.

- One standardized source language
- Relatively homogeneous data available
- Existing analysis tools and corpora can be used
- A continuum of non-standardized target varieties
- Diatopical variation as a parameter of the translation process
- Transition zones represented by gradually changing rule probabilities

A phonetic transfer rule: word-final nd (SDS II/120)

Phonetic transfer rules apply to etymologically related word pairs by modifying particular phonemes or graphemes.

Four SwG variants of StdG word-final nd:

- nd → nn
- nd → ng
- nd → nt
- keep nd

This rule applies to different words: Hund ‘dog’, Hand ‘hand’, gesund ‘healthy’.

Detection: Apply regular expressions on the StdG orthographic form.
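The detection step can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the variant labels are assumptions, and only the nd ending is rewritten, so the outputs are intermediate strings rather than attested dialect forms (other rules would still have to apply).

```python
import re

# Word-final "nd" in the StdG orthographic form.
WORD_FINAL_ND = re.compile(r"nd$")

# The four SwG variants named on the poster; "nd" means "keep nd".
VARIANTS = ("nn", "ng", "nt", "nd")


def rule_applies(stdg_word: str) -> bool:
    """Return True if the word ends in 'nd' (hypothetical helper)."""
    return WORD_FINAL_ND.search(stdg_word) is not None


def apply_variant(stdg_word: str, variant: str) -> str:
    """Replace word-final 'nd' by the chosen variant (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return WORD_FINAL_ND.sub(variant, stdg_word)


if __name__ == "__main__":
    for word in ("Hund", "Hand", "gesund", "Hunde"):
        if rule_applies(word):
            print(word, [apply_variant(word, v) for v in VARIANTS])
        else:
            print(word, "rule does not apply")
```

Which variant is chosen for a given target location is left open here; on the poster that choice depends on the georeferenced rule probabilities.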

Map creation

We use a Geographical Information System (GIS) for mapping tasks.

Phonetics, morphology, lexicon:

1. Scan maps from the SDS atlas
2. Digitize scans to create point maps
3. Interpolate point maps to create probabilistic raster maps

Syntax:

1. Obtain preliminary (numerical) data from the SADS atlas project
2. Create point maps from these data
3. Interpolate point maps to create probabilistic raster maps
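The interpolation step can be illustrated with a small sketch. The poster does not say which interpolation method the GIS applies; inverse distance weighting (IDW) is used here purely as an assumption, and the data layout (survey points as `(x, y, value)` triples, where `value` is 1.0 if a variant is attested at that point and 0.0 otherwise) is hypothetical.

```python
from math import hypot


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: list of (px, py, value) survey points (assumed layout).
    """
    if not points:
        raise ValueError("at least one survey point is required")
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = hypot(x - px, y - py)
        if distance == 0.0:
            # Exactly on a survey point: return its observed value.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def rasterize(points, width, height, cell=1.0):
    """Build a raster (list of rows) of interpolated values."""
    return [
        [idw(points, col * cell, row * cell) for col in range(width)]
        for row in range(height)
    ]


if __name__ == "__main__":
    # Two toy survey points: variant attested at x=0, not attested at x=4.
    survey = [(0.0, 0.0, 1.0), (4.0, 0.0, 0.0)]
    for row in rasterize(survey, width=5, height=1):
        print([round(p, 2) for p in row])
```

With input values restricted to 0 and 1, every raster cell stays between 0 and 1 and changes gradually between survey points, which is the property a transition zone needs.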

A lexical transfer rule: “Fruit tart” (SDS V/187)

Lexical transfer rules apply to etymologically unrelated word pairs. They also deal with the resulting contextual changes (gender, argument structure, ...).

Four major SwG translations of StdG Kuchen (M) in the sense of ‘fruit tart’:

- Wääje (+var.) (F)
- Tünne (+var.) (F)
- Flade (M)
- Chueche (M)
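A lexical rule of this kind could be represented roughly as below. This is a sketch under stated assumptions: the class and function names are hypothetical, and the probabilities passed in are made-up placeholders standing in for values that would be read from the raster maps at the target dialect coordinates.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalVariant:
    form: str
    gender: str  # "M" or "F", as given on the poster


# The four major SwG translations of StdG "Kuchen" (M), sense 'fruit tart'.
KUCHEN_FRUIT_TART = [
    LexicalVariant("Wääje", "F"),
    LexicalVariant("Tünne", "F"),
    LexicalVariant("Flade", "M"),
    LexicalVariant("Chueche", "M"),
]


def choose_variant(variants, probabilities, rng):
    """Draw one variant according to location-specific probabilities."""
    if len(variants) != len(probabilities):
        raise ValueError("one probability per variant is required")
    return rng.choices(variants, weights=probabilities, k=1)[0]


def needs_gender_change(source_gender, variant):
    """True if the context (articles, adjectives) must be re-inflected."""
    return variant.gender != source_gender


if __name__ == "__main__":
    # Placeholder probabilities for one imaginary location.
    probs = [0.7, 0.0, 0.1, 0.2]
    chosen = choose_variant(KUCHEN_FRUIT_TART, probs, random.Random())
    print(chosen.form, chosen.gender, needs_gender_change("M", chosen))
```

Keeping the gender on the variant makes the contextual change explicit: a feminine target word for masculine Kuchen signals that dependent words need new inflection.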

Main features

- Traditional syntax-based transfer approach
- Uses existing StdG resources
- Allows creation of SwG treebanks
- A rule-based transfer module
- Dialect data would be insufficient for machine learning models
- Dialect atlas data can be incorporated
- Probabilistic georeferenced transfer rules
- Phonetic transformations partially replace lexicon entries

A syntactic transfer rule: verb raising in subordinate clauses (SADS I/9)

Syntactic transfer rules change the structure of the syntax tree.

StdG:  ob er einmal heiraten will   (heiraten/VVINF will/VMFIN)
       if he once   marry    wants
SwG1:  ob er ämal hürate wett       (hürate/VVINF wett/VMFIN)
SwG2:  ob er ämal wett hürate       (wett/VMFIN hürate/VVINF)

SwG1 area: no modal raising
SwG2 area: modal raising

The rule:

- Combine the two VPs into one VP with a double head.
- If the target dialect is inside the SwG2 area, swap adjacent VVINF+VMFIN.
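The second step of the rule can be sketched on a flat sequence of tagged tokens. This is a simplification and an assumption: the poster describes an operation on the syntax tree (after merging the two VPs), whereas the hypothetical `raise_modal` function below works on `(word, tag)` pairs. Only VVINF and VMFIN come from the poster; the other tags in the example are assumed for illustration.

```python
def raise_modal(tokens, in_swg2_area):
    """Swap adjacent VVINF+VMFIN pairs if the target lies in the SwG2 area.

    tokens: list of (word, pos_tag) pairs; returns a new list.
    """
    result = list(tokens)
    if not in_swg2_area:
        return result  # SwG1 area: no modal raising
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "VVINF" and result[i + 1][1] == "VMFIN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return result


if __name__ == "__main__":
    clause = [
        ("ob", "KOUS"), ("er", "PPER"), ("ämal", "ADV"),
        ("hürate", "VVINF"), ("wett", "VMFIN"),
    ]
    for in_area in (False, True):
        words = [word for word, _ in raise_modal(clause, in_area)]
        print("SwG2" if in_area else "SwG1", " ".join(words))
```

In the full system the boolean would presumably be replaced by a probability read from the syntactic raster map at the target coordinates.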

École doctorale CUSO, Leysin 2009
