
Machine Translation for Swiss German Dialects


Poster

Reference

SCHERRER, Yves

SCHERRER, Yves. Machine Translation for Swiss German Dialects. In: Ecole doctorale CUSO en Linguistique, Leysin, Mars 2008, 2008

Available at:

http://archive-ouverte.unige.ch/unige:22820



Machine Translation for Swiss German Dialects

Yves Scherrer – LATL – Université de Genève

Background – NLP and dialectology

Traditionally, NLP (Natural Language Processing) and dialectology use quite different approaches to language. Mainly for practical reasons, NLP research usually focuses on written, standardized language. Dialectology is, by definition, concerned with spoken language and microvariation.

However, the contact points between these two research areas have been increasing recently:

[Figure: contact points between NLP and dialectology. Labels in the diagram: multi-dialect machine translation; statistical models (→ variation); closely related languages; single-dialect studies (Chinese, Arabic); cartography; dialectometry (classification); dialect syntax.]

Theoretical goal

Pursue the converging trends between NLP and dialectology by building a machine translation model that uses dialect maps.

Practical goal

Build a system that translates a Standard German sentence into any Swiss German dialect chosen by the user.

Translating a sentence in two steps

1. Parse the given Standard German sentence to extract the syntactic structure and the word lemmas, and to chunk compound nouns. We use the German Fips parser.

2. Apply syntactic, lexical and phonetic transformations to the sentence. The transformations are stored in probabilistic maps, according to their geographical distribution.

In transition zones, the model may output several weighted translations of the same sentence.
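The idea of weighted outputs in transition zones can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `VARIANT_MAPS` structure, the location names, the word forms and the probabilities are invented for illustration and are not taken from the atlases or from the actual system. Step 1 (parsing with Fips) is replaced by a pre-lemmatized word list, and the sketch assumes that the phenomena are independent, so that probabilities can simply be multiplied.

```python
from itertools import product

# Hypothetical probabilistic maps: for each lemma, each location has a
# probability distribution over dialect variants.
# All forms and values are invented for illustration.
VARIANT_MAPS = {
    "Hund": {
        "location_A": {"Hund": 1.0},
        "location_B": {"Hund": 0.6, "Hung": 0.4},  # transition zone
    },
    "nicht": {
        "location_A": {"nöd": 1.0},
        "location_B": {"nid": 1.0},
    },
}


def translate(lemmas, location, maps=VARIANT_MAPS):
    """Return (sentence, probability) pairs, most probable first.

    `lemmas` stands in for the parser output; words without a map
    entry are copied unchanged with probability 1.
    """
    options = []
    for lemma in lemmas:
        dist = maps.get(lemma, {}).get(location, {lemma: 1.0})
        options.append(sorted(dist.items()))
    results = []
    for combo in product(*options):
        words = [word for word, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence assumption between phenomena
        results.append((" ".join(words), prob))
    return sorted(results, key=lambda r: -r[1])
```

Calling `translate(["der", "Hund"], "location_B")` yields two candidate sentences, one per variant attested in the hypothetical transition zone, each with its weight; at `location_A` only one candidate is returned.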

Swiss German dialect atlases

The modelling of Swiss German dialects is attractive because of their social importance and because of the easy access to dialectological studies. We draw our data from two dialect atlas projects:

SDS: Contains 1548 hand-drawn maps, mostly of phonetic and lexical phenomena. Data were collected between 1939 and 1958.

SADS: Focuses on syntactic and morpho-syntactic phenomena. Data collection was recently completed, but the atlas is not yet published.

[Figure: map of final -nd in Hund 'dog' (SDS II/120). Legend: nd, nt, ŋ, nn.]

Current work – Data selection

Our model will contain only a limited set of maps. The maps are selected according to the following criteria:

Linguistic diversity: phonetics, morphology, syntax, lexicon.

Generic phenomena: transformations applying to several words.

Frequent phenomena: main verbs rather than insect names.

Present-day phenomena: no 1940s agriculture terminology.

Current work – Data interpolation

The selected maps are digitized and interpolated for further use:

• The point maps (one symbol per inquiry point) are transformed into surface maps (continuous areas separated by isoglosses).

• Transition zones are transformed into probability distributions (represented by color intensity).
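One way to picture the step from point maps to probability surfaces is the following minimal Python sketch. The poster does not say which interpolation method is used; inverse-distance weighting is chosen here only as a simple stand-in. The inquiry points, their coordinates and the `variant_distribution` function are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical inquiry points: (x, y, variant observed at that point).
# Coordinates and variant placement are invented for illustration.
POINTS = [
    (0.0, 0.0, "nd"),
    (1.0, 0.0, "nd"),
    (4.0, 0.0, "ŋ"),
    (5.0, 0.0, "ŋ"),
]


def variant_distribution(x, y, points=POINTS, power=2.0):
    """Estimate variant probabilities at (x, y) by inverse-distance weighting."""
    weights = defaultdict(float)
    for px, py, variant in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            # Exactly on an inquiry point: use its attested variant.
            return {variant: 1.0}
        weights[variant] += 1.0 / d ** power
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}
```

Close to the `nd` points the distribution is almost entirely `nd`, while halfway between the two groups both variants receive equal probability, which corresponds to a transition zone. A real system would evaluate such a function on a grid to obtain the surface map.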

In parallel, we develop the algorithm that associates the maps with the parsed input sentence and performs the transformations.

Possible extension 1

Connect the model to a speech output to respect the medial diglossia of German-speaking Switzerland (dialects are only used orally). This extension needs finer-grained phonetic representations and prosodic parameters.

Possible extension 2

Use dialect sentences as input to perform translations from dialects to Standard German. Syntactic analysis will be performed in the output language (Standard German).

References

Claudia Bucheli and Elvira Glaser. The Syntactic Atlas of Swiss German Dialects [SADS]: empirical and methodological problems. In Sjef Barbiers et al., editors, Syntactic Microvariation, volume II, pages 41–74. Meertens Institute, Amsterdam, 2002.

Rudolf Hotzenköcherle. Die Sprachlandschaften der deutschen Schweiz. Sauerländer, Aarau, 1984.

Rudolf Hotzenköcherle, Robert Schläpfer, Rudolf Trüb, and Paul Zinsli. Sprachatlas der deutschen Schweiz (SDS). Francke, Bern, Basel, 1962–1997. 8 volumes.

John Nerbonne. Linguistic challenges for computationalists. In Proceedings of RANLP 2005, pages 5–13, Borovets, Bulgaria, 2005.

Eric Wehrli. Un modèle multilingue d’analyse syntaxique. In A. Auchlin et al., editors, Structures et discours, Mélanges offerts à Eddy Roulet, pages 311–329. Nota bene, Montréal, 2004.
