Poster
Reference
Machine Translation for Swiss German Dialects
SCHERRER, Yves
SCHERRER, Yves. Machine Translation for Swiss German Dialects. In: Ecole doctorale CUSO en Linguistique, Leysin, Mars 2008, 2008
Available at:
http://archive-ouverte.unige.ch/unige:22820
Machine Translation for Swiss German Dialects
Yves Scherrer – LATL – Université de Genève
Background – NLP and dialectology
Traditionally, NLP (Natural Language Processing) and dialectology use quite different approaches to language. Mainly for practical reasons, NLP research usually focuses on written, standardized language. Dialectology is, by definition, concerned with spoken language and microvariation.
However, the contact points between these two research areas have been increasing recently:
[Figure: contact points between NLP and dialectology. Labels: Machine Translation; Statistical models (→ variation); Closely related languages; Multi-dialect; Single-dialect studies (Chinese, Arabic); Cartography; Dialectometry (Classification); Dialect syntax.]
Theoretical goal
Pursue the converging trends between NLP and dialectology by building a machine translation model that uses dialect maps.
Practical goal
Build a system that translates a Standard German sentence into any Swiss German dialect chosen by the user.
Translating a sentence in two steps
1. Parse the given Standard German sentence to extract the syntactic structure and the word lemmas and to chunk compound nouns. We use the German Fips parser.
2. Apply syntactic, lexical and phonetic transformations to the sentence. The transformations are stored in probabilistic maps, according to their geographical distribution.
In transition zones, the model may output several weighted translations of the same sentence.
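The second step can be illustrated with a minimal sketch. It assumes that each transformation is a lookup from a lemma to a probability distribution over dialect forms per zone, and that the transformations are independent of each other, so that sentence weights are products. The names `RULES` and `weighted_translations`, the zone labels and the probability values are hypothetical and are not taken from the SDS or from the actual system. Lemmas without a rule are passed through unchanged here, which a real system would not do.

```python
from itertools import product

# Toy probabilistic maps (hypothetical values, not taken from the SDS).
# For each Standard German lemma: zone -> {dialect form: probability}.
RULES = {
    "Hund": {
        "core_zone": {"Hund": 1.0},
        "transition_zone": {"Hund": 0.75, "Hunn": 0.25},
    },
}

def weighted_translations(lemmas, rules, zone):
    """Return all candidate translations with their weights, best first."""
    options = []
    for lemma in lemmas:
        if lemma in rules:
            dist = rules[lemma][zone]
            # Keep only forms that are attested (non-zero) in this zone.
            options.append([(f, p) for f, p in dist.items() if p > 0.0])
        else:
            # No rule for this lemma: pass it through unchanged.
            options.append([(lemma, 1.0)])
    results = []
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence assumption
        results.append((" ".join(f for f, _ in combo), prob))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

print(weighted_translations(["der", "Hund"], RULES, "transition_zone"))
```

In the core zone the sketch yields a single translation; in the transition zone it yields two, weighted by the map probabilities.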
Swiss German dialect atlases
The modelling of Swiss German dialects is attractive because of their social importance and because of the easy access to dialectological studies. We draw our data from two dialect atlas projects:
SDS: Contains 1548 hand-drawn maps. Mostly phonetic and lexical phenomena, data collected between 1939 and 1958.
SADS: Focuses on syntactic and morpho-syntactic phenomena. Data collection recently completed, but not yet published.
[Map: Final -nd in Hund 'dog' (SDS II/120). Legend: nd, nt, ŋ, nn.]
Current work – Data selection
Our model will contain only a limited set of maps. The maps are selected according to the following criteria:
Linguistic diversity: Phonetics, morphology, syntax, lexicon.
Generic phenomena: Transformations applying to several words.
Frequent phenomena: Main verbs rather than insect names.
Present-day phenomena: No 1940s agriculture terminology.
Current work – Data interpolation
The selected maps are digitized and interpolated for further use:
• The point maps (one symbol per inquiry point) are transformed to surface maps (continuous areas separated by isoglosses).
• Transition zones are transformed to probability distributions (represented by color intensity).
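One way to picture the step from point maps to probability distributions is inverse-distance weighting, sketched below. This is an illustrative technique chosen for the example, not necessarily the interpolation method used in the project. The function name `interpolate_variants`, the coordinates and the variant labels in `POINTS` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical inquiry points: (x, y, observed variant).
POINTS = [
    (0.0, 0.0, "nd"),
    (2.0, 0.0, "nn"),
]

def interpolate_variants(points, x, y, power=2.0):
    """Estimate variant probabilities at (x, y) by inverse-distance weighting."""
    weights = defaultdict(float)
    for px, py, variant in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            # Exactly on an inquiry point: use its observed variant.
            return {variant: 1.0}
        # Closer inquiry points contribute more weight.
        weights[variant] += d ** -power
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

print(interpolate_variants(POINTS, 1.0, 0.0))
```

Halfway between two inquiry points with different variants, both variants receive equal probability, which corresponds to a transition zone; near an inquiry point, its variant dominates.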
In parallel, we develop the algorithm that associates the maps with the parsed input sentence and performs the transformations.
Possible extension 1
Connect the model to a speech output to respect the medial diglossia of German-speaking Switzerland (dialects are only used orally). This extension needs finer-grained phonetic representations and prosodic parameters.
Possible extension 2
Use dialect sentences as input to perform translations from dialects to Standard German. Syntactic analysis will be performed in the output language (Standard German).
References
Claudia Bucheli and Elvira Glaser. The Syntactic Atlas of Swiss German Dialects [SADS]: empirical and methodological problems. In Sjef Barbiers et al., editors, Syntactic Microvariation, volume II, pages 41–74. Meertens Institute, Amsterdam, 2002.
Rudolf Hotzenköcherle. Die Sprachlandschaften der deutschen Schweiz. Sauerländer, Aarau, 1984.
Rudolf Hotzenköcherle, Robert Schläpfer, Rudolf Trüb, and Paul Zinsli. Sprachatlas der deutschen Schweiz (SDS). Francke, Bern, Basel, 1962–1997. 8 volumes.
John Nerbonne. Linguistic challenges for computationalists. In Proceedings of RANLP 2005, pages 5–13, Borovets, Bulgaria, 2005.
Eric Wehrli. Un modèle multilingue d’analyse syntaxique. In A. Auchlin et al., editors, Structures et discours, Mélanges offerts à Eddy Roulet, pages 311–329. Nota bene, Montréal, 2004.