Poster
Reference
Machine Translation for Swiss German Dialects
SCHERRER, Yves
SCHERRER, Yves. Machine Translation for Swiss German Dialects. In: École doctorale CUSO en Linguistique, Leysin, March 2009.
Available at:
http://archive-ouverte.unige.ch/unige:22815
Machine Translation for Swiss German Dialects
Yves Scherrer, LATL, Université de Genève
Research questions
- How can standard NLP methods be adapted to process dialect data?
- How can dialectological research be presented in a more interactive and dynamic way?
System architecture
1. Input: StdG text, plus the coordinates of the target dialect.
2. StdG parser (Fips or other tools): produces the StdG analysis, an XML document with
   - morphological features
   - lexical features
   - POS tags
   - constituent structure
3. Transfer: georeferenced transfer rules (syntax, lexicon, phonetics) turn the StdG analysis into the SwG analysis, a modified XML document.
4. SwG generation: builds inflected forms from roots.
5. Output: SwG text and a SwG treebank.
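The data flow from StdG analysis to modified XML document can be sketched as follows. This is a minimal illustration under stated assumptions: the XML element and attribute names (`s`, `w`, `pos`, `lemma`), the function names `toy_phonetic_rule` and `transfer`, and the coordinates are all hypothetical and do not reproduce the Fips output format or the project's actual code. The single toy rule ignores the coordinates; a real georeferenced rule would use them to look up its probability at the target location.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical StdG analysis document. The element and attribute names
# (s, w, pos, lemma) are illustrative assumptions, not the Fips output format.
STDG_ANALYSIS = """<s>
  <w pos="ART" lemma="der">Der</w>
  <w pos="NN" lemma="Hund">Hund</w>
  <w pos="VAFIN" lemma="sein">ist</w>
  <w pos="ADJD" lemma="gesund">gesund</w>
</s>"""


def toy_phonetic_rule(sentence, coords):
    """Hypothetical rule: rewrite word-final 'nd' as 'ng' in every word form.

    A georeferenced rule would use coords to decide whether (and with which
    probability) it applies at the target location; this toy version ignores them.
    """
    for w in sentence.iter("w"):
        w.text = re.sub(r"nd$", "ng", w.text)


def transfer(xml_text, rules, coords):
    """Parse the StdG analysis, apply each transfer rule in turn, and return
    the modified (SwG) analysis as an XML string."""
    sentence = ET.fromstring(xml_text)
    for rule in rules:
        rule(sentence, coords)
    return ET.tostring(sentence, encoding="unicode")


# Hypothetical target dialect coordinates (x, y); unused by the toy rule.
swg_analysis = transfer(STDG_ANALYSIS, [toy_phonetic_rule], coords=(600000, 200000))
print(swg_analysis)
```

Keeping each rule as a function over the shared document makes the transfer module an ordered list of rules, which mirrors the separation into syntactic, lexical and phonetic rules in the architecture above.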
The project
Machine translation from Standard German (StdG) to any Swiss German (SwG) dialect.
- One standardized source language
- Relatively homogeneous data available
- Existing analysis tools and corpora can be used
- A continuum of non-standardized target varieties
- Diatopical variation as a parameter of the translation process
- Transition zones represented by gradually changing rule probabilities
A phonetic transfer rule: word-final nd (SDS II/120)
Phonetic transfer rules apply to etymologically related word pairs by modifying particular phonemes or graphemes.
Four SwG variants of StdG word-final nd:
- nd → nn
- nd → ng
- nd → nt
- nd kept unchanged
This rule applies to different words: Hund 'dog', Hand 'hand', gesund 'healthy'.
Detection: apply regular expressions to the StdG orthographic form.
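The detection and rewriting steps for word-final nd can be sketched as below. This is a hedged illustration, not the project's implementation: the regular expression, the function name `apply_final_nd_rule`, and the probability values are assumptions. The probabilities stand in for the values a rule would read from the probabilistic raster map at the target location, so each StdG word yields a distribution over SwG forms.

```python
import re

# Detection step: word-final "nd" in the StdG orthographic form.
FINAL_ND = re.compile(r"nd\b")

# The four SwG variants of StdG word-final nd, as replacement strings.
VARIANTS = {"nn": "nn", "ng": "ng", "nt": "nt", "keep": "nd"}


def apply_final_nd_rule(word, probabilities):
    """Return a distribution over SwG forms for one StdG word.

    probabilities maps each variant name to its (hypothetical) probability at
    the target location. Words without a word-final nd are returned unchanged
    with probability 1.
    """
    if not FINAL_ND.search(word):
        return {word: 1.0}
    result = {}
    for variant, p in probabilities.items():
        form = FINAL_ND.sub(VARIANTS[variant], word)
        result[form] = result.get(form, 0.0) + p
    return result


# Hypothetical probabilities for one location in a transition zone.
probs = {"nn": 0.0, "ng": 0.7, "nt": 0.0, "keep": 0.3}
for w in ["Hund", "Hand", "gesund", "Hunde"]:
    print(w, apply_final_nd_rule(w, probs))
```

Because the rule is stated over graphemes rather than listed per word, the same entry covers Hund, Hand and gesund, while an inflected form such as Hunde (where nd is no longer word-final) is left untouched.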
Map creation
We use a Geographical Information System (GIS) for mapping tasks.
Phonetics, morphology, lexicon:
1. Scan maps from the SDS atlas.
2. Digitize the scans to create point maps.
3. Interpolate the point maps to create probabilistic raster maps.
Syntax:
1. Obtain preliminary (numerical) data from the SADS atlas project.
2. Create point maps from these data.
3. Interpolate the point maps to create probabilistic raster maps.
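The interpolation step, from a point map to a probabilistic raster map, can be illustrated with inverse distance weighting (IDW). The poster does not say which interpolation method the GIS uses, so IDW, its `power` parameter, the function names, and the survey points below are assumptions chosen only to show the idea: each raster cell receives, for every variant, a probability that blends the nearby attested variants, which is what produces gradual transition zones.

```python
import math
from collections import defaultdict

# Hypothetical point map: (x, y) survey locations with the attested variant.
# Coordinates and variants are invented for illustration, not SDS data.
POINTS = [
    ((0.0, 0.0), "ng"),
    ((10.0, 0.0), "nd"),
    ((0.0, 10.0), "ng"),
    ((10.0, 10.0), "nn"),
]


def idw_probabilities(x, y, points, power=2.0):
    """Inverse distance weighting of one-hot variant indicators.

    Returns a dict variant -> probability at (x, y). At a survey point
    itself, the attested variant gets probability 1.
    """
    weights = defaultdict(float)
    for (px, py), variant in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return {variant: 1.0}
        weights[variant] += 1.0 / d ** power
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def raster(points, width, height, step):
    """Evaluate the interpolation at the centre of each cell of a regular grid."""
    return {
        (i, j): idw_probabilities((i + 0.5) * step, (j + 0.5) * step, points)
        for i in range(width)
        for j in range(height)
    }


grid = raster(POINTS, width=2, height=2, step=5.0)
print(grid[(0, 0)])
```

Other methods (for example kernel density estimation per variant) would fit the same interface; what matters for the transfer rules is that every cell stores a probability distribution over variants.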
A lexical transfer rule: “fruit tart” (SDS V/187)
Lexical transfer rules apply to etymologically unrelated word pairs. They also deal with the resulting contextual changes (gender, argument structure, ...).
Four major SwG translations of StdG Kuchen (M) in the sense of 'fruit tart':
- Wääje (+ variants) (F)
- Tünne (+ variants) (F)
- Flade (M)
- Chueche (M)
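A lexical rule has to change more than the word itself: since Wääje and Tünne are feminine while Kuchen is masculine, the gender feature must be updated too. The sketch below shows this under assumptions: the node is a plain dict with hypothetical keys `lemma`, `pos` and `gender`, the 'fruit tart' sense is assumed to be resolved beforehand, the function name `apply_kuchen_rule` is made up, and the probabilities are invented. Spelling variants of Wääje and Tünne are omitted.

```python
# SwG translations of StdG Kuchen (M) 'fruit tart' and their gender
# (spelling variants of Wääje and Tünne are left out here).
SWG_FRUIT_TART_GENDER = {"Wääje": "F", "Tünne": "F", "Flade": "M", "Chueche": "M"}


def apply_kuchen_rule(node, probabilities):
    """Translate a StdG 'Kuchen' noun node into weighted SwG candidates.

    node: dict with hypothetical keys 'lemma', 'pos', 'gender'; the sense
    'fruit tart' is assumed to have been identified beforehand.
    probabilities: SwG lemma -> probability at the target location.
    Each candidate carries the gender of the SwG word, so that agreement
    (e.g. on determiners and adjectives) can be recomputed from it.
    """
    if node.get("lemma") != "Kuchen":
        return [(node, 1.0)]
    candidates = []
    for lemma, p in probabilities.items():
        if p > 0:
            new_node = dict(node, lemma=lemma, gender=SWG_FRUIT_TART_GENDER[lemma])
            candidates.append((new_node, p))
    return candidates


stdg_node = {"lemma": "Kuchen", "pos": "NN", "gender": "M"}
# Hypothetical probabilities for one location.
for swg_node, p in apply_kuchen_rule(stdg_node, {"Wääje": 0.6, "Chueche": 0.4}):
    print(swg_node["lemma"], swg_node["gender"], p)
```

Returning new nodes instead of editing the input keeps the StdG analysis intact, so several location-dependent candidates can be generated from the same source node.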
Main features
- Traditional syntax-based transfer approach
- Uses existing StdG resources
- Allows creation of SwG treebanks
- A rule-based transfer module
- Dialect data would be insufficient for machine learning models
- Dialect atlas data can be incorporated
- Probabilistic georeferenced transfer rules
- Phonetic transformations partially replace lexicon entries
A syntactic transfer rule: verb raising in subordinate clauses (SADS I/9)
Syntactic transfer rules change the structure of the syntax tree.
StdG: ob er einmal heiraten will (VVINF VMFIN)
      gloss: if he once marry wants
SwG1: ob er ämal hürate wett (VVINF VMFIN)
SwG2: ob er ämal wett hürate (VMFIN VVINF)
SwG1 area: no modal raising. SwG2 area: modal raising.
The rule:
- Combine the two VPs into one VP with a double head.
- If the target dialect is inside the SwG2 area, swap the adjacent VVINF + VMFIN.
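The swapping step of this rule can be sketched on a flat sequence of tagged tokens. This is a simplification: it operates on (word, STTS tag) pairs rather than on the merged VP node of the syntax tree, and the function name `verb_raising` and the boolean `in_swg2_area` (which a georeferenced version would replace with a probability read from the SADS-based map) are assumptions for illustration.

```python
def verb_raising(tokens, in_swg2_area):
    """Swap each adjacent VVINF + VMFIN pair if the target is in the SwG2 area.

    tokens: list of (word form, STTS tag) pairs for one subordinate clause.
    Returns a new list; the input is not modified.
    """
    out = list(tokens)
    if not in_swg2_area:
        return out
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "VVINF" and out[i + 1][1] == "VMFIN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out


# The SwG1 word order from the example above, with STTS tags.
clause = [("ob", "KOUS"), ("er", "PPER"), ("ämal", "ADV"),
          ("hürate", "VVINF"), ("wett", "VMFIN")]

print(" ".join(w for w, _ in verb_raising(clause, in_swg2_area=True)))
print(" ".join(w for w, _ in verb_raising(clause, in_swg2_area=False)))
```

Combining the two VPs first, as the rule prescribes, is what makes the verbs adjacent sisters in the tree; the flat version above assumes that adjacency already holds.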
École doctorale CUSO, Leysin 2009