Proceedings Chapter

RAYNER, Emmanuel, et al. Using Artificial Data to Compare the Difficulty of Using Statistical Machine Translation in Different Language-Pairs. In: Proceedings of Machine Translation Summit XII. 2009.

Available at: http://archive-ouverte.unige.ch/unige:5698
Using Artificial Data to Compare the Difficulty of Using Statistical Machine Translation in Different Language-Pairs
Manny Rayner, Paula Estrella, Pierrette Bouillon, Sonia Halimi
University of Geneva, TIM/ISSCO
40 bvd du Pont-d'Arve, CH-1211 Geneva 4, Switzerland
{Emmanuel.Rayner,Pierrette.Bouillon,Sonia.Halimi}@unige.ch
pestrella@gmail.com

Yukie Nakao
LINA, Nantes University, 2, rue de la Houssinière, BP 92208, 44322 Nantes Cedex 03
yukie.nakao@univ-nantes.fr
Abstract

Anecdotally, Statistical Machine Translation works much better in some language pairs than others, but methodological problems mean that it is difficult to draw hard conclusions. In particular, it is generally unclear whether translations in parallel training corpora have been produced using equivalent conventions. In this paper, we report on an experiment where a small-vocabulary multilingual interlingua-based translation system was used to generate data to train SMT models for the 12 pairs involving the languages {English, French, Japanese, Arabic}. By construction, the data can be assumed strongly uniform. As expected, translation between English and French in both directions performed much better than translation involving Japanese. Less obviously, translation from English and French to Arabic performed approximately as well as translation between English and French, and translation to Japanese performed better than translation from Japanese.
1 Introduction

Since its introduction in the early 90s, when it was regarded as a dubious outsider, Statistical Machine Translation (SMT) has rapidly gained ground until it is now considered the mainstream approach. There is, however, general agreement that some language-pairs work much better than others. In the positive direction, the early successes reported by the IBM group (Brown et al., 1990) used French-English, which is now known to be an unusually suitable pair. Anecdotally, translating between two European languages is easier than translating between a European and a non-European language, and some languages, like Japanese, are widely assumed to be difficult. Given the steadily increasing importance of MT technology, it is often important to be able to make a reasonable guess at how well SMT will work for a new language-pair. Both SMT and RBMT require a large investment of effort before any evaluable system emerges; when planning a project, both architectures are in principle possible, and it is desirable to be able to make an informed choice between them at an early stage.
A recent large-scale study (Birch et al., 2008), using the 110 language-pairs covered by the Europarl corpus, found that the features most predictive of SMT translation quality were target language vocabulary size, lexicostatistical relatedness (measured in terms of proportion of cognate words), and similarity in word order. The same study, however, also highlighted the methodological problems inherent in carrying out this type of comparison. As already noted, target vocabulary size turned out to be the most predictive feature. Vocabulary size, however, depends crucially on how morphology is taken into account. For example, (Birch et al., 2008) considered that Swedish had a much larger vocabulary size than English, but this is almost entirely due to the fact that Swedish, like German, writes compound nominals without intervening spaces. The structure of these nominals, however, is often very similar to that of the corresponding English phrases. The problem becomes more acute in a language like Japanese, which is normally written with no word boundaries at all. A further problem concerns parallel human-translated corpora, where it is generally difficult to know whether the data is truly uniform. Quality and style of translation can vary widely, with translators using different guidelines. In particular, some translators will prefer a more literal style than others. It is also common to mix in low-quality data; a frequent choice is translations taken from the reverse language-pair, with the source and target swapped around. Some recent studies (Ozdowska, 2009) have in fact suggested that this kind of low-quality adulteration can do more harm than good. Conversely, other practitioners of SMT have pointed to the performance gains that can be achieved by careful cleaning of the data.
Without controlling for all these factors, it is hard to know how general the results of comparative studies really are. Although (Birch et al., 2008) is an unusually responsible and careful piece of work, the authors point out that removal of the outlier language (Finnish) substantially changes the overall conclusions; it is probably not a coincidence that Finnish was also the only non-Indo-European language used in the study.
In this paper, we present the results of a novel type of comparative study carried out using MedSLT, a small-vocabulary interlingua-based multilingual speech translation system for a medical domain. We generated parallel corpora for all 12 pairs involving the source languages {English, French, Japanese, Arabic}, first using the source language grammars to generate arbitrary amounts of source-language data, then, for each target language, passing it through the relevant translation rules to generate target language expressions. Use of interlingua-based translation enforces a uniform translation style. The small domain, which we completely control, made it possible to enforce uniform decisions about how morphology is treated. For example, we decided in Arabic to split off the definite article al, normally affixed to the following noun, and treat it as a separate word. For similar reasons, we also treated Japanese tense and politeness affixes as separate words. Thus a word like okorimashita ("happened") is split up as okori mashita (happen PAST-POLITE). Once we had created the parallel corpora, we trained SMT models, and evaluated the quality of the translations they produced. As expected, translation between English and French in both directions performed much better than translation involving Japanese. We were, however, interested to discover that translation from English and French to Arabic performed as well as translation between English and French, and that translation to Japanese performed better than translation from Japanese.
The rest of the paper is organised as follows. Section 2 provides background on the MedSLT system. Section 3 describes the experimental framework, and Section 4 the results obtained. Section 5 concludes.
2 The MedSLT System

MedSLT (Bouillon et al., 2008) is a medium-vocabulary interlingua-based Open Source speech translation system for doctor-patient medical examination questions, which provides any-language-to-any-language translation capabilities for all languages in the set {English, French, Japanese, Arabic, Catalan}. Both speech recognition and translation are rule-based. Speech recognition runs on the Nuance 8.5 recognition platform, with grammar-based language models built using the Open Source Regulus compiler. As described in (Rayner et al., 2006), each domain-specific language model is extracted from a general resource grammar using corpus-based methods driven by a seed corpus of domain-specific examples. The seed corpus, which typically contains between 500 and 1500 utterances, is then used a second time to add probabilistic weights to the grammar rules; this substantially improves recognition performance (Rayner et al., 2006, §11.5). Performance measures for speech recognition in the three languages where serious evaluations have been carried out are shown in Table 1.
At run-time, the recogniser produces a source-language semantic representation. This is first translated by one set of rules into an interlingual form, and then by a second set into a target language representation. A target-language Regulus grammar, compiled into generation form, turns this into one or more possible surface strings, after which a set of generation preferences picks one out. Finally, the selected string is realised in spoken form. Robustness is provided by a second, statistical recogniser, which drives a robust embedded help system. The purpose of the help system (Chatzichrisafis et al., 2006) is to guide the user towards supported coverage; it performs approximate matching of output from the statistical recogniser against a library of sentences which have been marked as correctly processed during system development, and then presents the closest matches to the user.

Language   WER   SemER
English    6%    11%
French     8%    10%
Japanese   3%    4%

Table 1: Recognition performance for English, French and Japanese MedSLT recognisers. WER = Word Error Rate for source language recogniser, on in-coverage material; SemER = semantic error rate (proportion of utterances failing to produce correct interlingua) for source language recogniser, on in-coverage material.

Examples of typical English domain sentences and their translations into French, Arabic and Japanese are shown in Table 2.
3 Experimental framework

In the literature on language modelling, there is a known technique for bootstrapping a statistical language model (SLM) from a grammar-based language model (GLM). The grammar which forms the basis of the GLM is sampled randomly in order to create an arbitrarily large corpus of examples; these examples are then used as a training corpus to build the SLM (Jurafsky et al., 1995; Jonson, 2005). We adapt this process in a straightforward way to construct an SMT model for a given language pair, using the source language grammar, the source-to-interlingua translation rules, the interlingua-to-target-language rules, and the target language generation grammar. We start in the same way, using the source language grammar to build a randomly generated source language corpus; as shown in (Hockey et al., 2008), it is important to have a probabilistic grammar. We then use the composition of the other components to attempt to translate each source language sentence into a target language equivalent, discarding the examples for which no translation is produced. The result is an aligned bilingual corpus suitable for training an SMT model.
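The corpus-generation step can be pictured as random sampling from a weighted grammar. The following sketch shows the idea on a toy probabilistic CFG; the grammar, weights and vocabulary are invented for illustration and bear no relation to the actual MedSLT grammars.

```python
import random

# Minimal weighted-CFG sampler illustrating the bootstrapping step.
# Each nonterminal maps to a list of (expansion, weight) pairs; symbols
# not in the grammar are terminals. Grammar and weights are toy values.
GRAMMAR = {
    "S":   [(["do", "you", "have", "NP"], 0.6),
            (["is", "the", "pain", "ADJ"], 0.4)],
    "NP":  [(["headaches"], 0.5), (["nausea"], 0.5)],
    "ADJ": [(["severe"], 0.7), (["frequent"], 0.3)],
}

def sample(symbol="S", rng=random):
    """Recursively expand a nonterminal by weighted random choice."""
    if symbol not in GRAMMAR:          # terminal symbol: emit as-is
        return [symbol]
    rules, weights = zip(*GRAMMAR[symbol])
    rule = rng.choices(rules, weights=weights)[0]
    out = []
    for sym in rule:
        out.extend(sample(sym, rng))
    return out
```

Repeatedly calling sample() yields an arbitrarily large corpus whose sentence distribution follows the rule weights, which is why a probabilistic grammar matters here.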
We used this method to generate aligned corpora for 12 MedSLT language pairs with source and target languages taken from the set {English, French, Japanese, Arabic}. For each language pair, we first generated one million source-language utterances; we next filtered them to keep only examples which were full sentences, as opposed to elliptical phrases, and used the translation rules and target-language generators to attempt to translate each sentence. This created between 260K and 310K aligned sentence-pairs for each language-pair. In order to make coverage uniform for each source language, we kept only the pairs for which the source sentence had translations in all target languages. This makes it possible to compare fairly between language-pairs with the same source-language. In contrast, it appears to us that it is less straightforward to compare across language-pairs with different source-languages, since there is no obvious way to ascertain that the two source-language corpora are of comparable difficulty.
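The coverage-uniformity filter just described can be sketched as a simple set intersection. The data structures below are illustrative assumptions about how per-target translation tables might be held in memory, not the paper's actual implementation.

```python
# Sketch of the coverage-uniformity filter: for each source language,
# keep only source sentences that every target-language pipeline could
# translate. 'translations' maps each target language to a dict from
# source sentence to target sentence (illustrative layout).

def uniform_filter(translations):
    """Return {source: {target_lang: translation}} restricted to
    source sentences translated into *all* target languages."""
    per_target = [set(table) for table in translations.values()]
    common = set.intersection(*per_target)
    return {src: {lang: table[src] for lang, table in translations.items()}
            for src in sorted(common)}
```

Sentences for which any target pipeline failed simply never enter the intersection, mirroring the "discard if no translation is produced" step.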
The sizes of the final source language corpora for each of the four source languages are shown in Table 3. We randomly held out 2.5% of each of these sets as development data, and 2.5% as test data. Using Giza++, Moses and SRILM (Och and Ney, 2000; Koehn et al., 2007; Stolcke, 2002), we trained SMT models from increasingly large subsets of the training portion, using the development portion in the usual way to optimize parameter values. Finally, we used the resulting models to translate the test portion. We performed the tests with subcorpora of different sizes in order to satisfy ourselves that performance had topped out, and that generation of further training data would not improve performance. Full details are presented in (Rayner et al., 2009).
Language   #Sentences   #Words    Vocab.
Eng        236340       1441263   364
Fre        179758       1205308   557
Ara        233141       1509594   253
Jap        207717       1169106   336

Table 3: Statistics for final auto-generated source language corpora for the four source languages: number of sentences, number of words, and size of vocabulary.
English    Have you had the pain for more than a month?
French     Avez-vous mal depuis plus d'un mois ?
Arabic     Hal tahus bi al alam moundhou akthar min chahr wahid?
Japanese   Ikkagetsu ijou itami wa tsuzukimashita ka?

English    When do the headaches usually appear?
French     Quand avez-vous habituellement vos maux de tête ?
Arabic     Mata atahus bi al soudaa adatan?
Japanese   Daitai itsu atama wa itamimasu ka?

English    Is the pain associated with nausea?
French     Avez-vous des nausées quand vous avez la douleur ?
Arabic     Hal tourid an tataqaya indama tahus bi al alam?
Japanese   Itamu to hakike wa okorimasu ka?

English    Does bright light make the headache worse?
French     Vos maux de tête sont-ils aggravés par la lumière ?
Arabic     Hal yachtaddou al soudaa al dhaw?
Japanese   Akarui hikari wo miru to zutsu wa hidoku narimasu ka?

Table 2: Examples of English domain sentences, with system translations into French, Arabic and Japanese.
Our metrics measure the extent to which the derived versions of the SMT were able to approximate the original RBMT on data which was within the RBMT's coverage. The most straightforward way to do this is simply to count the sentences in the test set which receive different translations from the RBMT and the SMT. A variant is to define a non-standard version of the BLEU metric (Papineni et al., 2001), with the RBMT's translation taken as the reference. This means that perfect correspondence between the two translations would yield a non-standard BLEU score of 1.0.
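A minimal sketch of such a non-standard BLEU computation, with the RBMT output as the single reference per sentence. This is a simplified corpus-level version of the Papineni et al. (2001) metric (modified n-gram precisions up to 4-grams, geometric mean, brevity penalty), not the exact scorer used in the experiments.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (here, the
    RBMT translation). Identical candidate/reference corpora score 1.0."""
    match = [0] * max_n
    total = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            cg, rg = ngrams(c, n), ngrams(r, n)
            match[n - 1] += sum(min(cg[g], rg[g]) for g in cg)  # clipped counts
            total[n - 1] += max(len(c) - n + 1, 0)
    if 0 in match:                      # some precision is zero
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_prec)
```

With the SMT output as candidates and the RBMT output as references, perfect agreement gives exactly the 1.0 ceiling mentioned above.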
For all these metrics, it is important to bring in human judges at some point, using them to evaluate the cases where the SMT and RBMT differ. If, in these cases, it transpired that human judges typically thought that the SMT was as good as the RBMT, then the metrics would not be useful. We need to satisfy ourselves that human judges typically ascribe differences between SMT and RBMT to shortcomings in the SMT rather than in the RBMT.

Concretely, we collected all the different ⟨Source, SMT-translation, RBMT-translation⟩ triples produced during the course of the experiments, and extracted those triples where the two translations were different. We randomly selected triples for selected language pairs, and asked human judges to classify them as follows:

- RBMT better: The RBMT translation was better, in terms of preserving meaning and/or being grammatically correct;
- SMT better: The SMT translation was better, in terms of preserving meaning and/or being grammatically correct;
- Similar: Both translations were about equally good OR the source sentence was meaningless in the domain.

In order to show that our metrics are intuitively meaningful, it is sufficient to demonstrate that the frequency of occurrence of RBMT better is both large in comparison to that of SMT better, and accounts for a substantial proportion of the total population.
In the next section, we present the results of the various experiments we have just described.
4 Results

Tables 4, 5 and 6 present the main results, summarising the extent to which SMT and RBMT translations differ for the 12 language-pairs. Since the training and test data are independently sampled from the source grammar, and the domain is quite constrained, they overlap. This is natural, since sentences occurring in the training data will also occur in the test data; basic questions like "Where is the pain?" will be generated with high frequency by the probabilistic source language model, and will tend to occur in any substantial independently generated set, hence both in test and training. When counting divergent translations in RBMT and SMT output, we nonetheless present separately results for test data that does not overlap with training data (Table 4) and for test data that does overlap with training data (Table 5), on the grounds that the figures are, as usual, very different for the two kinds of material. These two tables thus summarise agreement between SMT and RBMT at the sentence level. Table 6 shows the non-standard BLEU scores, where the RBMT translations have been used as the reference; these give a picture of agreement between the two types of translation at the n-gram level.

Looking in particular at Table 4, we see that the figures fall into three distinct groups. For language-pairs involving only languages in the group {English, French, Arabic}, SMT and RBMT agree on about 70% to 80% of the sentences. For translation from English and French to Japanese, the two types of translation agree on about 27% of the sentences. For translation from Japanese into the other three languages, and for Arabic into Japanese, we only get agreement on about 13% to 16% of the sentences. These divisions appear to show clear qualitative differences.
Source    Target
          Eng     Fre     Ara     Jap
Eng       xxx     69.6    76.5    27.9
Fre       77.1    xxx     72.4    26.9
Ara       76.7    79.1    xxx     13.9
Jap       15.7    14.7    12.7    xxx

Table 4: Percentage of translations where SMT translation coincides with RBMT translation, over test sentences not occurring in training data.
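The sentence-level agreement figures behind Tables 4 and 5 reduce to exact-match counting, split by whether the test sentence also occurs in the training data. A sketch, with the input layout an illustrative assumption:

```python
# Sentence-level agreement metric: share of test sentences whose SMT
# translation is string-identical to the RBMT translation, reported
# separately for test sentences unseen / seen in the training data.

def agreement(test_items, training_sources):
    """test_items: iterable of (source, smt_translation, rbmt_translation).
    training_sources: set of source sentences present in training data.
    Returns (pct_agree_unseen, pct_agree_seen)."""
    stats = {False: [0, 0], True: [0, 0]}    # seen? -> [agree, total]
    for src, smt, rbmt in test_items:
        seen = src in training_sources
        stats[seen][1] += 1
        stats[seen][0] += (smt == rbmt)
    pct = lambda agree, total: 100.0 * agree / total if total else 0.0
    return pct(*stats[False]), pct(*stats[True])
```

Reporting the two populations separately matters because, as noted above, the figures differ sharply for seen and unseen material.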
Source    Target
          Eng     Fre     Ara     Jap
Eng       xxx     87.9    92.4    77.8
Fre       94.7    xxx     94.4    74.4
Ara       95.2    90.8    xxx     64.0
Jap       79.1    81.4    76.6    xxx

Table 5: Percentage of translations where SMT translation coincides with RBMT translation, over test sentences occurring in training data.

Source    Target
          Eng     Fre     Ara     Jap
Eng       xxx     0.91    0.92    0.79
Fre       0.93    xxx     0.92    0.76
Ara       0.97    0.98    xxx     0.74
Jap       0.80    0.83    0.85    xxx

Table 6: Non-standard BLEU scores (RBMT translations used as reference), all data.

As discussed in the previous section, simply counting differences between SMT and RBMT says nothing on its own; it is also necessary to establish what these differences mean in terms of human judgements. We performed evaluations of this kind for the language pairs where we found it easy to locate bilingual judges. First, Table 8 shows the categorisation, according to the criteria outlined at the end of Section 3, for 500 English→French pairs randomly selected from the set of examples where RBMT and SMT gave different results; we asked three judges to evaluate them independently, and combined their judgements by majority decision where appropriate. We observed a very heavy bias towards the RBMT, with unanimous agreement that the RBMT translation was better in 201/500 cases, and 2-1 agreement in a further 127. In contrast, there were only 4/500 cases where the judges unanimously thought that the SMT translation was preferable, with a further 12 supported by a majority decision. The rest of the table gives the cases where the RBMT and SMT translations were judged the same or cases in which the judges disagreed; there were only 41/500 cases where no majority decision was reached. Our overall conclusion is that we are justified in evaluating the SMT by using the BLEU scores with the RBMT as the reference. Of the cases where the two systems differ, only a tiny fraction, at most 16/500, indicate a better translation from the SMT, and well over half are translated better by the RBMT. Table 7 shows some examples of bad SMT translations in the English→French direction; the first two contain grammatical errors (a superfluous extra verb in the first, and agreement errors in the second). The third is a bad choice of tense and preposition; although grammatical, the target language sentence fails to preserve the meaning, and, rather than referring to a 20 day period ending now, instead refers to a 20 day period sometime in the past.
Table 10 shows a similar evaluation for the English→Japanese pair. Here, the difference between the SMT and RBMT versions was so pronounced that we felt justified in taking a smaller sample, of only 150 sentences. This time, 92/150 cases were unanimously judged as having a better RBMT translation, and there was not a single case where even a majority found that the SMT was better. Agreement was good here too, with only 8/150 cases not yielding at least a majority decision. Unsurprisingly, the main problem with this language-pair was inability to handle correctly the differences between English and Japanese word-order. Table 9 again shows some typical examples. The errors are much more serious than in French, and the SMT translations are only marginally comprehensible.
Result        Agreement    Count
RBMT better   all judges   201
RBMT better   majority     127
SMT better    all judges   4
SMT better    majority     12
Similar       all judges   34
Similar       majority     81
Unclear       disagree     41
Total                      500

Table 8: Comparison of RBMT and SMT performance on 500 randomly chosen English→French translation examples, evaluated independently by three judges.
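The "all judges" / "majority" / "disagree" rows in Tables 8 and 10 follow from a simple vote-combination rule over the three independent judgements. A sketch, with label strings chosen to match the tables:

```python
from collections import Counter

# Combine three independent judgements by majority decision:
# unanimous and 2-1 verdicts are accepted; three-way splits
# are recorded as "Unclear".

def combine(judgements):
    """judgements: tuple of per-judge labels, e.g.
    ("RBMT better", "RBMT better", "Similar").
    Returns (verdict, agreement_level)."""
    label, votes = Counter(judgements).most_common(1)[0]
    if votes >= 2:
        level = "all judges" if votes == len(judgements) else "majority"
        return label, level
    return "Unclear", "disagree"
```

Applying this to each evaluated triple and tallying the results yields exactly the row structure of Tables 8 and 10.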
Cursory examination of the remaining language-pairs strongly suggested that the same patterns obtained there as well, with very few cases where SMT was better than RBMT, and numerous cases in the opposite direction. Since other evaluations of the MedSLT system (e.g. (Rayner et al., 2005)) show that over 98% of in-coverage translations produced by the RBMT system are acceptable in terms of preserving meaning, differences between SMT and RBMT can plausibly be interpreted as reflecting errors produced by the SMT.
Result        Agreement    Count
RBMT better   all judges   92
RBMT better   majority     32
SMT better    all judges   0
SMT better    majority     0
Similar       all judges   2
Similar       majority     16
Unclear       disagree     8
Total                      150

Table 10: Comparison of RBMT and SMT performance on 150 randomly chosen English→Japanese translation examples, evaluated independently by three judges.
5 Summary and Conclusions

We have presented an experiment in which we generated uniform artificial data for 12 language pairs in a multilingual small-vocabulary interlingua-based translation system. Use of the interlingua enforced a uniform translation standard, so we feel justified in claiming that the results provide harder evidence than usual about the relative suitability of different language-pairs for SMT.

As expected, translation between English and French in both directions is much more reliable than translation in language pairs involving Japanese. To our surprise, we also found that translation between English or French and Arabic worked about as well as translation between English and French, despite the fact that Arabic typologically belongs to a different language family. Informal conversations with colleagues who have worked on Arabic suggest, however, that the result is not as unexpected as we first imagined.

Table 4 appears to suggest that translation from Japanese works substantially less well than translation to Japanese. The explanation is most probably the usual problem of zero-anaphora, which is very common in Japanese, with words that can clearly be inferred from context generally deleted. In the from-Japanese direction, it is necessary to generate a translation of a zero anaphor (most often the implicit subject), whereas in the to-Japanese direction it is only a question of deleting material. Although, as pointed out earlier in Section 3, we need to be careful when comparing between language-pairs with different source-languages, the gap in performance here is large enough that we can expect it to reflect a real trend.

English      are your headaches caused by changes of temperature
RBMT French  vos maux de tête sont-ils causés par des changements de température
             (your headaches are-they caused by changes of temperature)
SMT French   avez-vous vos maux de tête sont-ils causés par des changements de température
             (have-you your headaches are-they caused by changes of temperature)

English      are headaches relieved in the afternoon
RBMT French  vos maux de tête diminuent-ils l'après-midi
             (your headaches (MASC-PLUR) decrease-MASC-PLUR the afternoon)
SMT French   vos maux de tête diminue-t-elle l'après-midi
             (your headaches (MASC-PLUR) decrease-FEM-SING the afternoon)

English      have you had them for twenty days
RBMT French  avez-vous vos maux de tête depuis vingt jours
             (have-you your headaches since twenty days)
SMT French   avez-vous eu vos maux de tête pendant vingt jours
             (have-you had your headaches during twenty days)

Table 7: Examples of incorrect SMT translations from English into French.
Simple as the idea is, we hope that the methodology described in this paper will make it possible to evaluate the relative suitability of SMT for different language pairs in a more quantitative way than has so far been possible. In general, the construction used could equally well be implemented in the context of any other high-performance multilingual RBMT system. The idea of statistically relearning an RBMT system has recently begun to acquire some popularity (Seneff et al., 2006; Dugast et al., 2008), and it should be easy to check whether our results are generally reproducible.
References

A. Birch, M. Osborne, and P. Koehn. 2008. Predicting success in machine translation. In Proceedings of EMNLP 2008, Waikiki, Hawaii.

P. Bouillon, G. Flores, M. Georgescul, S. Halimi, B.A. Hockey, H. Isahara, K. Kanzaki, Y. Nakao, M. Rayner, M. Santaholma, M. Starlander, and N. Tsourakis. 2008. Many-to-many multilingual medical speech translation on a PDA. In Proceedings of The Eighth Conference of the Association for Machine Translation in the Americas, Waikiki, Hawaii.

P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer, and P. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):79-85.

N. Chatzichrisafis, P. Bouillon, M. Rayner, M. Santaholma, M. Starlander, and B.A. Hockey. 2006. Evaluating task performance for a unidirectional controlled language medical speech translation system. In Proceedings of the HLT-NAACL International Workshop on Medical Speech Translation, pages 9-16, New York.

L. Dugast, J. Senellart, and P. Koehn. 2008. Can we relearn an RBMT system? In Proceedings of the Third Workshop on Statistical Machine Translation, pages 175-178, Columbus, Ohio.

B.A. Hockey, M. Rayner, and G. Christian. 2008. Training statistical language models from grammar-generated data: A comparative case-study. In Proceedings of the 6th International Conference on Natural Language Processing, Gothenburg, Sweden.

R. Jonson. 2005. Generating statistical language models from interpretation grammars in dialogue systems. In
RBMT Japanese  ban arimashita ka
               (evening be-POLITE-PAST Q)
SMT Japanese   arimashita ka ban
               (be-POLITE-PAST Q evening)

English        does noise typically give you your headaches
RBMT Japanese  souon ga ki ni naru to daitai atama wa itamimasu ka
               (noise SUBJ experience WHEN often head TOP hurt-POLITE Q)
SMT Japanese   souon ga ki ni naru to itamimasu ka daitai atama wa itamimasu ka
               (noise SUBJ experience WHEN hurt-POLITE Q often head TOP hurt-POLITE Q)

English        is the pain usually caused by sudden head movements
RBMT Japanese  daitai totsuzen atama wo ugokasu to itamimasu ka
               (usually quickly head OBJ move WHEN hurt-POLITE Q)
SMT Japanese   daitai itami wa totsuzen atama wo ugokasu to [missing: itamimasu ka]
               (usually pain SUBJ quickly head OBJ move WHEN [missing: hurt-POLITE Q])

Table 9: Examples of incorrect SMT translations from English into Japanese.
A. Jurafsky, C. Wooters, J. Segal, A. Stolcke, E. Fosler, G. Tajchman, and N. Morgan. 1995. Using a stochastic context-free grammar as a language model for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 189-192.

P. Koehn and C. Monz. 2005. Shared task: Statistical machine translation between European languages. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, Ann Arbor, MI.

P. Koehn and C. Monz. 2006. Manual and automatic evaluation of machine translation between European languages. In Proceedings of the Workshop on Statistical Machine Translation, New York City, NY.

P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, demonstration session.

F.J. Och and H. Ney. 2000. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong.

S. Ozdowska. 2009. Données bilingues pour la TAS français-anglais : impact de la langue source et de la direction de traduction originales sur la qualité de la traduction. [Bilingual data for French-English SMT: impact of the original source language and translation direction on translation quality.] In Proceedings of TALN 2009, Senlis, France.

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2001. BLEU: a method for automatic evaluation of machine translation. Research Report RC22176 (W0109-022), IBM Research Division.

M. Rayner, P. Bouillon, N. Chatzichrisafis, B.A. Hockey, M. Santaholma, M. Starlander, H. Isahara, K. Kanzaki, and Y. Nakao. 2005. A methodology for comparing grammar-based and robust approaches to speech understanding. In Proceedings of the 9th International Conference on Spoken Language Processing (ICSLP), pages 1103-1107, Lisboa, Portugal.

M. Rayner, B.A. Hockey, and P. Bouillon. 2006. Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler. CSLI Press, Chicago.

M. Rayner, P. Estrella, P. Bouillon, B.A. Hockey, and Y. Nakao. 2009. Using artificially generated data to evaluate statistical machine translation. In Proceedings of the Third Workshop on Grammar Engineering Across Frameworks, Singapore.

S. Seneff, C. Wang, and J. Lee. 2006. Combining linguistic and statistical methods for bi-directional English Chinese translation in the flight domain. In Proceedings of AMTA 2006.

A. Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing. ISCA.