Proceedings Chapter
Reference
Bootstrapping A Statistical Speech Translator From A Rule-Based One
RAYNER, Emmanuel, ESTRELLA, Paula Susana, BOUILLON, Pierrette
RAYNER, Emmanuel, ESTRELLA, Paula Susana, BOUILLON, Pierrette. Bootstrapping A Statistical Speech Translator From A Rule-Based One. In: Proceedings of the 2nd
International Workshop On Free/Open-Source Rule-Based Machine Translation . 2011.
Available at:
http://archive-ouverte.unige.ch/unige:14930
Disclaimer: layout of this document may differ from the published version.
1 / 1
MannyRayner
UniversityofGeneva,TIM/ISSCO,
40bvdduPont-d'Arve
CH-1211Geneve,Switzerland
Emmanuel.Raynerunige.h
PaulaEstrella
FaMAF,U.NaionaldeC´ordoba
5000-C´ordoba,Argentine
pestrellafamaf.un.edu.ar
PierretteBouillon
UniversityofGeneva,TIM/ISSCO,
40bvdduPont-d'Arve
CH-1211Geneve,Switzerland
Pierrette.Bouillonunige.h
Abstrat
We desribea seriesofexperiments in
whihwestartwithEnglish!Frenh
andEnglish!Japaneseversionsofan
Open Soure rule-based speeh trans-
lation system for a medial domain,
and bootstrap orresponding statistial
systems. Comparative evaluation re-
veals that the rule-based systems are
still signiantly betterthan the statis-
tial ones,despite thefat that onsid-
erable effort has been invested intun-
ingboththereognitionandtranslation
omponents;also,ahybridsystemonly
marginally improved reall atthe ost
ofa loss inpreision. The result sug-
gests that rule-basedarhitetures may
stillbepreferabletostatistialonesfor
safety-ritialspeehtranslationtasks.
IndexTerms:Speehtranslation,rule-basedpro-
essing, statistial proessing, bootstrapping, in-
terlingua,evaluation
1 Introdution
Thispaperdesribesaontinuationofaseriesof
experiments entered around MedSLT (Bouillon
et al., 2008a), an Open Soure medial speeh
translator designed for dotor-patient ommuni-
ation whih uses a rule-based arhiteture; the
purpose of the experiments has been to om-
pare this arhiteturewith moremainstream sta-
tistial ones. The original motivation for using
regardingthe tradeoff between preision and re-
all. Speially,medial speehtranslation isa
safety-ritial domain, where preision is muh
moreimportantthanreall.Itisalsoimportantto
notethatthisisadomainwheresubstantialquan-
titiesof training data areunavailable. The ques-
tionishowtousetheverylimitedamountsofdata
atourdisposaltobesteffet. Thisisbynomeans
anunommonsenarioinlimited-domain speeh
translation, and ould in fat be regarded asthe
normratherthantheexeption.
Itisintuitivelynotunreasonabletobelievethat
rule-based methods are better suited to the re-
quirements outlined above, but the well-known
methodologialproblems involvedinperforming
omparisons between rule-based and statistial
systems havemade ithardtoestablishthis point
unambiguously. Inanearlierstudy(Rayneretal.,
2005), we presented head-to-head omparisons
between MedSLTand analternative whihom-
binedstatistial reognition andan adho trans-
lation mehanism based on hand-oded surfae
patterns, showingthattherule-basedsystemper-
formedomfortablybetter. Itwas,however,lear
fromomments wereeived thatthe ommunity
viewedthese results septially. The basi riti-
ismwasthat the robust proessing omponents
were too muh of a straw-man: more powerful
reognitionortranslationengines mightoneiv-
ablyhavereversedtheresult.
In the new series of experiments, our basi
goal has been to start with the rule-based om-
ponents and the orpus data used to onstrut
them, and then usethe same resoures, together
weadaptedand improvedmethodsoriginally de-
sribed in (Jurafsky et al., 1995) to bootstrap a
statistial reogniserfromtheoriginalrule-based
one. More reently, in(Rayner etal., 2010)we
used similarmethods tobootstrap statistial ma-
hinetranslationmodels.
In this urrent paper, we ombine the results
of theprevious two sets of experiments tobuild
a fullybootstrappedstatistial speehtranslation
system, whih we then ompare with the origi-
nalrule-basedone,andalsowithahybridsystem
whih ombines rule-based and statistial pro-
essing. Interestingly, although (Rayner et al.,
2010)demonstratedthatabootstrappedstatistial
mahinetranslationsystemisabletoaddsubstan-
tialrobustnesstotheoriginalrule-basedonewhen
botharerunontextdata,thisrobustnessdoesnot
arryovertospeehtranslation.
The rest ofthe paper is organised as follows.
Setion 2 presents bakground on the MedSLT
system; Setion 3 summarises the earlier exper-
imentsonbootstrappedstatistialreognitionand
mahine translation;Setion4desribes thenew
experiments;andSetion5onludes.
2 Bakground:theMedSLTSystem
MedSLT (Bouillon et al., 2008a) is a medium-
voabulary interlingua-based Open Soure 1
speeh translation system for dotor-patient
medial examination questions, whih provides
any-language-to-any- lan gu ag e translation a-
pabilities for all languages in the set fEnglish,
Frenh, Japanese, Arabi, Catalang. In what
follows, however, we will only be onerned
with thepairsEnglish! Frenhand English!
Japanese, whih we take, respetively, asrepre-
sentativeofaloseanddistantlanguage-pair.
Both speeh reognition and translation are
rule-based. Speehreognitionrunsontheom-
merial Nuane 8.5 reognition platform, with
grammar-based language models built using the
OpenSoure 2
Regulusompiler. Asdesribedin
(Rayner et al., 2006), eah domain-spei lan-
guagemodelisextratedfromageneralresoure
1
LGPL liense; https://soureforge.net/
projets/medslt/
2
LGPL liense; https://soureforge.net/
a seedorpusofdomain-spei examples. The
seed orpus, whih typially ontains between
500 and 1500 utteranes, is then used a seond
timetoadd probabilistiweightstothegrammar
rules;thissubstantiallyimprovesreognitionper-
formane(Rayneretal.,2006,x11.5).
Atrun-time,thereogniserproduesasoure-
language semanti representation in AFF (Al-
mostFlatFuntional Semantis; (Bouillon etal.,
2008a)). This is rst translated by one set of
rulesintoaninterlingualform,andthenbyase-
ondsetintoatargetlanguagerepresentation. The
interlingua and target representation are also in
AFFform. Atarget-languageRegulus grammar,
ompiled into generation form, turns the target
representation into one or more possible surfae
strings, after whih a set of generation prefer-
enespiksoneout.
Inparallel,theinterlinguaisalsotranslated,us-
ing the same methods, into the soure-language
(baktranslated). Thebaktranslationisshown
to the soure-language user, who has the op-
tion of aborting proessing if they onsider that
speeh understanding has produed an inorret
result. If they do not abort, the target language
stringisdisplayed andrealisedasspokenoutput.
Thismode ofoperation isabsolutelyessentialin
a safety-ritial appliation like medial exami-
nation. Sine translation errors an have seri-
ousorevenfatalonsequenes, dotorswillonly
onsider usingsystems withextremelylow error
rates, wheretheyandiretly satisfy themselves
that the system hasatleast orretlyunderstood
whattheyhavesaidbeforeattemptingtotranslate
it. Thisalso motivates use ofrestrited-domain,
asopposedtogeneraltranslation.
Thespaeofwell-formedinterlinguarepresen-
tationsinMedSLTisdenedbyyetanotherReg-
ulusgrammar(Bouillonetal.,2008a);thisgram-
mar is designed to have minimal struture, so
heking for well-formedness an be performed
very quikly. During speeh reognition, the
well-formedness hek is used as a knowledge
soure to enhane the language model for the
soure language. The speeh reogniser is set
to generate N-best reognition hypotheses, and
hypotheses whih give rise to non-wellformed
Enginterlinguagloss YN-QUESTIONpainlastPRESENTusuallydurationmore-thanoneday
Frenh ladouleurdure-t-ellehabituellementplusd'unjour
Japinterlinguagloss more-thanonedaydurationpainusuallylastPRESENTYN-QUESTION
Japanese daitaiihinihisukunakutomoitamiwatsuzukimasuka
English doesiteverappearwhenyoueat
Enginterlinguagloss YN-QUESTIONyouhavePRESENTeverpains-whenyoueatPRESENT
Frenh avez-vousd´ejaeumalquandvousmangez
Japinterlinguagloss eatPRESENTs-wheneverpainhavePRESENTYN-QUESTION
Japanese koremadenitabemonowotaberutoitamimashitaka
English isthepainononeside
Enginterlinguagloss YN-QUESTIONyouhavePRESENTpainin-loheadoneside-part
Frenh avez-vousmalsurl'undesot´ esdelatete
Japinterlinguagloss headoneside-partin-lopainhavePRESENTYN-QUESTION
Japanese atamanokatagawawaitamimasuka
Table 1: English MedSLT examples: English soure sentene, English-format interlingua gloss,
rule-based translation into Frenh, Japanese-format interlingua gloss and rule-based translation into
Japanese.
this highest-in-overage resoring algorithm is
foundtoreduesemantierrorrateduringspeeh
understanding by about 10% relative (Bouillon
etal.,2008b).
Theinterlinguagrammarisbuiltinsuhaway
that thesurfaeformsitdenes analsobe used
ashuman-readableglosses. Wewill makeheavy
use ofthese glosses in whatfollows. The usual
formoftheinterlingua glosslanguageismod-
elled on English. It is, however, straightfor-
wardtoparametrizethegrammarso thatglosses
an alsobegeneratedwithword-ordersbasedon
those ourring inotherlanguages; inpartiular,
wewillalsouseonebasedonJapanese.
Figure 1 shows examples of English domain
sentenes together with translations into Frenh
and Japanese and interlingua glossesinEnglish-
based andJapanese-based format. Notethevery
simple struture of the interlingua gloss, whih
is inmost asesjust a onatenation oftextrep-
resentations for the underlying AFF representa-
tion; sine AFF representations are unordered
lists, they anbe presented inanydesired order.
ThustheAFFfortherstexample,doesthepain
usuallylastformorethanonedayisthefollow-
ingstruture:
3
3
AFFrepresentationsandglosseshavebeenslightlysim-
[null=[utt_type,ynq℄,
arg1=[symptom,pain℄,
null=[state,last℄,
null=[tense,present℄,
null=[freq,usually℄,
duration=[>=,1℄,
duration=[timeunit,day℄℄
The English-format interlingua gloss, YN-
QUESTION pain last PRESENT usually dura-
tion more-thanone daypresentsthese elements
in the order given here, whih is approximately
thatofanormalEnglishrenditionofthesentene.
In ontrast, the Japanese-format gloss, more-
thanonedaydurationpainusuallylastPRESENT
YN-QUESTIONmakesonessions tostandard
Japaneseword-order, inwhihthe sentenenor-
mallyendswiththeverb(here,tsuzukimasu),fol-
lowedbytheinterrogativepartileka.
Similarly,intheseondexamplefromTable1,
we see that the English-format gloss puts s-
when (subordinating- onj un ti on when) be-
foretherepresentationof thesubordinatelause;
the Japanese-format gloss puts s-when after,
mirroringthefatthattheorrespondingJapanese
partile, to, omes after the subordinate lause
tabemonowotaberu. Thisisliterally foodOBJ
eat,i.e.(you)eatfood;notethattheJapanese-
age.InSetion3.2,wewilldemonstratehowuse-
fulthedifferentformsoftheinterlinguaturn out
tobe. Thebasipointistobeabletosplitupsta-
tistial translation into piees where soure and
targetalwayshavesimilarword-order.
Alltheexperimentsdesribedintherestofthe
paper were arried out using the 870-utterane
reorded speeh orpus from (Rayner et al.,
2005); this was olleted using a protool in
whihsubjetsplayedthedotorroleinsimulated
medial examinationsarriedoutusingtheMed-
SLTprototype. Atransribed versionofthedata
an be found online at http://medslt.
vs.soureforge.net/viewv/
*
hekout
*
/medslt/MedSLT2/
orpora/al2005 transriptions.
txt?revision=1.1. A brief examination
of the orpus shows that it is fairly noisy. We
estimate that about 6570% of it onsists of
learly in-domain and well-formed sentenes,
depending on the exat denitions of these
terms 4
,withmuhoftheremainingportionbeing
out-of-domainordysuent.
The next setion presents the results of ear-
lierexperiments, inwhihstatistial omponents
werebootstrappedbyusingtherule-basedonesto
reatetrainingdata.
3 Previousexperiments
3.1 Bootstrappingstatistiallanguage
models
As desribed inSetion 2, theRegulus platform
onstruts grammar-based language models ina
orpus-driven way. This, inpriniple, enablesa
fairomparisonbetweengrammar-basedandsta-
tistial languagemodelling, sine the seed or-
pususedtoextratthespeialisedgrammaran
also beusedtotrainastatistial languagemodel
(SLM).There are, however, several ways toim-
plement this idea. The simplest method is to
use the seed orpus diretly as a training or-
pusfor theSLM.Amore subtleapproah is de-
sribed in(Jurafsky etal., 1995; Jonson, 2005);
oneanrandomlysamplethegrammar-basedlan-
guagemodeltogeneratearbitrarilylargeamounts
4
61%oftheorpusiswithintheoverageoftheurrent
SLMtrainingproess.
In(Hokeyetal.,2008),weshowedthatasta-
tistialreognisertrainedfromasufientlylarge
randomly generated orpus outperforms the one
generatedfromtheseedorpus 5
.Afurtherrene-
ment is to lter the randomly generated orpus
bykeepingonlyexampleswhih,whentranslated
intointerlingua glossform,resultinwell-formed
representations. These improvements yielded a
umulative redution in Word Error Rate, mea-
suredoverthewhole870-utteranedataset,from
27.7% to23.6%. The best bootstrapped statisti-
al reogniser was, however, still inferior tothe
grammar-basedone,whihsored22.0%.
3.2 Bootstrappingstatistialtranslation
models
In (Rayner et al., 2010), we adapted the meth-
ods fromSetion3.1tobootstrap StatistialMa-
hine Translation (SMT) models from the orig-
inal rule-based ones; a similar experiment, with
alarge-voabulary system,isreportedin(Dugast
etal., 2008). As above, westarted by using the
soure-language grammar to randomly generate
a large orpus of data. We then passed the re-
sult through English ! Frenh and English !
Japaneseversions of theinterlingua-based trans-
lation omponents, savingthe soure, target and
interlingua gloss representations. A straightfor-
wardwaytoreatetheSMTmodelswouldbeto
use thealigned soure/target orpora astraining
data. Here,however,weagainshowedthatitwas
possible to get muh better performane by ex-
ploitingthestrutureoftheinterlingua.
The interlingua gloss was saved both in the
English-based and the Japanese-based formats
(f. Table 1). We then used the ommon om-
binationofGiza++,MosesandSRILM(Ohand
Ney,2000;Koehnetal.,2007; Stolke,2002)to
trainseparate SMTmodels for thepairsEnglish
!English-formatinterlingua,English-formatin-
terlingua ! Frenh, and Japanese-format inter-
lingua!Japanese;foromparisonpurposes,we
alsotrainedmodelsforEnglish!FrenhandEn-
glish ! Japanese. All the models were tuned
using MERT (Oh, 2003) on a held-out portion
of data. We experimented with several differ-
5
Version1 Version2 Dataset Judge1 Judge2 Unanimous
English!Frenh
1 Rule-based Bootstrappedstatistial Alldata 26143 25943 24733
2 Rule-based Bootstrappedstatistial Onlygoodbaktrans. 6925 7127 6220
3 Hybrid Rule-based Alldata 29180 30181 25177
4 Hybrid Rule-based Onlygoodbaktrans. 1812 1915 1512
English!Japanese
5 Rule-based Bootstrappedstatistial Alldata 12598 14796 10167
6 Rule-based Bootstrappedstatistial Onlygoodbaktrans. 6125 6641 4921
7 Hybrid Rule-based Alldata 4962 3081 2355
8 Hybrid Rule-based Onlygoodbaktrans. 178 199 148
Table 2: Comparisons between differentversions of the English! Frenh and English! Japanese
MedSLTsystems. Theresult NNMMindiatesthat thejudge(s)inquestiononsidered thattherst
versiongavealearlybetterresultNNtimes,andtheseondversionalearlybetterresultMMtimes.
DifferenessigniantatP <0:05aordingtotheMNemartestaremarkedinbold.
entwaysofombining theseresoures. Thebest
methodturnedouttobethefollowingpipeline:
1. Translation fromEnglish to English-format
interlingua usingSMT,withthedeoderset
toprodueN-bestoutput(N wassetto15);
2. ResoringoftheN-bestoutputtohoosethe
highest well-formed string, where one was
available;
3. IfthetargetisJapanese,reformulation from
English-format interlingua to Japanese-
formatinterlingua;
4. Translation from the appropriate format of
interlinguatothetargetlanguageusingSMT
As shown in the paper, this ombination mas-
sively dereases the error rate for the difult
pair English! Japanese, omparedtothena¨ve
method of training a single SMT model. The
key advantage is that SMTtranslation, whih is
very sensitive todifferenes in word-order, only
has to translate between languages with similar
word-orders. EvenintherelativelyeasypairEn-
glish ! Frenh, a substantial performane gain
was ahieved by interposing the N-best resor-
ingstep. Onin-overageinput,bothbootstrapped
interlingua-based SMTsystems wereable tore-
produethetranslationsoftheoriginalrule-based
systems on about 79% of the data; the orre-
were 67% for English ! Frenh and 27% for
English ! Japanese. In ases where the boot-
strapped SMT output differed from the RBMT
one,hand-examinationshowedthattheSMTver-
sionwashardly everbetter,and wasoftenworse
(Rayneretal.,2009).
The bootstrapped SMT systems are thus not
quite asgood asthe original RBMToneson in-
overage data. The payoff, of ourse, is that
the bootstrapped system are also able to trans-
late out-of-overage sentenes. When evaluated
ontheout-of-overageportionofthetestset(358
textutteranes), 81 sentenes (23%)produed a
baktranslation judged to be orret. Of these
81 sentenes, 76 (94%) were judged toprodue
good translations for Frenh, and 71 (88%) for
Japanese.
4 Combiningreognitionand
translation
Thepreedingsetionshaveshownhowwewere
able to use Open Soure resoures to bootstrap
goodrobustversionsoftheoriginalspeehreog-
nitionandmahinetranslationomponents,using
only the original, very small training set of 948
sentenes. We now desribe how we ombined
these modules to ompare a full bootstrapped
statistial speeh translation system against the
orginalrule-basedone;wealsoomparetherule-
Wetook thebest versions ofthebootstrapped
statistial reogniser from Setion 3.1 and the
bootstrapped statistial translation models from
Setion 3.2, ran the 870-utterane speeh or-
pusfrom(Rayneretal.,2005)through them,and
ompared the results with those obtained from
theoriginal speehtranslation system(grammar-
based reognitionand rule-basedtranslation). In
bothongurations,wealsoproduedrule-based
baktranslations (f. Setion 2), in order to be
abletosimulatenormaluseofthesystem.
Thematerialwasannotatedbyhumanjudgesin
thefollowingway. TheEnglish!Englishbak-
translations were evaluated by a native English
judge; they wereasked tomark the baktransla-
tion asgood ifthey were sufiently sure of its
orretness that they would have onsidered, in
arealmedialexaminationdialogue,thatthesys-
temhadunderstoodandshouldbeallowedtopass
itstranslationontothepatient.
The English ! Frenh and English !
Japanese translations were evaluatedby two na-
tive speakers ofFrenh and two native speakers
of Japanese respetively, who were all uent in
English. Theywerepresentedwithaspreadsheet
ontaining three olumns, inwhih therst ol-
umn was the soure English sentene, and the
other two were the output of the orginal rule-
based system andthe output ofthebootstrapped
system. If one ofthe systems produed no out-
put,forwhateverreason,thiswasmarkedasNO
TRANSLATION.The order of presentation of
thetwosystemswasrandomised,sothatthejudge
did not know, for any given line, whih version
wasshownintheseondolumnandwhihinthe
third. If there were two translations, the judges
wereinstrutedtomarkone ofthemifthey on-
sidered that it was learly superior to the other.
If one of thetranslations wasnull they werein-
strutedtomarkthenon-nulltranslationasprefer-
able ifthey onsideredthat itwouldbe usefulin
theontextofthemedialspeehtranslationtask.
We used the data and the judgments to om-
paretherule-basedsystems,thebootstrappedsta-
tistial systems, and a hypothetial hybrid sys-
tem whih produes the result from the boot-
strappedsystemiftherule-basedsystemprodues
tem. Theresults are summarisedinTable2; we
present gures for eah omparison both on the
ompletedataset,andalsoonthesubsetforwhih
baktranslationproduedaresultjudgedasgood.
The last three olumns give the results rst for
eah judge separately, then for the ases where
thetwojudgementsoinide.
Although statistial proessing, asusual, adds
robustness, we an see that it suffers from two
majorproblems. Aslines1 and5 show,the sta-
tistial system, withoutbaktranslation, is muh
worsethantherule-basedone,sineitfrequently
produes inorrettranslationsdue tobadreog-
nition. (Thestatistialsystemalmostalwayspro-
duesatranslation;therule-basedonefailstodo
so about on about 30% of the data, sine rule-
based reognition most often failsaltogether on
out-of-overage data, as opposed toproduing a
nonsensial result). Withbaktranslation added,
lines 2 and 6 at least demonstrate that this rst
problemdisappears,andtheresultisloser.How-
ever,westill havetheseond problem;there are
long-distane dependenies whih the statistial
algorithms are unable to learn. For example, in
Frenh, both judges agreed that there were 62
ases whererule-based proessing gave a better
resultthanstatistial,mostlyduetomoreaurate
reognition or translation. There were 20 ases
whihwenttheoppositeway,withstatistialpro-
essing better thanrule-based: inmostof these,
rule-based proessing gave no result, and statis-
tial a good result. Forboth languagepairs, the
guressuggestthatthelakoflong-distaneon-
straintsismore importantthantheaddedrobust-
ness.
Theresultsfrom(Rayneretal.,2010)ledusto
hopethatthehybridsystemwouldaddrobustness
to the rule-based system without ompromising
auray;(Seneffetal.,2006)reportsasimilarre-
sultwhenthetextomponentofaspeehtransla-
tionsystemisevaluatedinisolation.Combination
with the speeh reognition front-end, with its
onomitantnoisyinput,unfortunatelyappearsto
hangethepiture.Withoutbaktranslation(lines
3and6),thehybridsystemisinferiortotherule-
basedoneforthereasonswehavealreadyseen.
When baktranslation is inluded(lines 4 and
lossofpreision.Examinationoftheaseswhere
the rule-based system diverges from the hybrid
one shows disturbing examples where the rule-
based system produes no output, and the hy-
brid one an output whih is meaningful but in-
orret. Forinstane, Doyoutakemediinefor
yourheadahes? produed notranslation inthe
rule-based English ! Frenh system, but Avez-
vous vos maux de tete quand vous prenez des
m´ediaments? (Do you have headahes when
you take mediine?) inthe hybrid one; a mis-
takewhihwouldertainlyworryanydotorwho
usedthesystem!
5 Summaryandonlusions
We have desribed a series of experiments in
whihwestartedwitharule-basedspeehtransla-
tion systemforamedial speehtranslation sys-
tem,andusedittobootstrapaorrespondingsta-
tistialsystem. Therule-basedsystemisstillbet-
ter than the statistial one, despite the fat that
onsiderable ingenuity hasbeen invested intun-
ing both the reognition and translation ompo-
nents.
Thena¨vehybridsystemgaveasmallimprove-
mentinreall,butatanunaeptableostinpre-
ision. Itis oneivable that a more subtle way
ofreatingthehybridsystemmaystillsueedin
addingusefulrobustness.Atthemoment,though,
the evidene at our disposal suggests that rule-
based systems are more appropriate forthe kind
of task, and that any gain from adding robust
methodsisatbestlikelytobesmall.
Wearewellawarethatourresultisatoddswith
theurrentlyprevailingwisdom,namelythatsta-
tistialmethodsarepreferabletorule-basedones,
and the obvious question is why this should be.
Wethinktherearetwomainreasons. First,most
aademi papers are written about systems that
havebeenreatedtoaddressasharedtask.These
tasks typially uselarge training sets that repre-
sent a substantial investment intime and effort.
When building real world appliations, it is un-
usualtobegivenalargetrainingsetatthestartof
theprojet; itismuh more ommontohaveno
trainingsetatall.
Theseondreasonisthatmedialspeehtrans-
needstobe reetedintheevaluationmetri. A
metri whih maximizes BLEU sore or reall,
typialofmosturrentevaluations,isinappropri-
ate. Nodotorwehavetalked towouldonsider
BLEUausefulmetri.
Inbothrespets,theappliationwedesribeis
loser toreal world onesthan isommon inthe
literature,andwethereforethinkitreasonableto
laimthat ourresultsshould notbedismissed as
irrelevant; wesuspet that similarproblems will
emerge in many other real world appliations.
TheOpenSoureframeworkwehaveusedmake
iteasy for septialresearhers tohek thede-
tailsofourmethodsanddata.
Referenes
Bouillon, P., Flores, G., Georgesul, M., Hal-
imi, S., Hokey, B., Isahara, H., Kanzaki, K.,
Nakao, Y., Rayner, M., Santaholma,M., Star-
lander, M.,andTsourakis, N.(2008a). Many-
to-many multilingual medial speeh transla-
tion on aPDA. InProeedings ofThe Eighth
Conferene of the Assoiation for Mahine
TranslationintheAmerias,Waikiki,Hawaii.
Bouillon, P., Halimi, S., Nakao, Y., Kanzaki,
K., Isahara, H.,Tsourakis, N.,Starlander, M.,
Hokey, B., and Rayner, M. (2008b). De-
veloping non-European translation pairs in a
medium-voabularymedialspeehtranslation
system. In Proeedings ofLREC 2008, Mar-
rakesh,Moroo.
Dugast, L., Senellart, J., and Koehn, P. (2008).
CanwerelearnanRBMTsystem? InProeed-
ings oftheThirdWorkshopon Statistial Ma-
hine Translation, pages 175178, Columbus,
Ohio.
Hokey, B., Rayner, M., and Christian, G.
(2008). Training statistial language models
from grammar-generated data: Aomparative
ase-study. InProeedingsofthe6thInterna-
tional Conferene on Natural Language Pro-
essing,Gothenburg,Sweden.
Jonson, R. (2005). Generating statistial lan-
guagemodelsfrominterpretation grammarsin
dialogue systems. InProeedingsof the11th
A., Fosler, E., Tajhman, G., and Morgan, N.
(1995). Usinga stohastiontext-free gram-
mar as a language model for speeh reogni-
tion. InProeedingsoftheIEEEInternational
Conferene on Aoustis, Speeh and Signal
Proessing,pages189192.
Koehn,P.,Hoang,H.,Birh,A.,Callison-Burh,
C., Federio, M., Bertoldi, N., Cowan, B.,
Shen, W., Moran, C., Zens, R., et al. (2007).
Moses: Opensoure toolkit forstatistial ma-
hine translation. In ANNUAL MEETING-
ASSOCIATIONFORCOMPUTATIONALLIN-
GUISTICS,volume45,page2.
Oh, F. (2003). Minimum error rate training
instatistial mahine translation. InProeed-
ingsofthe41stAnnualMeetingonAssoiation
forComputationalLinguistis-Volume1,pages
160167. Assoiation for Computational Lin-
guistis.
Oh, F.andNey,H.(2000). Improved statistial
alignmentmodels. InProeedings ofthe38th
AnnualMeetingoftheAssoiationforCompu-
tationalLinguistis,HongKong.
Rayner, M., Bouillon, P., Chatzihrisas, N.,
Hokey, B., Santaholma, M., Starlander, M.,
Isahara,H.,Kanzaki,K.,andNakao,Y.(2005).
Amethodologyforomparinggrammar-based
and robust approahes to speeh understand-
ing. In Proeedings of the 9th International
Conferene on Spoken Language Proessing
(ICSLP),pages11031107,Lisboa,Portugal.
Rayner,M.,Estrella, P.,and Bouillon,P.(2010).
A bootstrapped interlingua-based SMTarhi-
teture. In Proeedings of the 14th Confer-
eneoftheEuropeanAssoiationforMahine
Translation(EAMT),StRaphael,Frane.
Rayner,M.,Estrella,P.,Bouillon,P.,Hokey,B.,
andNakao,Y.(2009). Usingartiiallygener-
ateddata toevaluate statistial mahinetrans-
lation. InProeedings ofthe 2009 Workshop
onGrammarEngineeringArossFrameworks,
pages5462,Singapore.AssoiationforCom-
putationalLinguistis.
Rayner,M.,Hokey,B.,and Bouillon,P.(2006).
Chiago.
Seneff,S., Wang,C., and Lee, J. (2006). Com-
bining linguistiandstatistialmethodsforbi-
diretional English Chinese translation in the
ightdomain. InProeedingsofAMTA2006.
Stolke, A.(2002). SRILM- an extensible lan-
guage modeling toolkit. In Seventh Interna-
tional Conferene on Spoken Language Pro-
essing.ISCA.