Bootstrapping A Statistical Speech Translator From A Rule-Based One

(1)

Proceedings Chapter

Reference

Bootstrapping A Statistical Speech Translator From A Rule-Based One

RAYNER, Emmanuel, ESTRELLA, Paula Susana, BOUILLON, Pierrette

RAYNER, Emmanuel, ESTRELLA, Paula Susana, BOUILLON, Pierrette. Bootstrapping A Statistical Speech Translator From A Rule-Based One. In: Proceedings of the 2nd

International Workshop On Free/Open-Source Rule-Based Machine Translation . 2011.

Available at:

http://archive-ouverte.unige.ch/unige:14930

Disclaimer: layout of this document may differ from the published version.

1 / 1

(2)

MannyRayner

UniversityofGeneva,TIM/ISSCO,

40bvdduPont-d'Arve

CH-1211Geneve,Switzerland

Emmanuel.Raynerunige.h

PaulaEstrella

FaMAF,U.NaionaldeC´ordoba

5000-C´ordoba,Argentine

pestrellafamaf.un.edu.ar

PierretteBouillon

UniversityofGeneva,TIM/ISSCO,

40bvdduPont-d'Arve

CH-1211Geneve,Switzerland

Pierrette.Bouillonunige.h

Abstrat

We desribea seriesofexperiments in

whihwestartwithEnglish!Frenh

andEnglish!Japaneseversionsofan

Open Soure rule-based speeh trans-

lation system for a medial domain,

and bootstrap orresponding statistial

systems. Comparative evaluation re-

veals that the rule-based systems are

still signiantly betterthan the statis-

tial ones,despite thefat that onsid-

erable effort has been invested intun-

ingboththereognitionandtranslation

omponents;also,ahybridsystemonly

marginally improved reall atthe ost

ofa loss inpreision. The result sug-

gests that rule-basedarhitetures may

stillbepreferabletostatistialonesfor

safety-ritialspeehtranslationtasks.

IndexTerms:Speehtranslation,rule-basedpro-

essing, statistial proessing, bootstrapping, in-

terlingua,evaluation

1 Introdution

Thispaperdesribesaontinuationofaseriesof

experiments entered around MedSLT (Bouillon

et al., 2008a), an Open Soure medial speeh

translator designed for dotor-patient ommuni-

ation whih uses a rule-based arhiteture; the

purpose of the experiments has been to om-

pare this arhiteturewith moremainstream sta-

tistial ones. The original motivation for using

regardingthe tradeoff between preision and re-

all. Speially,medial speehtranslation isa

safety-ritial domain, where preision is muh

moreimportantthanreall.Itisalsoimportantto

notethatthisisadomainwheresubstantialquan-

titiesof training data areunavailable. The ques-

tionishowtousetheverylimitedamountsofdata

atourdisposaltobesteffet. Thisisbynomeans

anunommonsenarioinlimited-domain speeh

translation, and ould in fat be regarded asthe

normratherthantheexeption.

Itisintuitivelynotunreasonabletobelievethat

rule-based methods are better suited to the re-

quirements outlined above, but the well-known

methodologialproblems involvedinperforming

omparisons between rule-based and statistial

systems havemade ithardtoestablishthis point

unambiguously. Inanearlierstudy(Rayneretal.,

2005), we presented head-to-head omparisons

between MedSLTand analternative whihom-

binedstatistial reognition andan adho trans-

lation mehanism based on hand-oded surfae

patterns, showingthattherule-basedsystemper-

formedomfortablybetter. Itwas,however,lear

fromomments wereeived thatthe ommunity

viewedthese results septially. The basi riti-

ismwasthat the robust proessing omponents

were too muh of a straw-man: more powerful

reognitionortranslationengines mightoneiv-

ablyhavereversedtheresult.

In the new series of experiments, our basi

goal has been to start with the rule-based om-

ponents and the orpus data used to onstrut

them, and then usethe same resoures, together

(3)

weadaptedand improvedmethodsoriginally de-

sribed in (Jurafsky et al., 1995) to bootstrap a

statistial reogniserfromtheoriginalrule-based

one. More reently, in(Rayner etal., 2010)we

used similarmethods tobootstrap statistial ma-

hinetranslationmodels.

In this urrent paper, we ombine the results

of theprevious two sets of experiments tobuild

a fullybootstrappedstatistial speehtranslation

system, whih we then ompare with the origi-

nalrule-basedone,andalsowithahybridsystem

whih ombines rule-based and statistial pro-

essing. Interestingly, although (Rayner et al.,

2010)demonstratedthatabootstrappedstatistial

mahinetranslationsystemisabletoaddsubstan-

tialrobustnesstotheoriginalrule-basedonewhen

botharerunontextdata,thisrobustnessdoesnot

arryovertospeehtranslation.

The rest ofthe paper is organised as follows.

Setion 2 presents bakground on the MedSLT

system; Setion 3 summarises the earlier exper-

imentsonbootstrappedstatistialreognitionand

mahine translation;Setion4desribes thenew

experiments;andSetion5onludes.

2 Bakground:theMedSLTSystem

MedSLT (Bouillon et al., 2008a) is a medium-

voabulary interlingua-based Open Soure 1

speeh translation system for dotor-patient

medial examination questions, whih provides

any-language-to-any- lan gu ag e translation a-

pabilities for all languages in the set fEnglish,

Frenh, Japanese, Arabi, Catalang. In what

follows, however, we will only be onerned

with thepairsEnglish! Frenhand English!

Japanese, whih we take, respetively, asrepre-

sentativeofaloseanddistantlanguage-pair.

Both speeh reognition and translation are

rule-based. Speehreognitionrunsontheom-

merial Nuane 8.5 reognition platform, with

grammar-based language models built using the

OpenSoure 2

Regulusompiler. Asdesribedin

(Rayner et al., 2006), eah domain-spei lan-

guagemodelisextratedfromageneralresoure

1

LGPL liense; https://soureforge.net/

projets/medslt/

2

LGPL liense; https://soureforge.net/

a seedorpusofdomain-spei examples. The

seed orpus, whih typially ontains between

500 and 1500 utteranes, is then used a seond

timetoadd probabilistiweightstothegrammar

rules;thissubstantiallyimprovesreognitionper-

formane(Rayneretal.,2006,x11.5).

Atrun-time,thereogniserproduesasoure-

language semanti representation in AFF (Al-

mostFlatFuntional Semantis; (Bouillon etal.,

2008a)). This is rst translated by one set of

rulesintoaninterlingualform,andthenbyase-

ondsetintoatargetlanguagerepresentation. The

interlingua and target representation are also in

AFFform. Atarget-languageRegulus grammar,

ompiled into generation form, turns the target

representation into one or more possible surfae

strings, after whih a set of generation prefer-

enespiksoneout.

Inparallel,theinterlinguaisalsotranslated,us-

ing the same methods, into the soure-language

(baktranslated). Thebaktranslationisshown

to the soure-language user, who has the op-

tion of aborting proessing if they onsider that

speeh understanding has produed an inorret

result. If they do not abort, the target language

stringisdisplayed andrealisedasspokenoutput.

Thismode ofoperation isabsolutelyessentialin

a safety-ritial appliation like medial exami-

nation. Sine translation errors an have seri-

ousorevenfatalonsequenes, dotorswillonly

onsider usingsystems withextremelylow error

rates, wheretheyandiretly satisfy themselves

that the system hasatleast orretlyunderstood

whattheyhavesaidbeforeattemptingtotranslate

it. Thisalso motivates use ofrestrited-domain,

asopposedtogeneraltranslation.

Thespaeofwell-formedinterlinguarepresen-

tationsinMedSLTisdenedbyyetanotherReg-

ulusgrammar(Bouillonetal.,2008a);thisgram-

mar is designed to have minimal struture, so

heking for well-formedness an be performed

very quikly. During speeh reognition, the

well-formedness hek is used as a knowledge

soure to enhane the language model for the

soure language. The speeh reogniser is set

to generate N-best reognition hypotheses, and

hypotheses whih give rise to non-wellformed

(4)

Enginterlinguagloss YN-QUESTIONpainlastPRESENTusuallydurationmore-thanoneday

Frenh ladouleurdure-t-ellehabituellementplusd'unjour

Japinterlinguagloss more-thanonedaydurationpainusuallylastPRESENTYN-QUESTION

Japanese daitaiihinihisukunakutomoitamiwatsuzukimasuka

English doesiteverappearwhenyoueat

Enginterlinguagloss YN-QUESTIONyouhavePRESENTeverpains-whenyoueatPRESENT

Frenh avez-vousd´ejaeumalquandvousmangez

Japinterlinguagloss eatPRESENTs-wheneverpainhavePRESENTYN-QUESTION

Japanese koremadenitabemonowotaberutoitamimashitaka

English isthepainononeside

Enginterlinguagloss YN-QUESTIONyouhavePRESENTpainin-loheadoneside-part

Frenh avez-vousmalsurl'undesot´ esdelatete

Japinterlinguagloss headoneside-partin-lopainhavePRESENTYN-QUESTION

Japanese atamanokatagawawaitamimasuka

Table 1: English MedSLT examples: English soure sentene, English-format interlingua gloss,

rule-based translation into Frenh, Japanese-format interlingua gloss and rule-based translation into

Japanese.

this highest-in-overage resoring algorithm is

foundtoreduesemantierrorrateduringspeeh

understanding by about 10% relative (Bouillon

etal.,2008b).

Theinterlinguagrammarisbuiltinsuhaway

that thesurfaeformsitdenes analsobe used

ashuman-readableglosses. Wewill makeheavy

use ofthese glosses in whatfollows. The usual

formoftheinterlingua glosslanguageismod-

elled on English. It is, however, straightfor-

wardtoparametrizethegrammarso thatglosses

an alsobegeneratedwithword-ordersbasedon

those ourring inotherlanguages; inpartiular,

wewillalsouseonebasedonJapanese.

Figure 1 shows examples of English domain

sentenes together with translations into Frenh

and Japanese and interlingua glossesinEnglish-

based andJapanese-based format. Notethevery

simple struture of the interlingua gloss, whih

is inmost asesjust a onatenation oftextrep-

resentations for the underlying AFF representa-

tion; sine AFF representations are unordered

lists, they anbe presented inanydesired order.

ThustheAFFfortherstexample,doesthepain

usuallylastformorethanonedayisthefollow-

ingstruture:

3

AFFrepresentationsandglosseshavebeenslightlysim-

[null=[utt_type,ynq℄,

arg1=[symptom,pain℄,

null=[state,last℄,

null=[tense,present℄,

null=[freq,usually℄,

duration=[>=,1℄,

duration=[timeunit,day℄℄

The English-format interlingua gloss, YN-

QUESTION pain last PRESENT usually dura-

tion more-thanone daypresentsthese elements

in the order given here, whih is approximately

thatofanormalEnglishrenditionofthesentene.

In ontrast, the Japanese-format gloss, more-

thanonedaydurationpainusuallylastPRESENT

YN-QUESTIONmakesonessions tostandard

Japaneseword-order, inwhihthe sentenenor-

mallyendswiththeverb(here,tsuzukimasu),fol-

lowedbytheinterrogativepartileka.

Similarly,intheseondexamplefromTable1,

we see that the English-format gloss puts s-

when (subordinating- onj un ti on when) be-

foretherepresentationof thesubordinatelause;

the Japanese-format gloss puts s-when after,

mirroringthefatthattheorrespondingJapanese

partile, to, omes after the subordinate lause

tabemonowotaberu. Thisisliterally foodOBJ

eat,i.e.(you)eatfood;notethattheJapanese-

(5)

age.InSetion3.2,wewilldemonstratehowuse-

fulthedifferentformsoftheinterlinguaturn out

tobe. Thebasipointistobeabletosplitupsta-

tistial translation into piees where soure and

targetalwayshavesimilarword-order.

Alltheexperimentsdesribedintherestofthe

paper were arried out using the 870-utterane

reorded speeh orpus from (Rayner et al.,

2005); this was olleted using a protool in

whihsubjetsplayedthedotorroleinsimulated

medial examinationsarriedoutusingtheMed-

SLTprototype. Atransribed versionofthedata

an be found online at http://medslt.

vs.soureforge.net/viewv/

*

hekout

*

/medslt/MedSLT2/

orpora/al2005 transriptions.

txt?revision=1.1. A brief examination

of the orpus shows that it is fairly noisy. We

estimate that about 6570% of it onsists of

learly in-domain and well-formed sentenes,

depending on the exat denitions of these

terms 4

,withmuhoftheremainingportionbeing

out-of-domainordysuent.

The next setion presents the results of ear-

lierexperiments, inwhihstatistial omponents

werebootstrappedbyusingtherule-basedonesto

reatetrainingdata.

3 Previousexperiments

3.1 Bootstrappingstatistiallanguage

models

As desribed inSetion 2, theRegulus platform

onstruts grammar-based language models ina

orpus-driven way. This, inpriniple, enablesa

fairomparisonbetweengrammar-basedandsta-

tistial languagemodelling, sine the seed or-

pususedtoextratthespeialisedgrammaran

also beusedtotrainastatistial languagemodel

(SLM).There are, however, several ways toim-

plement this idea. The simplest method is to

use the seed orpus diretly as a training or-

pusfor theSLM.Amore subtleapproah is de-

sribed in(Jurafsky etal., 1995; Jonson, 2005);

oneanrandomlysamplethegrammar-basedlan-

guagemodeltogeneratearbitrarilylargeamounts

4

61%oftheorpusiswithintheoverageoftheurrent

SLMtrainingproess.

In(Hokeyetal.,2008),weshowedthatasta-

tistialreognisertrainedfromasufientlylarge

randomly generated orpus outperforms the one

generatedfromtheseedorpus 5

.Afurtherrene-

ment is to lter the randomly generated orpus

bykeepingonlyexampleswhih,whentranslated

intointerlingua glossform,resultinwell-formed

representations. These improvements yielded a

umulative redution in Word Error Rate, mea-

suredoverthewhole870-utteranedataset,from

27.7% to23.6%. The best bootstrapped statisti-

al reogniser was, however, still inferior tothe

grammar-basedone,whihsored22.0%.

3.2 Bootstrappingstatistialtranslation

models

In (Rayner et al., 2010), we adapted the meth-

ods fromSetion3.1tobootstrap StatistialMa-

hine Translation (SMT) models from the orig-

inal rule-based ones; a similar experiment, with

alarge-voabulary system,isreportedin(Dugast

etal., 2008). As above, westarted by using the

soure-language grammar to randomly generate

a large orpus of data. We then passed the re-

sult through English ! Frenh and English !

Japaneseversions of theinterlingua-based trans-

lation omponents, savingthe soure, target and

interlingua gloss representations. A straightfor-

wardwaytoreatetheSMTmodelswouldbeto

use thealigned soure/target orpora astraining

data. Here,however,weagainshowedthatitwas

possible to get muh better performane by ex-

ploitingthestrutureoftheinterlingua.

The interlingua gloss was saved both in the

English-based and the Japanese-based formats

(f. Table 1). We then used the ommon om-

binationofGiza++,MosesandSRILM(Ohand

Ney,2000;Koehnetal.,2007; Stolke,2002)to

trainseparate SMTmodels for thepairsEnglish

!English-formatinterlingua,English-formatin-

terlingua ! Frenh, and Japanese-format inter-

lingua!Japanese;foromparisonpurposes,we

alsotrainedmodelsforEnglish!FrenhandEn-

glish ! Japanese. All the models were tuned

using MERT (Oh, 2003) on a held-out portion

of data. We experimented with several differ-

5

(6)

Version1 Version2 Dataset Judge1 Judge2 Unanimous

English!Frenh

1 Rule-based Bootstrappedstatistial Alldata 26143 25943 24733

2 Rule-based Bootstrappedstatistial Onlygoodbaktrans. 6925 7127 6220

3 Hybrid Rule-based Alldata 29180 30181 25177

4 Hybrid Rule-based Onlygoodbaktrans. 1812 1915 1512

English!Japanese

5 Rule-based Bootstrappedstatistial Alldata 12598 14796 10167

6 Rule-based Bootstrappedstatistial Onlygoodbaktrans. 6125 6641 4921

7 Hybrid Rule-based Alldata 4962 3081 2355

8 Hybrid Rule-based Onlygoodbaktrans. 178 199 148

Table 2: Comparisons between differentversions of the English! Frenh and English! Japanese

MedSLTsystems. Theresult NNMMindiatesthat thejudge(s)inquestiononsidered thattherst

versiongavealearlybetterresultNNtimes,andtheseondversionalearlybetterresultMMtimes.

DifferenessigniantatP <0:05aordingtotheMNemartestaremarkedinbold.

entwaysofombining theseresoures. Thebest

methodturnedouttobethefollowingpipeline:

1. Translation fromEnglish to English-format

interlingua usingSMT,withthedeoderset

toprodueN-bestoutput(N wassetto15);

2. ResoringoftheN-bestoutputtohoosethe

highest well-formed string, where one was

available;

3. IfthetargetisJapanese,reformulation from

English-format interlingua to Japanese-

formatinterlingua;

4. Translation from the appropriate format of

interlinguatothetargetlanguageusingSMT

As shown in the paper, this ombination mas-

sively dereases the error rate for the difult

pair English! Japanese, omparedtothena¨ve

method of training a single SMT model. The

key advantage is that SMTtranslation, whih is

very sensitive todifferenes in word-order, only

has to translate between languages with similar

word-orders. EvenintherelativelyeasypairEn-

glish ! Frenh, a substantial performane gain

was ahieved by interposing the N-best resor-

ingstep. Onin-overageinput,bothbootstrapped

interlingua-based SMTsystems wereable tore-

produethetranslationsoftheoriginalrule-based

systems on about 79% of the data; the orre-

were 67% for English ! Frenh and 27% for

English ! Japanese. In ases where the boot-

strapped SMT output differed from the RBMT

one,hand-examinationshowedthattheSMTver-

sionwashardly everbetter,and wasoftenworse

(Rayneretal.,2009).

The bootstrapped SMT systems are thus not

quite asgood asthe original RBMToneson in-

overage data. The payoff, of ourse, is that

the bootstrapped system are also able to trans-

late out-of-overage sentenes. When evaluated

ontheout-of-overageportionofthetestset(358

textutteranes), 81 sentenes (23%)produed a

baktranslation judged to be orret. Of these

81 sentenes, 76 (94%) were judged toprodue

good translations for Frenh, and 71 (88%) for

Japanese.

4 Combiningreognitionand

translation

Thepreedingsetionshaveshownhowwewere

able to use Open Soure resoures to bootstrap

goodrobustversionsoftheoriginalspeehreog-

nitionandmahinetranslationomponents,using

only the original, very small training set of 948

sentenes. We now desribe how we ombined

these modules to ompare a full bootstrapped

statistial speeh translation system against the

orginalrule-basedone;wealsoomparetherule-

(7)

Wetook thebest versions ofthebootstrapped

statistial reogniser from Setion 3.1 and the

bootstrapped statistial translation models from

Setion 3.2, ran the 870-utterane speeh or-

pusfrom(Rayneretal.,2005)through them,and

ompared the results with those obtained from

theoriginal speehtranslation system(grammar-

based reognitionand rule-basedtranslation). In

bothongurations,wealsoproduedrule-based

baktranslations (f. Setion 2), in order to be

abletosimulatenormaluseofthesystem.

Thematerialwasannotatedbyhumanjudgesin

thefollowingway. TheEnglish!Englishbak-

translations were evaluated by a native English

judge; they wereasked tomark the baktransla-

tion asgood ifthey were sufiently sure of its

orretness that they would have onsidered, in

arealmedialexaminationdialogue,thatthesys-

temhadunderstoodandshouldbeallowedtopass

itstranslationontothepatient.

The English ! Frenh and English !

Japanese translations were evaluatedby two na-

tive speakers ofFrenh and two native speakers

of Japanese respetively, who were all uent in

English. Theywerepresentedwithaspreadsheet

ontaining three olumns, inwhih therst ol-

umn was the soure English sentene, and the

other two were the output of the orginal rule-

based system andthe output ofthebootstrapped

system. If one ofthe systems produed no out-

put,forwhateverreason,thiswasmarkedasNO

TRANSLATION.The order of presentation of

thetwosystemswasrandomised,sothatthejudge

did not know, for any given line, whih version

wasshownintheseondolumnandwhihinthe

third. If there were two translations, the judges

wereinstrutedtomarkone ofthemifthey on-

sidered that it was learly superior to the other.

If one of thetranslations wasnull they werein-

strutedtomarkthenon-nulltranslationasprefer-

able ifthey onsideredthat itwouldbe usefulin

theontextofthemedialspeehtranslationtask.

We used the data and the judgments to om-

paretherule-basedsystems,thebootstrappedsta-

tistial systems, and a hypothetial hybrid sys-

tem whih produes the result from the boot-

strappedsystemiftherule-basedsystemprodues

tem. Theresults are summarisedinTable2; we

present gures for eah omparison both on the

ompletedataset,andalsoonthesubsetforwhih

baktranslationproduedaresultjudgedasgood.

The last three olumns give the results rst for

eah judge separately, then for the ases where

thetwojudgementsoinide.

Although statistial proessing, asusual, adds

robustness, we an see that it suffers from two

majorproblems. Aslines1 and5 show,the sta-

tistial system, withoutbaktranslation, is muh

worsethantherule-basedone,sineitfrequently

produes inorrettranslationsdue tobadreog-

nition. (Thestatistialsystemalmostalwayspro-

duesatranslation;therule-basedonefailstodo

so about on about 30% of the data, sine rule-

based reognition most often failsaltogether on

out-of-overage data, as opposed toproduing a

nonsensial result). Withbaktranslation added,

lines 2 and 6 at least demonstrate that this rst

problemdisappears,andtheresultisloser.How-

ever,westill havetheseond problem;there are

long-distane dependenies whih the statistial

algorithms are unable to learn. For example, in

Frenh, both judges agreed that there were 62

ases whererule-based proessing gave a better

resultthanstatistial,mostlyduetomoreaurate

reognition or translation. There were 20 ases

whihwenttheoppositeway,withstatistialpro-

essing better thanrule-based: inmostof these,

rule-based proessing gave no result, and statis-

tial a good result. Forboth languagepairs, the

guressuggestthatthelakoflong-distaneon-

straintsismore importantthantheaddedrobust-

ness.

Theresultsfrom(Rayneretal.,2010)ledusto

hopethatthehybridsystemwouldaddrobustness

to the rule-based system without ompromising

auray;(Seneffetal.,2006)reportsasimilarre-

sultwhenthetextomponentofaspeehtransla-

tionsystemisevaluatedinisolation.Combination

with the speeh reognition front-end, with its

onomitantnoisyinput,unfortunatelyappearsto

hangethepiture.Withoutbaktranslation(lines

3and6),thehybridsystemisinferiortotherule-

basedoneforthereasonswehavealreadyseen.

When baktranslation is inluded(lines 4 and

(8)

lossofpreision.Examinationoftheaseswhere

the rule-based system diverges from the hybrid

one shows disturbing examples where the rule-

based system produes no output, and the hy-

brid one an output whih is meaningful but in-

orret. Forinstane, Doyoutakemediinefor

yourheadahes? produed notranslation inthe

rule-based English ! Frenh system, but Avez-

vous vos maux de tete quand vous prenez des

m´ediaments? (Do you have headahes when

you take mediine?) inthe hybrid one; a mis-

takewhihwouldertainlyworryanydotorwho

usedthesystem!

5 Summaryandonlusions

We have desribed a series of experiments in

whihwestartedwitharule-basedspeehtransla-

tion systemforamedial speehtranslation sys-

tem,andusedittobootstrapaorrespondingsta-

tistialsystem. Therule-basedsystemisstillbet-

ter than the statistial one, despite the fat that

onsiderable ingenuity hasbeen invested intun-

ing both the reognition and translation ompo-

nents.

Thena¨vehybridsystemgaveasmallimprove-

mentinreall,butatanunaeptableostinpre-

ision. Itis oneivable that a more subtle way

ofreatingthehybridsystemmaystillsueedin

addingusefulrobustness.Atthemoment,though,

the evidene at our disposal suggests that rule-

based systems are more appropriate forthe kind

of task, and that any gain from adding robust

methodsisatbestlikelytobesmall.

Wearewellawarethatourresultisatoddswith

theurrentlyprevailingwisdom,namelythatsta-

tistialmethodsarepreferabletorule-basedones,

and the obvious question is why this should be.

Wethinktherearetwomainreasons. First,most

aademi papers are written about systems that

havebeenreatedtoaddressasharedtask.These

tasks typially uselarge training sets that repre-

sent a substantial investment intime and effort.

When building real world appliations, it is un-

usualtobegivenalargetrainingsetatthestartof

theprojet; itismuh more ommontohaveno

trainingsetatall.

Theseondreasonisthatmedialspeehtrans-

needstobe reetedintheevaluationmetri. A

metri whih maximizes BLEU sore or reall,

typialofmosturrentevaluations,isinappropri-

ate. Nodotorwehavetalked towouldonsider

BLEUausefulmetri.

Inbothrespets,theappliationwedesribeis

loser toreal world onesthan isommon inthe

literature,andwethereforethinkitreasonableto

laimthat ourresultsshould notbedismissed as

irrelevant; wesuspet that similarproblems will

emerge in many other real world appliations.

TheOpenSoureframeworkwehaveusedmake

iteasy for septialresearhers tohek thede-

tailsofourmethodsanddata.

Referenes

Bouillon, P., Flores, G., Georgesul, M., Hal-

imi, S., Hokey, B., Isahara, H., Kanzaki, K.,

Nakao, Y., Rayner, M., Santaholma,M., Star-

lander, M.,andTsourakis, N.(2008a). Many-

to-many multilingual medial speeh transla-

tion on aPDA. InProeedings ofThe Eighth

Conferene of the Assoiation for Mahine

TranslationintheAmerias,Waikiki,Hawaii.

Bouillon, P., Halimi, S., Nakao, Y., Kanzaki,

K., Isahara, H.,Tsourakis, N.,Starlander, M.,

Hokey, B., and Rayner, M. (2008b). De-

veloping non-European translation pairs in a

medium-voabularymedialspeehtranslation

system. In Proeedings ofLREC 2008, Mar-

rakesh,Moroo.

Dugast, L., Senellart, J., and Koehn, P. (2008).

CanwerelearnanRBMTsystem? InProeed-

ings oftheThirdWorkshopon Statistial Ma-

hine Translation, pages 175178, Columbus,

Ohio.

Hokey, B., Rayner, M., and Christian, G.

(2008). Training statistial language models

from grammar-generated data: Aomparative

ase-study. InProeedingsofthe6thInterna-

tional Conferene on Natural Language Pro-

essing,Gothenburg,Sweden.

Jonson, R. (2005). Generating statistial lan-

guagemodelsfrominterpretation grammarsin

dialogue systems. InProeedingsof the11th

(9)

A., Fosler, E., Tajhman, G., and Morgan, N.

(1995). Usinga stohastiontext-free gram-

mar as a language model for speeh reogni-

tion. InProeedingsoftheIEEEInternational

Conferene on Aoustis, Speeh and Signal

Proessing,pages189192.

Koehn,P.,Hoang,H.,Birh,A.,Callison-Burh,

C., Federio, M., Bertoldi, N., Cowan, B.,

Shen, W., Moran, C., Zens, R., et al. (2007).

Moses: Opensoure toolkit forstatistial ma-

hine translation. In ANNUAL MEETING-

ASSOCIATIONFORCOMPUTATIONALLIN-

GUISTICS,volume45,page2.

Oh, F. (2003). Minimum error rate training

instatistial mahine translation. InProeed-

ingsofthe41stAnnualMeetingonAssoiation

forComputationalLinguistis-Volume1,pages

160167. Assoiation for Computational Lin-

guistis.

Oh, F.andNey,H.(2000). Improved statistial

alignmentmodels. InProeedings ofthe38th

AnnualMeetingoftheAssoiationforCompu-

tationalLinguistis,HongKong.

Rayner, M., Bouillon, P., Chatzihrisas, N.,

Hokey, B., Santaholma, M., Starlander, M.,

Isahara,H.,Kanzaki,K.,andNakao,Y.(2005).

Amethodologyforomparinggrammar-based

and robust approahes to speeh understand-

ing. In Proeedings of the 9th International

Conferene on Spoken Language Proessing

(ICSLP),pages11031107,Lisboa,Portugal.

Rayner,M.,Estrella, P.,and Bouillon,P.(2010).

A bootstrapped interlingua-based SMTarhi-

teture. In Proeedings of the 14th Confer-

eneoftheEuropeanAssoiationforMahine

Translation(EAMT),StRaphael,Frane.

Rayner,M.,Estrella,P.,Bouillon,P.,Hokey,B.,

andNakao,Y.(2009). Usingartiiallygener-

ateddata toevaluate statistial mahinetrans-

lation. InProeedings ofthe 2009 Workshop

onGrammarEngineeringArossFrameworks,

pages5462,Singapore.AssoiationforCom-

putationalLinguistis.

Rayner,M.,Hokey,B.,and Bouillon,P.(2006).

Chiago.

Seneff,S., Wang,C., and Lee, J. (2006). Com-

bining linguistiandstatistialmethodsforbi-

diretional English Chinese translation in the

ightdomain. InProeedingsofAMTA2006.

Stolke, A.(2002). SRILM- an extensible lan-

guage modeling toolkit. In Seventh Interna-

tional Conferene on Spoken Language Pro-

essing.ISCA.