Some Propositions to Improve the Prediction Capability of Word Confidence Estimation for Machine Translation
Related documents
This corpus contains 2643 speech utterances for which a quintuplet containing: ASR output (src-asr), verbatim transcript (src-ref), text translation output (tgt-mt),
• Source alignment context features: the combinations of the target word and one word before (left source context) or after (right source context) the source word aligned to it.
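The source alignment context features described above can be illustrated with a minimal Python sketch. It assumes a one-to-one word alignment given as a dictionary from target index to source index; the function name `source_context_features`, the `|` separator, and the boundary markers `<s>` and `</s>` are hypothetical choices for illustration and do not come from the source.

```python
def source_context_features(src, tgt, align, bos="<s>", eos="</s>"):
    """Pair each target word with the word before/after its aligned source word.

    src, tgt: lists of tokens; align: dict {target index: source index}.
    Unaligned target words get None for both features (an assumption).
    """
    feats = []
    for j, word in enumerate(tgt):
        i = align.get(j)
        if i is None:
            feats.append({"left": None, "right": None})
            continue
        # Fall back to boundary markers at the sentence edges.
        left = src[i - 1] if i > 0 else bos
        right = src[i + 1] if i + 1 < len(src) else eos
        feats.append({"left": f"{word}|{left}", "right": f"{word}|{right}"})
    return feats


# Toy example with a monotone alignment (invented data).
src = ["le", "chat", "dort"]
tgt = ["the", "cat", "sleeps"]
align = {0: 0, 1: 1, 2: 2}
feats = source_context_features(src, tgt, align)
```

Here `feats[1]` pairs the target word "cat" with "le" (left source context) and "dort" (right source context).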
A classical simulation estimator is based on a ratio representation of the mean hitting time, using crude simulation to estimate the numerator and importance sampling to handle
– Step 1: Starting from 25 features, we build 23 subsets: 1 contains all features, 1 contains the top 10 in Table 1, and 21 contain 9 randomly extracted features each.
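The subset construction in Step 1 can be sketched in Python as follows. The feature names `f1` to `f25` and the choice of the first ten as the "top 10" are placeholders, since the actual ranking in Table 1 is not reproduced here; the helper name `build_feature_subsets` and the fixed seed are also assumptions for illustration.

```python
import random


def build_feature_subsets(all_features, top10, n_random=21, subset_size=9, seed=0):
    """Return the full set, the top-10 set, and n_random random subsets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    subsets = [list(all_features), list(top10)]
    for _ in range(n_random):
        # random.sample draws without replacement, so no feature repeats in a subset.
        subsets.append(rng.sample(list(all_features), subset_size))
    return subsets


features = [f"f{i}" for i in range(1, 26)]  # 25 placeholder feature names
top10 = features[:10]                       # placeholder for the top 10 in Table 1
subsets = build_feature_subsets(features, top10)
```

This yields 1 + 1 + 21 = 23 subsets, matching the count stated in the step.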
This work applies some existing ideas about using lexical and more abstract text representations (morphological, syntactical labels, text bleaching). It provides a comparison
In this work, we empirically demonstrate that, using different types of embeddings on three NLP tasks with Twitter data, we manage to achieve better or near to the state-of-the