1. INTRODUCTION
1.3. M ETHODOLOGY
The multidisciplinary nature of this research work also affects its methodology, as it varies depending on the objectives to be fulfilled and the hypotheses to be validated in each of the articles. In this section, a general description of the methodology used is presented chronologically in order to clarify the process that has been followed throughout the PhD, and an indication of the article or articles where these methodological tools have been used is added. Two main approaches have been taken: on the one hand, a descriptive-‐analytical approach on the first stages of the thesis, based on a corpus created ad hoc; on the other, an experimental approach including two research studies –one on PE effort and another on QA– and a reception study.
Implementing Machine Translation and Post-Editing to the Translation of Wildlife
Documentaries through Voice-Over and Off-Screen Dubbing 41
(1) Bibliographical review
A bibliographical review on the characteristics of VO and the translation of documentaries through VO and OD was carried out in order to find possible challenges to appear when applying MT and PE MT into the current process of translation of documentaries. Furthermore, another bibliographical review on experiments studying post-‐editing effort and quality assessment was fulfilled to set the basis of the experimental studies that are part of this PhD.
(2) Corpora creation
Thanks to the collaboration of professional translators, documentary scripts in English (108, originals) and Spanish (92, translations) were collected in order to determine the characteristics of original scripts in English and translated scripts in Spanish. They were processed and analysed during a research stay in the translation company Pangeanic (Valencia), which was extremely valuable, as it helped not only with the analysis and process of the scripts but also with a better and more professional understanding of MT engines. The scripts, which contained either only a narrator –to be off-‐screen dubbed– or a narrator with interviewers, interviewees and/or spontaneous speech –to be voiced-‐over–, were divided in three different subcorpora:
♦ corpus of documentary scripts in English (En-‐DOC corpus);
♦ corpus of documentary scripts in Spanish (Spa-‐DOC corpus);
♦ corpus of randomly selected segments from the first corpora along with their human translation in Spanish and the translations produced by 8 MT engines –Apertium, Bing, Google Translate, Lucy MT, Promt, Reverso, Systran, Yandex– (Bil-‐DOC corpus).
More information on these corpora can be found in the first article of this PhD (see chapter 2).
42 Chapter 1. Introduction
(3) Corpora Analysis
The corpora were analysed differently depending on their characteristics:
1. Both En-‐DOC and Spa-‐DOC corpora were firstly analysed according to their micro-‐ and macro-‐structures independently (Van Dijk, 1973). A bottom-‐up approach was used for the macrostructure analysis, which is understood in this study as the layout or overall structures within a documentary. For the analysis of the microstructure, which is known to be in this research as the connections between words and sentences of a documentary that set the basis of its general meaning, a bottom down approach was chosen. Afterwards, the results of En-‐DOC and Spa-‐DOC corpora analyses were compared.
2. The Bil-‐DOC corpus was analysed in three different ways: Firstly, an automatic evaluation of the results using BLEU and TER automatic measures was carried out. Secondly, the output of the MT engines was subjectively assessed by the researcher, who marked all the errors and classified them according to an Error Typology based on Uszkoreit et al's (2013) MQM. Finally, the found errors were analysed and compared.
The previous three steps correspond to the methodology used in the first article presented in the PhD (see chapter 2). After this descriptive stage, an experimental approach with the following steps was taken:
(4) Stimuli creation for the post-‐editing experiments
The selection of the documentary film excerpts to be used for all the experiments that are part of this PhD was based on four requirements:
♦ They had to be a short documentary or part of a long documentary in English; as the longer the documentary, the longer the experiment would be and, time wise, a short experiment was required, as the participants were volunteers.
Implementing Machine Translation and Post-Editing to the Translation of Wildlife
Documentaries through Voice-Over and Off-Screen Dubbing 43
♦ They had to have both narrator and experts talking in order to represent, as acurately as possible and only in one excerpt, as many different types of speakers that can be found in documentary films.
♦ They had to be as similar as possible to be comparable. Hence, they had to have the same amount of lines of dialogue for the narrator and the experts. They also had to be about the same length both in terms of number of words and minutes.
♦ They had to be able to be understood on their own. Therefore, they were edited to tell self-‐contained stories.
Finally, a part of a documentary film that fulfilled the requirements was found:
Must Watch: a Lioness Adopts a Baby Antelope
<https://www.youtube.com/watch?v=m2w-‐1BfHFKM>, a 7-‐minute excerpt of the episode Odd Couples from the series Unlikely Animal Friends broadcasted in 2009 by National Geographic. From that excerpt, two smaller excerpts with the following characteristics were selected:
1st Excerpt 2nd Excerpt
Duration 101 seconds (1:41 minutes) 112 seconds (1:52 minutes)
Number of words 283 287
Lines of Dialogue
Narrator 5 4
Female
expert 3 3
Male expert 1 1
Inserts 0 1
Table 2. Excerpts' characteristics.
44 Chapter 1. Introduction
Both excerpts were machine translated from English into Spanish using Google Translate MT Engine, as it was the best MT engine according to the analysis of the Bil-‐DOC corpus analysis.
(5) Experiment on PE effort
This experiment intended to compare the effort of post-‐editing with the effort of translating in order to determine whether the PE effort is lower than the translation effort. The methodology followed in this experiment is described in the second article contained in this PhD (see chapter 3).
(6) Experiment on PE quality
This experiment intended to compare the quality of post-‐edited MT output with the quality of translations in order to prove whether the quality of the post-‐edited texts is equal to the quality of the translations according to evaluations made by experts of the field. The methodology used in this experiment is described in the third article of this PhD (see chapter 4).
(7) Stimuli creation for the reception study
The best translations and PEs, according to the QA carried out in the previous experiment, were recorded by voice talents at the Escola Catalana de Doblatge in order to be used for the reception study. Furthermore, observational notes were taken by the researcher and analysed afterwards as another way of evaluating the excerpts. The methodological approach is described in detail in the fourth article contained in this PhD (see chapter 5).
(8)Experiment on user reception
This experiment also aimed to compare the quality of post-‐edited MT output with the quality of translations. However, the evaluation was not only made by experts of the field, but also by the dubbing studio where the excerpts were produced, and the end-‐users. The methodology used in this experiment is described in the fourth article of this PhD (see chapter 5).
Implementing Machine Translation and Post-Editing to the Translation of Wildlife
Documentaries through Voice-Over and Off-Screen Dubbing 45
In the following table, the correlation between the steps followed for the methodology and the article for which they were used are summarized:
Art. 1 Art. 2 Art. 3 Art. 4 Methodology: steps
√ Bibliographical review
√ Corpora creation
√ Corpora analysis
√ √ √ Stimuli creation
√ Experiment on PE effort
√ √ Experiment on PE quality
√ Stimuli creation for the reception study
√ Experiment on user reception Table 3. Methodology: steps