2.3 Methodology


2.3.1 Data Set

The data we use in this pilot study was collected in the ROBOT project² (Daems, 2016) and consists of process data in the form of keystroke and eye-tracking data, and product data in the form of the final annotated translations. Eight English source texts of seven to ten sentences each were translated into Dutch by 23 translators who were native speakers of Dutch and had English as (one of) their working language(s). The translators consisted of two well-defined groups: one group of 13 professional translators with minimally five years of translation experience (with one exception, who had been working for two years), and the other of 10 students of a Master in Translation programme at Ghent University. Every translator translated four randomly selected texts. Leaving corrupt or unusable data aside, the ROBOT data set that is used here consists of detailed process and product data of 690 segment translations (314 by students, 376 by professionals).

The ROBOT process data was recorded using CASMACAT (Alabau et al., 2013). This tool can track a user's mouse and keyboard activity in a controlled translation environment and can be extended with an eye tracker to also monitor a user's gaze. The researchers of the ROBOT project used an EyeLink 1000 eye tracker. CASMACAT's output data is compatible with the CRITT Translation Process Research Database (TPR-DB; Carl et al., 2016). By means of freely available Perl scripts,³ CASMACAT's data could be converted into workable and analysable spreadsheets. These spreadsheets show the aggregated values of a multitude of features for each translated segment. In this paper we will make use of these spreadsheets and some of the features, as we will discuss in the next section.
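In practice, working with such spreadsheets amounts to little more than reading one tab-separated table per session. A minimal, hypothetical Python example (the file name and column set are assumptions, not the actual ROBOT export):

    import pandas as pd

    # Hypothetical per-session segment table produced by the TPR-DB scripts.
    segments = pd.read_csv("P01_session1.sg", sep="\t")
    print(segments.columns.tolist())      # aggregated feature columns
    print(len(segments), "segment translations")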

2.3.2 Features

As mentioned in section 2.2, literature suggests that process data such as duration, number of character insertions and deletions, and gaze information can mark translation difficulty. In addition, product data such as the number of errors a translator makes, the number of translation choices a translator can choose from (entropy), and the amount of syntactic (non-)equivalence are plausible indicators of translation difficulty. We will calculate correlations between these process (section 2.3.2.1) and product features (section 2.3.2.2).

²The ROBOT project compared post-editing (PE) and human translation (HT) by students (stud.) as well as professional translators (prof.). To this end, eye tracking and keystroke logging were used for data collection, but the author also worked with questionnaires to gauge participants' attitudes towards PE and HT. With respect to comparing PE and HT, research topics included (but were not limited to) task speed, task effort, product quality of tasks, and common error types of tasks. In all research questions, the differences or similarities between stud. and prof. are discussed as well. Some of the author's key findings are: PE is faster than HT but their output quality is comparable, PE is cognitively less demanding than HT, stud. behave differently from prof. with regard to processing texts, and the overall translation quality of stud. and prof. is comparable. The project page can be found at https://research.flw.ugent.be/en/projects/robot.

³See https://sites.google.com/site/centretranslationinnovation/tpr-db for guides and tools concerning TPR-DB.

In the following sections, feature names are set in monospace. They are analogous with those used in the Translation Process Research Database (Carl et al., 2016), with the exception of AvgPauseRatio, Pausedur and EC_TOT, which were added manually by the researchers of the ROBOT project.

2.3.2.1 Process features

The process data includes, but is not limited to, duration and pause information, textual segment statistics such as length (in tokens or characters), and keystroke and gaze information. In this paper we are only interested in a few features that may point to translation difficulty, as found in related research.

As Table 2.1 shows, our experiment includes a number of features that can be categorised into three groups, specifically DURATION, REVISION, and GAZE.

For DURATION, we use the features AvgPauseRatio (added manually during the ROBOT project and already discussed in Section 2.2.3.3) and the total production time Pdur, which measures keyboard activity excluding pauses >1s. These pauses are summed up in Pausedur (also added manually during the ROBOT project), which reflects the time that a translator did not use the keyboard. The threshold is motivated by work by Carl and Kay (2011, p. 969), who claim that one second or longer is the optimal duration to separate PUs (production units), which are segments in time where the target text is being produced. Therefore, the sum of all pauses longer than or equal to 1s is taken as the meaningful, production-less keyboard pause time.
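The 1s threshold is easy to make concrete. The following minimal Python sketch (a simplification, not the TPR-DB implementation) splits inter-keystroke gaps into Pdur-like production time and Pausedur-like pause time:

    def production_and_pause_durations(keystroke_times_ms, threshold_ms=1000):
        """Split inter-keystroke gaps into production time (gaps below
        the threshold, cf. Pdur) and pause time (gaps of at least 1s,
        cf. Pausedur). Simplified sketch."""
        pdur, pausedur = 0.0, 0.0
        for prev, curr in zip(keystroke_times_ms, keystroke_times_ms[1:]):
            gap = curr - prev
            if gap >= threshold_ms:
                pausedur += gap
            else:
                pdur += gap
        return pdur, pausedur

    # Hypothetical keystroke timestamps (ms) with one 1.5s pause halfway.
    print(production_and_pause_durations([0, 200, 350, 1850, 2000]))  # (500.0, 1500.0)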

REVISION categorises all features that have to do with keyboard input.

Mdel and Mins respectively indicate how many characters have been deleted from and inserted into the target window. Nedit is a broader concept, in the sense that it keeps track of how many times a translator has gone back to a translation and edited it. Scatter, finally, counts how often two consecutively typed characters do not belong to the same word or consecutive words: put differently, how frequently a translator moves their cursor to a different word (earlier or later in the text) to make changes to that word.
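To make Scatter concrete, the sketch below counts such non-linear production events from a per-keystroke record of which word is being edited (a simplification; the input encoding is an assumption, not the CASMACAT log format):

    def scatter(word_indices):
        """Count keystroke pairs whose edited words are neither the same
        word (distance 0) nor consecutive words (distance 1).
        Simplified sketch of the Scatter feature."""
        return sum(
            1
            for prev, curr in zip(word_indices, word_indices[1:])
            if abs(curr - prev) > 1
        )

    # Hypothetical session: the translator jumps from word 5 back to word 2.
    print(scatter([0, 0, 1, 2, 3, 4, 5, 2, 2, 3]))  # 1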

Lastly, we draw on FixS and FixT to indicate the number of fixations that a translator has had on the source and target text respectively. These two features constitute GAZE.

Category   Feature name in data set   Description
Duration   AvgPauseRatio (APR)        the average duration per pause divided by the average
                                      production time per word
           Pdur                       duration of coherent keyboard activity excluding
                                      keystroke pauses >1s
           Pausedur                   sum of all pauses >1s
Revision   Mdel                       number of characters deleted from the translation window
           Mins                       number of characters inserted into the translation window
           Nedit                      number of times the segment has been edited
           Scatter                    amount of non-linear text production (i.e. when two
                                      consecutively typed characters do not belong to the
                                      same word or consecutive words)
Gaze       FixS                       fixations on the source text
           FixT                       fixations on the target text

Table 2.1. Process features

2.3.2.2 Product features

The produced translations in the data set were manually annotated according to an extensive error typology (Daems et al., 2013).⁴ In the current study, however, we are only interested in the total number of errors (EC_TOT). In addition, we use two product features, word translation entropy and syntactic equivalence, that were created by the TPR-DB scripts during the ROBOT project prior to the current study. An overview of these three product features can be found in Table 2.2.

⁴This typology is divided into two main categories, namely adequacy errors and acceptability errors. Adequacy entails issues such as contradiction, word sense disambiguation, hyponymy and hyperonymy, deletion, addition and so on. Acceptability itself is divided into five sub-classes, namely Grammar & Syntax, Lexicon, Spelling & Typos, Style & Register, and Coherence. These classes each contain even more fine-grained errors.

Feature                 Feat. name in data set   Description
Error count             EC_TOT                   total number of errors made in a segment
Entropy                 HTra                     word translation entropy
Syntactic equivalence   CrossS                   Cross value for source tokens

Table 2.2. Product features

Translation difficulty can arise on different structural planes of language, ranging from phonology (e.g. homophones) and morphology (e.g. irregular verb inflexion) up to the textual level (e.g. coindexing ambiguity). In this study, we include selected product features from the lexical as well as the syntactic level. These features are word translation entropy (HTra in TPR-DB) and syntactic equivalence between source and target text (CrossS), as they were touched upon in section 2.2.

In the context of TPR-DB, word translation entropy is calculated as shown in Equation 2.1 (Carl et al., 2016, p. 31). Entropy is concerned with the impact of new information on the current knowledge.

H(s) = \sum_{i=1}^{n} p(s \rightarrow t_i) \cdot I(p(s \rightarrow t_i))    (2.1)

In this equation, p(s \rightarrow t_i) stands for the word translation probabilities of a source token s and all its possible translations t_{1...n}. They are computed as how often a source token has been translated to the specified target token (Eq. 2.2).

p(s \rightarrow t_i) = \frac{count(s \rightarrow t_i)}{\#translations}    (2.2)


The information I that is present in a distribution with equal probability of an event p can be formulated as in Eq. 2.3. It is the smallest number of bits necessary to encode the probability p.

I(p) = -\log_2(p)    (2.3)

The word translation entropy H(s) of a source token s, then, can be phrased as "the sum over all observed word translation probabilities (i.e. expectations) of a given ST word s into TT words t_{1...n} multiplied with their information content" (Carl et al., 2016, p. 31).

To apply the metric, all the translations of the segment concerned are put together to approximate the number of options a translator can choose from. As an example, if a source token is translated in exactly the same way by all translators, then its entropy is H(s) = 0: there is only one option to choose from, so choice – in itself – is non-existent.
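For illustration, Equations 2.1-2.3 amount to a few lines of code. The following minimal Python sketch (not the TPR-DB implementation; the Dutch example tokens are hypothetical) computes H(s) from the list of target tokens that the translators produced for one source token:

    from collections import Counter
    from math import log2

    def word_translation_entropy(translations):
        """H(s) for one source token, given the target tokens chosen
        for it across all participants (Eqs. 2.1-2.3)."""
        counts = Counter(translations)
        total = len(translations)
        entropy = 0.0
        for count in counts.values():
            p = count / total           # Eq. 2.2: p(s -> t_i)
            entropy += p * -log2(p)     # Eq. 2.1, with I(p) = -log2(p) (Eq. 2.3)
        return entropy

    print(word_translation_entropy(["tafel"] * 4))                         # 0.0: no choice
    print(word_translation_entropy(["tafel", "tafel", "bureau", "blad"]))  # 1.5 bits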

In this study, the calculation of word translation entropy is based on the final translations, but in future research we intend to do away with the need for product data. In the case of word translation entropy, we plan to calculate it with information from large parallel corpora.

In contrast with word translation entropy, which operates on the lexico-semantic level, syntactic equivalence is a syntactic feature. Among TPR-DB's generated features, there are two particularly interesting syntactic equivalence features that map the amount of word re-ordering that has to take place to transform the source text into the target text or vice versa. These features are called CrossS and CrossT respectively. In our study, we are only interested in going from the source text to the translation (CrossS).

Figure 2.2 visualises such a re-ordering procedure. The higher the value for CrossS, the more re-ordering steps have to take place to generate the target text. The more syntactic transformations a translation requires, the higher the difficulty of that translation task.
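A toy version of such a re-ordering measure can be sketched as follows. This Python sketch only illustrates the idea of measuring jumps between aligned positions; it is not TPR-DB's exact CrossS algorithm, and the alignment and example sentences are hypothetical:

    def cross_values(alignment):
        """For each source position (in order), the signed jump between
        the target position aligned to it and the target position
        aligned to the previous source token. Monotone alignments give
        jumps of 1; larger absolute values mean more re-ordering.
        Illustration only, not TPR-DB's exact CrossS computation."""
        jumps, prev_target = [], -1
        for src_pos in sorted(alignment):
            jumps.append(alignment[src_pos] - prev_target)
            prev_target = alignment[src_pos]
        return jumps

    # "he has seen it" -> "hij heeft het gezien": 'seen' and 'it' swap order.
    print(cross_values({0: 0, 1: 1, 2: 3, 3: 2}))  # [1, 1, 2, -1]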

Figure 2.2. An illustration of syntactic re-ordering from English to Dutch

In the following section, a couple of methodological notes on the correlation metrics used are highlighted. They are necessary to provide a comprehensive overview of the results later on.

2.3.3 Correlation Metrics

In early tests it became clear that our data is not linearly distributed (e.g. Figure 2.3) and outliers are frequent. Therefore, we opted to use Kendall's tau (τ) as correlation metric. When calculating correlations, all features have been normalised by the number of source tokens in the segment, hence the prefix Norm in the labels in Figure 2.3. For conciseness' sake, we do not prepend Norm to the feature names in the text, but it is important to keep in mind that they have been normalised by the number of source tokens.
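In practice this comes down to a rank correlation over normalised feature columns. A minimal sketch with pandas and SciPy (the file name and the TokS column for source-token counts are assumptions, not the actual ROBOT data layout):

    import pandas as pd
    from scipy.stats import kendalltau

    # Hypothetical layout: one row per segment translation.
    df = pd.read_csv("robot_segments.tsv", sep="\t")

    # Normalise by the number of source tokens per segment (assumed column TokS).
    for feature in ["Pdur", "Pausedur", "Mins", "Mdel", "FixS", "FixT", "HTra"]:
        df[f"Norm{feature}"] = df[feature] / df["TokS"]

    # Kendall's tau is rank-based, hence robust to outliers and
    # non-linear relationships.
    tau, p = kendalltau(df["NormHTra"], df["NormFixT"])
    print(f"tau = {tau:.3f}, p = {p:.4f}")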

Figure 2.3. Scatter plot showing a non-linear distribution of data points over entropy (HTra, x-axis) and fixations on the target text (FixT, y-axis). Data set restricted to professional translators.

For the feature EC_TOT, which designates the number of errors that were annotated in a segment, we only look at a subset of the data (242 data points), namely those segments where the error count is larger than zero. In other words, we are only interested in the final translations of segments that contain errors. When a segment has errors, it can be assumed that it was difficult to translate, but when a segment has been translated without errors, this does not imply that it was easy to translate.

Because we are interested in the difference between professional translators and students, both data sets have been analysed separately. By doing so, differences between professionals and students (if any) are emphasised.
