
Google Assistant's Interpreter Mode: is it really an interpreter?

ZOUAOUI, Safa

Abstract

This study investigates whether Google's Interpreter Mode can replace human interpreters within the context of a journalistic interview. In the first part, we explore the theoretical aspects of our work by discussing speech-to-speech machine translation and its components, as well as its place within the world of journalism. We then present Voice Assistants and Google's Interpreter Mode. The second part of this work discusses the experiment that we conducted by simulating three journalistic interviews, which allowed us to evaluate Interpreter Mode. Interview participants evaluated the usability of the system as well as the fluency and adequacy of the translations it produced. We, for our part, looked more closely at the number of successful interactions between the system and the users, as well as other metrics. Our analysis of the results showed that Interpreter Mode is not capable of replacing human interpreters in the context of a journalistic interview.

ZOUAOUI, Safa. Google Assistant's Interpreter Mode: is it really an interpreter?. Master : Univ. Genève, 2021

Available at:

http://archive-ouverte.unige.ch/unige:151334


Google Assistant’s Interpreter Mode:

Is it really an interpreter?

by Safa Zouaoui

A Master’s Thesis

Presented to the Faculty of Translation and Interpreting
In Partial Fulfillment of the Master of Arts in Translation and Technology

Under the Supervision of Professor Pierrette BOUILLON
Jury: Dr. Johanna GERLACH

University of Geneva
January 2021

Acknowledgements

To my advisor Professor Pierrette Bouillon, for her patience, help, guidance, and expertise, without which I would have never been able to complete this work;

To Doctor Johanna Gerlach, for sitting on my jury;

To Professor Sonia Halimi for her endless support and for always believing in me throughout my time here in the Faculty of Translation and Interpreting;

To my family and friends who have always been by my side, supporting and believing in me;

To my mom, for her love and care packages;

To my dad for being my mentor;

And to R, for everything.

Thank you.


Table of Contents

1. Introduction ... 1

1.1. Background and motivation ... 1

1.2. Research question and aim ... 2

1.3. Thesis plan ... 5

2. State of the art ... 5

2.1. Introduction ... 5

2.2. Speech-to-speech Machine Translation (MT) ... 5

Automatic voice and speech recognition (ASR) ... 8

Machine Translation ... 11

2.2.2.1. Brief History... 11

2.2.2.2. Approaches to Machine Translation ... 12

Rule-Based Machine Translation (RBMT) Systems ... 12

Statistical Machine Translation (SMT) Systems ... 14

Neural Machine Translation (NMT) Systems ... 15

Text-to-Speech Synthesis (TTSS) ... 16

2.3. Machine Translation in journalism ... 17

2.4. Google Assistant’s Interpreter Mode ... 18

Voice Assistants ... 19

2.4.1.1. Origins and evolution ... 19

2.4.1.2. The Google Assistant and Interpreter Mode ... 23

2.4.1.3. Previous Evaluations of Google Assistant’s Interpreter Mode ... 27

2.5. Conclusion ... 28


3. The experiment ... 29

3.1. Objective and research question ... 29

3.2. Experiment protocol ... 30

Steps ... 30

Questions ... 32

Participants ... 34

Tools ... 37

3.3. Evaluation methods ... 40

Fluency and Adequacy ... 40

System usability... 42

Other metrics ... 46

3.4. Preliminary testing... 47

Objectives and process ... 47

Results and conclusions ... 48

4. Experiment results ... 49

4.1. Introduction ... 49

4.2. In-depth look into each interview ... 49

a. Interview 1 (P1) ... 49

b. Interview 2 (P2) ... 51

c. Interview 3 (P3) ... 54

4.3. System Usability... 57

4.4. Fluency and Adequacy ... 64

Scores for the questions asked by the interviewer (EN → AR) ... 64

Scores for the answers of participants (AR → EN) ... 67


Comparing the EN → AR scores and the AR → EN scores... 69

4.4.2.1. Average fluency scores ... 69

4.4.2.2. Average Adequacy scores ... 70

4.5. Other metrics ... 71

Number of correct interactions ... 71

4.5.1.1. Success rate summaries ... 79

Use of correct gender when addressing participants ... 83

Translating Group 2 questions ... 83

Translation of the term “Coronavirus Pandemic” from English into Arabic ... 83

4.6. Conclusion ... 85

5. Conclusion ... 87

6. Bibliography ... 89

7. Annexes ... 92

List of figures:

Figure 1: A graph showing the evolution of worldwide searches of the term “Coronavirus” in comparison with topics like Music, News and Weather over the course of 12 months (Coronavirus Search Trends, 2020) ... 4

Figure 2: Process flow of Speech-to-speech translation. (Kitano, 1994, p. 2)... 7

Figure 3: A depiction of how ASR systems work... 10

Figure 4: Overview of the translation task (Catford, 1965, p. 20) ... 11

Figure 5: The Vauquois Triangle that explains how rule-based Machine Translation (MT) systems work ... 13

Figure 6: A representation of the process of converting text into speech (“Speech Synthesis,” 2020) ... 17

Figure 7: Demonstration of Shoebox by Dr. E. A. Quade, manager of the advanced technology group in IBM's Advanced Systems Development Laboratory. ... 19

Figure 8: Voice Assistants released from 1961 to 2016. ... 20

Figure 9: Voice Assistants released from January 2019 to July 2019. ... 21

Figure 10: summary of how Voice Assistants work when triggered. ... 22

Figure 11: number of Voice Assistant units used around the world. ... 23

Figure 12: Google Translate's landing page ... 24

Figure 13: Samsung Note 10+ ... 38

Figure 14: iPhone 11 Pro ... 38

Figure 15: Sony WH-1000XM3 Wireless Headphones ... 40

Figure 16: Adequacy Evaluation Scale ... 41

Figure 17: Fluency Evaluation Scale ... 41

Figure 18: example of a scored system usability scale (SUS) (Jordan et al., 1996, fig. 21.2) ... 44

Figure 19: meaning of different SUS scores ... 45

Figure 20: screenshot showing how IM translated P3's answer to Q2 ... 55

Figure 21: screenshot showing the SUS item 5 results ... 60


List of tables:

Table 1: summary of interview questions as well as their description and/or aim ... 33

Table 2: number and spoken languages of participants ... 34

Table 3: summary of participants' self-proclaimed MSA levels... 35

Table 4: Operating Systems intended to use with each participant ... 37

Table 5: updated Table 4 ... 37

Table 6: final SUS scores given by each participant ... 57

Table 7: Answers chosen by participants for item 1 of the SUS ... 58

Table 8: Answers chosen by participants for item 2 of the SUS ... 58

Table 9: Answers chosen by participants for item 3 of the SUS ... 59

Table 10: Answers chosen by participants for item 4 of the SUS ... 59

Table 11: Answers chosen by participants for Entry 5 of the SUS ... 60

Table 12: Answers chosen by participants for item 6 of the SUS ... 61

Table 13: Answers chosen by participants for item 7 of the SUS ... 62

Table 14: Answers chosen by participants for item 8 of the SUS ... 62

Table 15: Answers chosen by participants for item 9 of the SUS ... 62

Table 16: Answers chosen by participants for item 10 of the SUS ... 63

Table 17: Fluency scores for the translations of the questions asked (EN → AR) ... 65

Table 18: Adequacy scores for the translations of the questions asked (EN → AR) ... 66

Table 19: Fluency scores for the translations of the answers of participants (AR → EN) ... 67

Table 20: Adequacy scores for the translations of the questions asked (AR → EN) ... 68

Table 21: Average Fluency scores per language pair ... 69

Table 22: Average Adequacy scores per language pair ... 70

Table 23: table of repeated questions ... 72

Table 24: group 1 question results ... 73

Table 25: group 2 question results and scores ... 74

Table 26: group 3 question results ... 76

Table 27: group 4 question results ... 77

Table 28: group 5 question results ... 78


Table 29: P1 interview success rate ... 79

Table 30: P2 interview success rate ... 80

Table 31: P3 interview success rate ... 81

Table 32: translation success rate summary ... 82

Table 33: consistency in translating the term "Coronavirus Pandemic" during the 3 interviews ... 84


1. Introduction

1.1. Background and motivation

The idea of a device that can translate any form of speech into any language in real time has long been part of the realm of science fiction. Back when I was in high school, I read The Hitchhiker’s Guide to the Galaxy, and I vividly remember my fascination with the Babel Fish.

In the words of Douglas Adams, the Babel Fish was “probably the oddest thing in the Universe” (D. Adams, 2017). This “oddest thing in the universe” is a small, bright yellow fish that can be placed in someone's ear to allow them to hear any spoken language instantly translated into their own first language. At the very beginning of the story, Ford Prefect places a Babel Fish in Arthur Dent's ear so that he can understand Vogon speech (D. Adams, 2017).

Douglas Adams was not the only science fiction writer to flirt with the idea of the ultimate translation device. In Star Wars, C-3PO1 was characterized by its knowledge of more than seven million forms of communication and it was used to translate speech throughout the different galaxies and planets. Of course, we cannot talk about sci-fi translation devices without bringing up how the creators of the hit television show, Star Trek, visualized the concept and gave us the Universal Translator. According to the story, the Universal Translator (also referred to as UT or Translator Circuit), was a handheld device that featured a keypad and a display. The Universal Translator’s design and capabilities are very close to today’s reality and it was simple to use; one person speaks their language into it until the UT manages to gather enough data to build a translation matrix (Lorelei, 2019). Adams explains that due to its experimental nature, the UT underwent multiple improvement attempts, such as the creation of the linguacode translation matrix that helps speed up the translation of new and unknown languages.

Back in 2016, a friend of mine, who was a freelance journalist, called me one day and asked for my help with something work-related: she wanted me to be the interpreter between her, a native speaker of American English, and a Sudanese refugee who was a native Arabic speaker and did not speak English at all. My mission that day was to interpret what was being said in both directions. Because my friend works freelance, hiring a human interpreter was not an option for her due to the high hourly costs; she was lucky to have someone in her contacts who spoke both English and Arabic. I may not have done the job as well as a paid professional would have, but I believe that I managed to convey the meaning in both directions. This incident got me thinking: what if my friend did not have me in her contacts list? What if she could not interview the Sudanese lady because of the language barrier? She would probably have missed her chance to write an inspiring article, and the Sudanese lady's story would never have been told.

1 Star Wars, last accessed January 16, 2021 - https://www.starwars.com/databank/c-3po

This story stuck with me; it shows how important the acts of translating and interpreting are, and how important translators and interpreters are to bridging the gaps between people who do not speak the same language. But securing a human interpreter is not always an easy and accessible option. That is why, when I learned that Google had released a new feature for the Google Assistant called “Interpreter Mode”, I felt that we had come much closer to achieving what science fiction writers imagined only a few decades ago: a device capable of functioning like a Babel Fish or a Universal Translator. As technology keeps improving, the idea of a portable device that makes multilingual human communication on the fly not only feasible but also better and faster is getting closer every day to becoming a tangible reality. Of course, Interpreter Mode made me think again about my friend; she now potentially has a tool that can help her interview Arabic speakers without my help. At the same time, I could not help but wonder: is this tool any good, and to what extent can it replace human interpreters?

1.2. Research question and aim

With this study, we aim to determine whether Google Assistant’s Interpreter Mode (hereafter IM) can replace a human Interpreter in the context of a journalistic interview.

To answer this overarching research question, we need to answer the following secondary research questions:

1. Do IM’s intended users find it efficient and easy to use?

2. Does IM produce fluent and adequate translations?

3. Is IM’s translation quality affected by the type/complexity of the source text?

4. Is there a difference in performance depending on the source language?

5. Can IM keep track of the context during a conversation like a human interpreter?

6. Does the user’s Modern Standard Arabic (MSA) proficiency level affect IM’s performance?

To provide an answer to the questions stated above, we will conduct an experimental study to test Interpreter Mode’s ability to produce translations in both directions: from English into Arabic and vice versa. Our experiment will mimic the real-life conditions of a journalistic interview setting. We will first prepare a mock interview about the Coronavirus Pandemic, which is the current global hot topic and the subject that has been searched on Google more than anything else this year.

According to Google Trends2, during the height of the pandemic between March 2020 and April 2020, Coronavirus-related searches were considerably higher than searches for topics like music, news, and weather, as can be seen in Figure 1.

2 https://trends.google.com/trends/explore?q=%2Fm%2F01cpyy,%2Fm%2F04rlf,%2Fm%2F05jhg,%2Fm%2F0866r - website viewed September 10, 2020


Figure 1: A graph showing the evolution of worldwide searches of the term “Coronavirus” in comparison with topics like Music, News and Weather over the course of 12 months (Coronavirus Search Trends, 2020).

Google Assistant, and consequently Interpreter Mode, relies on Google’s Artificial Intelligence (AI) engines to function. Those AI engines use Deep Learning (DL) technologies to train and improve themselves. Thus, we can hypothesize that the notable increase in the number of searches related to the Coronavirus Pandemic means that there is more data available for Google’s AI to use to get better at translating Coronavirus-related speech.

The same mock interview will be carried out with every participant in this study. This means that the exact same questions will be asked every time, but different answers are expected from each participant.

Interpreter Mode will be faced with different predefined difficulties, and the fluency and adequacy of the translations it produces will be evaluated. A complementary assessment will be carried out using the System Usability Scale (SUS), developed by John Brooke and published as Chapter 21 of Usability Evaluation in Industry (Jordan et al., 1996). Finally, we will use other metrics to measure and assess Interpreter Mode’s performance.
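For reference, the SUS yields a single usability score between 0 and 100 from ten five-point Likert items. The short sketch below shows the standard scoring procedure described by Brooke; the function name and the example responses are illustrative and are not taken from our experiment.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the sum is multiplied
    by 2.5 to give a score between 0 and 100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

# Purely illustrative responses, not a participant's actual answers:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0
```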


1.3. Thesis plan

The present work is composed of five chapters. We will first explore the different theoretical concepts that are relevant to this study (Chapter 2). In Chapter 2, we will look at how speech-to-speech MT works by explaining its three components: automatic voice and speech recognition (2.2.1), machine translation (MT) (2.2.2) and text-to-speech synthesis (2.2.3). In the same chapter, we will then briefly look at the role of MT in journalism (2.3) before moving to an overview of Interpreter Mode, the tool that we are testing (2.4). Chapter 3 will be dedicated to the experiment that we carried out to answer our research questions: we will present the experiment protocol (3.2), our evaluation methods (3.3) and the preliminary test that we conducted (3.4). In Chapter 4, we will present the results of our experiment before drawing the appropriate conclusions in Chapter 5.

2. State of the art

2.1. Introduction

In this chapter, we will explore the different theoretical aspects of this work. First, we will look at Speech-to-speech Machine Translation (MT) (2.2) and dive into its different components: automatic voice and speech recognition (2.2.1), Machine Translation (2.2.2) and, finally, text-to-speech synthesis (2.2.3). After that, we will look at Machine Translation in journalism (2.3). The last section (2.4) will focus on Voice Assistants and Google’s Interpreter Mode.

2.2. Speech-to-speech Machine Translation (MT)

Speech-to-speech Machine Translation (MT) systems have come a long way since the early days of “SpeechTrans”, an experimental speech-to-speech MT system developed in 1989 (Zong & Seligman, 2005). What seemed like an ambitious goal only a few decades ago is now a rapidly evolving, tangible reality. In fact, only three decades ago, Hiroaki Kitano wrote:


“Development of a speech-to-speech translation system or interpreting telephony is one of the ultimate goals of research in natural language, speech recognition, and artificial intelligence. The task of speech-to-speech translation ultimately requires recognition and understanding of speaker-independent, large vocabulary, continuous speech in the context of mixed-initiative dialogues. It also needs to accurately translate and produce appropriately articulated audio output in real-time” (Kitano, 1994, p. 1).

Kitano continues by emphasizing the utility and importance of speech-to-speech MT. He argues that “beside obvious scientific and engineering significance, there are unspeakable economic and cultural impacts” to this technology. He then points out that any advancement in the field of speech-to-speech MT would require “a collective effort of various researchers”. This is because a speech-to-speech MT system involves several different technologies, as shown in Figure 2 below:


Figure 2: Process flow of Speech-to-speech translation. (Kitano, 1994, p. 2)

In today’s systems, the process flow of speech-to-speech MT (Figure 2) remains the same and still has three main components: Speech Recognition, MT and Text-to-Speech Synthesis (TTSS). These components must function together for the system to work properly, because a sentence that is not well recognized by the Speech Recognition component will most likely be mistranslated by the MT component. Within the process flow, MT remains the core component of speech-to-speech MT systems, and thus any shortcomings that affect the MT component have direct repercussions on the overall speech-to-speech MT performance. According to Zong & Seligman (2005, p. 114), despite all the strides that MT has made over the past decades, it still suffers from performance issues:

“Even neglecting issues of speech input and output, most researchers in MT [Machine Translation] have already indefinitely postponed the goal of fully automatic high-quality translation for any domain or text type […]. Clearly, if such effortless automatic operation cannot presently be achieved in text-only translation, its achievement in speech-to-speech translation is even less likely.”

In the following section, we will look at how speech-to-speech MT works and explore its three components: Automatic Voice and Speech Recognition (ASR), MT and Text-to-speech Synthesis (TTSS).
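Before looking at each component in turn, the overall process flow of Figure 2 can be summarized as a simple composition of three functions. The sketch below is only illustrative: the recognize, translate and synthesize callables are hypothetical placeholders, not references to any specific implementation.

```python
from typing import Callable

def speech_to_speech_pipeline(
    recognize: Callable[[bytes], str],    # ASR: source-language audio -> text
    translate: Callable[[str], str],      # MT: source text -> target text
    synthesize: Callable[[str], bytes],   # TTSS: target text -> audio
) -> Callable[[bytes], bytes]:
    """Compose the three components of a speech-to-speech MT system.

    An error introduced by an early stage (e.g., a misrecognized word)
    propagates directly to the later stages, which is why the components
    have to work well together and not only in isolation.
    """
    def pipeline(source_audio: bytes) -> bytes:
        source_text = recognize(source_audio)
        target_text = translate(source_text)
        return synthesize(target_text)

    return pipeline
```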

Automatic voice and speech recognition (ASR)

Although often used interchangeably, Speech Recognition and Voice Recognition are two distinct technologies.

The purpose of voice recognition is to identify the person who is speaking, and the system achieves this by mapping the subtle differences in speech between individuals. Unlike speech recognition, voice recognition does not generate an output; it simply confirms the speaker’s identity. A person’s voice can thus be used as a security feature on smartphones: the Google Assistant, for instance, will not reply to the command “OK Google” unless the owner of the smartphone utters it (Kikel, 2017).

Speech Recognition Software (or technology) is:


“the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned” (R. E. Adams, 1990).

Today, ASR enables electronic devices such as phones, computers, and other gadgets to receive, recognize and understand human utterances. It takes natural language as input and is increasingly used to replace older input methods like typing, texting, and clicking.

Jurafsky & Martin (2009, p. 327) state that the goal of ASR research is to address the problem of understanding spoken language by building computer systems that are able to derive a string of words from an acoustic signal.

The history of ASR systems goes back to the 1950s. Back then, the systems were mainly focused on recognizing numbers. In 1952, Bell Laboratories unveiled the Audrey system, which was capable of recognizing digits spoken aloud by a single person. After Audrey, we witnessed many developments in Speech Recognition Software (SRS), and in the nineties, SRSs like Dragon Dictate became widely used. By the beginning of the new millennium, ASR technology had reached an accuracy level of 80%. That was when Google launched the Google Voice Search Engine (GVSE).

This app-based service made ASR an accessible tool, available to billions of smartphone users. The data collected from all those users over the years has been crucial in improving the GVSE. Later years marked a revolution in speech recognition devices, with tech giants such as Apple and Amazon launching their own ASR engines and devices.3

Within the process of speech-to-speech MT, ASR serves as an alternative to manual input of the text to be translated.

ASR works as shown in Figure 3 below:

3 A Brief History of Speech Recognition, accessed January 7, 2021, https://sonix.ai/history-of-speech-recognition


Figure 3: A depiction of how ASR systems work4

Yu & Deng (2015a, p. 4) describe what happens inside the system when an audio signal is detected. First, the signal goes through a feature extraction component, which digitizes the spoken words into a signal representation and enhances it by removing noise and channel distortions. The digital representation is then sent to a decoder that uses acoustic models and language models to decipher it. The acoustic model uses an acoustic dictionary to transform the signal into a sequence of basic units (phonemes). The language model analyzes the context to ensure that the generated sentence is syntactically and semantically valid; for example, it helps ensure that homophones (such as higher and hire) are correctly chosen and spelled.

4 Figure extracted from https://www.rfwireless-world.com/Terminology/automatic-speech-recognition-system (Automatic Speech Recognition System, 2012)

Finally, a text output is generated (Yu & Deng, 2015b, Chapter 1.2). In the context of speech-to-speech Machine Translation, this output then goes through the Machine Translation step.
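To make the decoder's role more concrete, the toy sketch below scores a small set of already-proposed candidate transcriptions by combining an acoustic score with a language-model score; the candidate set, the scoring functions and the weight are invented for illustration, since a real decoder searches the hypothesis space incrementally (for example with a beam search).

```python
import math

def decode(candidates, acoustic_logprob, language_logprob, lm_weight=1.0):
    """Pick the word sequence maximizing log P(audio | words) + w * log P(words)."""
    return max(
        candidates,
        key=lambda words: acoustic_logprob(words) + lm_weight * language_logprob(words),
    )

# Toy example: the language model disambiguates the homophones "hire" / "higher".
candidates = [("we", "will", "hire", "you"), ("we", "will", "higher", "you")]
acoustic = lambda words: 0.0                                 # both sound identical
language = lambda words: math.log(0.9 if "hire" in words else 0.1)
print(decode(candidates, acoustic, language))                # ('we', 'will', 'hire', 'you')
```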

Machine Translation

2.2.2.1. Brief History

According to the online version of the Cambridge Dictionary, MT is “the process of changing text from one language into another language using a computer”5. For a translation to be considered successful, the meaning of the source text (ST) must be transferred to the target text (TT) in full and with accuracy. The process of going from ST to TT can be summarized in Figure 4 below:

Figure 4: Overview of the translation task (Catford, 1965, p. 20)

Behind this rather simple representation lies a challenging cognitive endeavor. For humans who are professionally trained to carry out translations, this cognitive process might seem straightforward, although it requires extensive knowledge of both natural languages’ grammar, syntax, and semantics. Moreover, translation often transcends language itself into the realms of culture, customs, and undertones. These are some of the challenges imposed by natural languages, and, just like human translators, machine translation must deal with them as well.

5 Cambridge English Dictionary, definition retrieved January 7, 2021, https://dictionary.cambridge.org/dictionary/english/machine-translation

Today, we have many MT systems that are more reliable than ever before, but this reliability is far from absolute. Humans are still a necessary link in the translation process chain, which is why we have processes like pre-editing and post-editing, where humans intervene to either facilitate the machine’s work or improve its output, respectively. We should note that this is only applicable to text-to-text translation and cannot be done in the context of speech-to-speech translation.

Machine Translation systems can be categorized into three types, which will be discussed in detail in the next section of this study.

2.2.2.2. Approaches to Machine Translation

Rule-Based Machine Translation (RBMT) Systems

Rule-Based Machine Translation (RBMT) systems are amongst the earliest commercialized Machine Translation (MT) systems (Poibeau, 2017, p. 35). RBMT systems rely on syntactic, linguistic, and semantic rules to produce a target text (TT) from a given source text (ST). How RBMT systems function is very straightforward: the user manually builds bilingual dictionaries for each language pair, and experts then map and write rules that govern the transformation of the ST into the TT. When the ST presents no ambiguities and can be translated literally, the system is usually able to provide satisfactory translations. Otherwise, the system needs to be manually trained to tackle different difficulties and ambiguities such as homonyms. Therefore, the key to having and maintaining a well-functioning RBMT system is constant refinement, which can turn into a time-consuming and laborious task.

RBMT systems function in different ways. There are Direct systems, which translate sentences in the most basic way: word for word. According to Poibeau, these Direct systems “are generally dictionary-based: a dictionary provides a word-for-word translation, and then more or less sophisticated rules try to reorder the target words so as to get a word order as close as possible to what is required by the target language” (Poibeau, 2017, p. 23). Poibeau (2017, p. 24) goes on to define a second type of RBMT system, called Transfer systems. These are a bit more complex as they involve syntactic analysis, which allows the system to bypass the word-for-word approach and to produce more idiomatic results from a semantic standpoint, “as long as the syntactic component provides accurate information on the source and on the target language” (Poibeau, 2017). The third type of RBMT system, according to Poibeau (2017, p. 26), is considered the most sophisticated and ambitious of the three: systems based on an interlingua. The idea behind this approach is to transform the ST into an interlingua, an abstract language-independent representation, which is then used to generate the TT.

Despite having been the standard for a long time, RBMT systems have their shortcomings. Refining and updating an RBMT system can be very time-consuming, and ambiguity remains RBMT systems’ arch nemesis.

These various approaches to RBMT systems can be summarized using the Vauquois Triangle seen in Figure 5 below.

Figure 5: The Vauquois Triangle that explains how rule-based Machine Translation (MT) systems work
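As a very rough illustration of the direct, dictionary-based approach described above, the toy sketch below translates word for word with a hand-built bilingual dictionary and a single reordering rule. The tiny French-English vocabulary, the word lists and the rule are invented for illustration; real RBMT systems rely on far richer, expert-written rule sets.

```python
# Hand-built resources, as an expert would prepare for an RBMT language pair.
BILINGUAL_DICT = {"le": "the", "chat": "cat", "noir": "black", "dort": "sleeps"}
NOUNS = {"chat"}
ADJECTIVES = {"noir"}

def translate_direct(sentence: str) -> str:
    words = sentence.lower().split()
    # Reordering rule: French "noun adjective" becomes English "adjective noun".
    reordered, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in NOUNS and words[i + 1] in ADJECTIVES:
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Word-for-word dictionary lookup; unknown words pass through unchanged.
    return " ".join(BILINGUAL_DICT.get(w, w) for w in reordered)

print(translate_direct("Le chat noir dort"))  # -> "the black cat sleeps"
```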

Statistical Machine Translation (SMT) Systems

Statistical Machine Translation (SMT) remained the de facto MT standard for a long time. As previously discussed, Rule-Based Machine Translation systems provided good results, but only when the systems were well maintained, trained, and specialized. SMT systems, on the other hand, provide better results than RBMT systems without requiring the same amount of maintenance and upkeep. SMT systems are data-driven: they rely on corpora and probabilities to produce translations. Unlike RBMT systems, which focus on the process of turning a source text into a target text, SMT systems focus on the result, or final output. This idea was discussed by Jurafsky and Martin (2009, p. 910) in their book Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition:

“The three classic architectures for MT (direct, transfer, and interlingua) all provide answers to the questions of what representations to use and what steps to perform to translate. But there is another way to approach the problem of translation: to focus on the result, not the process.” (Jurafsky & Martin, 2009)

Jurafsky & Martin argue that SMT systems reconcile the two core concepts of translation, faithfulness and fluency, “by building probabilistic models of faithfulness and fluency, and then combining these models to choose the most probable translation.” The models in question are the language model and the translation model. The language model can be compared to a word-prediction tool: it relies on monolingual target-language corpora to statistically determine whether a given translation is fluent, regardless of its faithfulness. The probabilistic model, also called an N-gram model, calculates the likelihood of lexical units (N-grams) occurring one after the other; the higher that likelihood, the more likely the candidate sentence is to be fluent. The translation model relies on bilingual corpora and complements the language model. Jurafsky & Martin summarize how the translation model works as follows:


“The job of the translation model, given an English sentence E and a foreign sentence F, is to assign a probability that E generates F.” (Jurafsky & Martin, 2009)

This means that the translation model will determine which target sentence is most likely an acceptable translation for a given source sentence. This translation model is said to be phrase-based because it “use[s] phrases (sequences of words) as well as single words as the fundamental units of translation” (Jurafsky & Martin, 2009).
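In other words, the decoder chooses the candidate translation E that maximizes the product of the translation model P(F|E) and the language model P(E). The sketch below makes this combination explicit; the candidate list and the toy probability tables are invented purely for illustration.

```python
import math

def smt_decode(source, candidates, translation_logprob, language_logprob):
    """Noisy-channel SMT decoding: argmax over candidates E of
    log P(F | E) + log P(E), i.e. faithfulness plus fluency."""
    return max(
        candidates,
        key=lambda e: translation_logprob(source, e) + language_logprob(e),
    )

# Invented toy scores: both candidates are equally faithful to the source,
# but only one of them is fluent English according to the language model.
tm = lambda f, e: math.log({"the black cat": 0.5, "the cat black": 0.5}.get(e, 1e-9))
lm = lambda e: math.log({"the black cat": 0.6, "the cat black": 0.01}.get(e, 1e-9))
print(smt_decode("le chat noir", ["the black cat", "the cat black"], tm, lm))
# -> "the black cat"
```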

The dominance and success of SMT systems were established because, unlike RBMT systems, they could generate good, accurate results more consistently and with much less maintenance and fine-tuning. However, SMT systems still require training on solid corpora in order to give optimal results; otherwise, they are prone to syntactic and misinterpretation errors.

According to Philipp Koehn, SMT “has gained tremendous momentum, both in the research community and in the commercial sector” (Koehn, 2010a). Many academic papers have been published on the matter, at the same time as SMT systems were being commercialized; this commercialization led to the creation of free online systems by Google and Microsoft (Koehn, 2010a).

Neural Machine Translation (NMT) Systems

Systran, an industry-leading MT solutions provider, defines Neural Machine Translation (NMT) systems on their website as “self-learning translation machine[s]” that, unlike RBMT and SMT systems, rely on an artificial neural network (ANN) to perform the translation task6.

According to the DeepAI7 website, NMT is an MT approach that predicts the likelihood of a sequence of words by using an artificial neural network (ANN). NMT systems have been gaining popularity and are quickly outperforming traditional translation systems (Neural Machine Translation, 2019). Cho et al. (2014) write:

6 Systran, accessed January 7, 2021, https://www.systransoft.com/systran/translation-technology/neural-machine-translation-nmt/

7 DeepAI, accessed January 7, 2021, https://deepai.org/machine-learning-glossary-and-terms/neural-machine-translation


“Neural Machine Translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation” (Cho et al., 2014).

Neural networks’ main strength is their self-learning ability: they can adjust and correct their parameters without human intervention by comparing each generated output to an expected reference and then using the resulting corrections to tune the network’s connections and parameters. This process is called Deep Learning, and it enables the system to constantly learn and correct its own mistakes.
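To make the encoder-decoder idea concrete, the minimal PyTorch sketch below builds the two parts that Cho et al. describe: an encoder that compresses the source sentence into a fixed-length vector and a decoder that generates target-side scores from it. The vocabulary sizes and hidden dimension are arbitrary, and a real NMT system would add attention, subword tokenization, beam search and large-scale training.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Minimal encoder-decoder in the spirit of Cho et al. (2014)."""

    def __init__(self, src_vocab=1000, tgt_vocab=1000, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_emb(src_ids))   # fixed-length representation
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_states)                        # scores over the target vocabulary

# Shape check with random token ids (batch of 2, source length 7, target length 5):
model = TinySeq2Seq()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```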

Google Translate (GT) is considered one of the best NMT systems available today. Google launched the GT service in 2006. At first, and for almost ten years, the system relied on a phrase-based model to provide translations to millions of users. On September 27th, 2016, Google announced the Google Neural Machine Translation system (GNMT) in a blog post8. According to Google, this system “utilizes state-of-the-art training techniques to achieve the largest improvements to date for MT quality” (Le & Schuster, 2016). Its strength lies in its end-to-end design, which allows it to learn and improve on its own, unlike SMT systems, where the quantity and quality of the corpora fed to the system determine how well it performs. GT will be discussed further in section 2.4.1.

Text-to-Speech Synthesis (TTSS)

As seen in section (2.2), speech (or voice) synthesis is the last step performed by the speech-to-speech translation system. In this step, the system simply transforms a string of lexical units into audible speech or sounds.

8 Google AI Blog, accessed January 7, 2021, https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

Jurafsky & Martin (2009, p. 250) explain that speech synthesis works by first analyzing the text, then performing a linguistic analysis that generates a phonemic internal representation, which is then converted into a waveform. This can be seen in Figure 6:

Figure 6: A representation of the process of converting text into speech (“Speech Synthesis,” 2020)9

As seen in Figure 6, this process can be described as reversed ASR: we start from text and end up with audible speech.
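As a small, self-contained illustration of the TTSS step in isolation, the snippet below uses the off-the-shelf pyttsx3 library, which is not a tool used in this thesis, to turn a string into audible speech on the local machine.

```python
import pyttsx3  # off-the-shelf offline TTS library, shown purely as an illustration

engine = pyttsx3.init()          # use the platform's default speech engine
engine.setProperty("rate", 150)  # approximate speaking rate (words per minute)
engine.say("This sentence is converted from text into audible speech.")
engine.runAndWait()              # block until the utterance has been spoken
```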

2.3. Machine Translation in journalism

Language and journalism are tightly connected fields. This observation is supported by the very definition of the word “journalist” in the online version of the Cambridge Dictionary: “a person who writes news stories or articles for a newspaper or magazine or broadcasts them on radio or television.” 10

Journalists use language as their primary tool to do their job, be it writing articles, reporting news, or conducting interviews, to name a few. However, according to Van Doorslaer (2009, pp. 83-92), not much research has been done on the role of translation in this process.

9 Wikipedia, accessed September 23, 2020, https://en.wikipedia.org/wiki/Speech_synthesis

10 Cambridge Dictionary, definition retrieved September 17, 2020, https://dictionary.cambridge.org/dictionary/english/journalist

Van Doorslaer argues that the field of Translation Studies, while dominated by issues of audiovisual translation such as dubbing and subtitling, still lacks focus on the use of translation in newsrooms, for example.

Speech-to-speech MT had not been used much in the field of journalism until the BBC developed a new tool called ALTO. According to the BBC News website, “ALTO is a virtual voice-over tool for reversioning video content into multiple languages using text-to-speech voice synthesis”11.

ALTO combines several language processing technologies, such as computer-assisted translation and text-to-speech voice synthesis. The tool uses machine translation to generate a video transcript, which can then be post-edited manually. Then, text-to-speech (TTS) technology is used to convert the translated script into a computer-generated voice track. This step is carried out by language journalists. ALTO then automatically attaches the newly translated audio to the video file, on top of the natural voice track.

ALTO was first launched in late 2015 and only supported the Japanese language. Since then, ALTO has kept improving continuously. Some of the improvements brought to it include a pronunciation dictionary, automated video trimming, and control over ALTO’s phonetic performance.

2.4. Google Assistant’s Interpreter Mode

A few decades ago, talking about a computer that fits in your pocket sounded like something straight out of a science fiction movie. Today, we not only have computers that fit in our pockets, but those computers also get smaller, more powerful, and all-round better every day. Those tiny computers are called smartphones.

Smartphones were the result of an unlikely but successful marriage between Personal Digital Assistants and Cell Phones (Nguyen, 2019).

A Personal Digital Assistant (PDA) is a handheld mobile device that allows users to store and retrieve data such as schedules, calendars, and address books (Conrad et al., 2017). A Cell Phone, in its primitive form, is defined as “a phone that is connected to the phone system by radio instead of by a wire, and that can be used anywhere where its signals can be received” (CELL PHONE | Meaning in the Cambridge English Dictionary, 2020).

11 BBC, definition retrieved December 25, 2020, https://bbcnewslabs.co.uk/projects/alto/

Modern-day smartphones are packed with features. The revolution came when developers started working on applications that can be installed seamlessly on smartphones to answer almost any need a user might have. As the technology giant Apple eloquently put it back in 2010, whatever you need to do on your smartphone, be sure that “there’s an app for that” (Gross, 2010).

In this section, we will explore the world of voice assistants and focus on the Google Assistant which now offers the Interpreter Mode (IM) feature.

Voice Assistants

2.4.1.1. Origins and evolution

Voice Assistants, such as Amazon’s Alexa, Apple’s Siri and Google’s Assistant, are “software agents that run on purpose-built speaker devices or smartphones” (Hoy, 2018). The first Voice Assistant was created by IBM in the early 1960s12. It was called IBM Shoebox (Figure 7), and it paved the way for today’s Voice Assistants.

Figure 7: Demonstration of Shoebox by Dr. E. A. Quade, manager of the advanced technology group in IBM's Advanced Systems Development Laboratory. 13

12 IBM, accessed December 25, 2020, <https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html>

13 IBM, accessed December 25, 2020, <https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html>

Shoebox was only the beginning, and after it came multiple Voice Assistants. From 1961 until 2016, the number of available Voice Assistants kept increasing slowly but steadily, as shown in Figure 8. Most of the well-known Voice Assistants, such as Siri, Cortana, Google Assistant and Alexa, were released during that timeframe.

Figure 8: Voice Assistants released from 1961 to 2016.14

14 Voicebot.ai, accessed 25 December 2020, <https://voicebot.ai/voice-assistant-history-timeline/>

After 2016, the number of Voice Assistants released each year exploded as more companies started including voice commands in their products. The years 2018 and 2019 continued the same trend, and many more Voice Assistants were released, especially in 2019, as shown in Figure 9:

Figure 9: Voice Assistants released from January 2019 to July 2019.15

15 Voicebot.ai, accessed 25 December 2020, <https://voicebot.ai/voice-assistant-history-timeline/>

Voice Assistants function by constantly listening for a keyword that triggers them. Then, the software uses speech recognition and Natural Language Processing (NLP) to understand what the user needs. The Voice Assistant then proceeds to retrieve the desired information by using Application Programming Interfaces (APIs), which are software interfaces that allow applications to communicate. Lastly, the system outputs the desired information to the user via a synthetic voice (Pant, 2016, p. 3). These steps are summarized in Figure 10:

Figure 10: summary of how Voice Assistants work when triggered.16
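The trigger-then-respond cycle summarized in Figure 10 can be sketched as schematic Python. All the callables passed in below (listen, detect_wake_word, recognize, interpret, fetch_from_api, speak) are hypothetical placeholders standing in for the components named above, not real APIs.

```python
def assistant_loop(listen, detect_wake_word, recognize, interpret, fetch_from_api, speak):
    """Schematic voice-assistant loop: wait for the wake word, recognize the
    request, resolve it through an API, and answer with synthesized speech."""
    while True:
        audio = listen()                    # constantly capture audio
        if not detect_wake_word(audio):     # e.g. listening for "Hey Google"
            continue
        request_text = recognize(listen())  # ASR on the follow-up utterance
        intent = interpret(request_text)    # NLP: what does the user want?
        answer = fetch_from_api(intent)     # retrieve the desired information
        speak(answer)                       # synthetic-voice output to the user
```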

According to Statista17 (Figure 11), there are 4.2 billion digital Voice Assistant units in use in devices around the world. Forecasts suggest that this number will increase to 8.4 billion units by 2024. Devices with built-in Voice Assistants range from cars and kitchen appliances to smartphones and many more.

16 Learning Hub, accessed December 25, 2020, <https://learn.g2.com/voice-assistant>

17 Statista, accessed December 25, 2020, <https://www.statista.com/statistics/973815/worldwide-digital-voice-assistant-in-use/>


Figure 11: number of Voice Assistant units used around the world.18

Those 4.2 billion Voice Assistant units are being used for different purposes and in different situations. According to a survey of 500 consumers in the US, Statista determined that 51% of people use smartphone-based Voice Assistants in the car, 39% use them at home, 6% use them in public and only 1% use them at work19.

In this experiment, our focus will be the Google Assistant and its Interpreter Mode function, which we will discuss in the following sections.

2.4.1.2. The Google Assistant and Interpreter Mode

The Google Assistant is Google’s version of an Artificial Intelligence (AI) powered voice assistant (What Is Google Assistant?, 2020). Much like Apple’s Siri or Amazon’s Alexa, it is integrated within smart devices, allowing users to interact with it via voice commands.

18 Statista, accessed December 25, 2020, <https://www.statista.com/statistics/973815/worldwide-digital-voice-assistant-in-use/>

19 Statista, accessed December 25, 2020, <https://www.statista.com/chart/7841/where-people-use-voice-assistants/>

Users can use the Google Assistant to trigger their home’s Smart Speaker to play media such as music, podcasts, news, and radio. They can also ask it for a daily brief, which includes flight information, the traffic situation on their way to work, etc. It can manage tasks such as setting alarms, pulling up calendar appointments and adding items to a shopping cart. And of course, it can provide answers to just about anything users ask it: sports, weather, finance, calculations, translations and more.

The Google Assistant is triggered when the user says the command “Hey Google” in one of the supported languages: Arabic, Bengali, Chinese (Simplified), Chinese (Traditional), Danish, Dutch, English, French, German, Gujarati, Hindi, Indonesian, Italian, Japanese, Kannada, Korean, Malayalam, Marathi, Norwegian, Polish, Portuguese (Brazil), Portuguese (Portugal), Russian, Spanish, Swedish, Tamil, Telugu, Thai, Turkish, Urdu, and Vietnamese. Google says that more languages are coming soon.20

Google Translate (GT) is one of the most used MT tools. It is available on smartphones as well as on the web. On the web, the user simply goes to the address https://translate.google.com/, where they will see the following landing page (Figure 12):

Figure 12: Google Translate's landing page

20 Google Assistant Help, accessed December 25, 2020, https://support.google.com/assistant/answer/7172657?co=GENIE.Platform%3DAndroid&hl=en

The user must choose the target language in the right-side box. Then, they type the source text (ST) in the left-side box, and Google Translate automatically detects the source language and displays the translation as the user types. The user can also copy and paste text into the left-side box or upload documents in one of the supported formats (.doc, .docx, .odf, .pdf, .ppt, .pptx, .ps, .rtf, .txt, .xls, or .xlsx). GT Mobile works in a very similar way, except that it has more features, such as instant translation of text via the camera, instant transcriptions, a conversation mode and integration with other apps to translate chat in WhatsApp, for example.

An article published by Google in 2016 (Ten Years of Google Translate, 2016) claimed that more than 500 million people use Google Translate to translate more than 100 billion words a day.

There are no official current numbers from Google regarding how these figures evolved in later years, but we can only assume that they went up.

GT became very successful because it offers a simple, efficient service: you type a word, a phrase, or a sentence in one language, and the software seamlessly detects the language you typed it in. You then choose your target language from a list of 103 available languages and there you have it: a translation of what you just typed.

In late 2019, Google announced a new feature for the Google Assistant: Interpreter Mode, a feature that is supposed to replace human interpreters and translate from and into 27 languages in real time. Interpreter Mode is not only available on smartphones, but also on all Google Home Speakers, all Smart Displays, Smart Clocks and Tablets. This is Interpreter Mode's biggest differentiator from its bigger sibling, Google Translate: since Interpreter Mode is integrated with many smart devices, it can be launched without the use of a smartphone.

According to Google’s website21, IM can be used on:

• All Google Home speakers

• Some speakers with Google Assistant built-in

• All Smart Displays

• Smart Clocks

• Mobile phones and tablets

Using IM is simple. To translate a conversation:

• Say "Hey Google..."

• Say a command, like:

➢ "...Be my Italian interpreter"

➢ "...Interpret from Polish to Dutch"

➢ "...Chinese interpreter"

➢ "...Turn on interpreter mode."

• If you did not specify the languages in the command, choose the language pair you want to use

• When you hear the tone, start speaking in either language. There is no need to tell IM which of the languages is being spoken, as it is detected automatically.

• On a Smart Display, you will both see and hear the translated conversation.

To stop using IM, say a command like:

➢ "Stop"

➢ "Quit"

➢ "Exit."

On a Smart Display, you can also swipe from left to right to stop IM.

21 Google Nest Help, accessed December 22, 2020, https://support.google.com/googlenest/answer/9234753?hl=en

2.4.1.3. Previous Evaluations of Google Assistant’s Interpreter Mode

Before discussing the methodology that we followed to carry out the experiment designed for this evaluation of IM, we conducted preliminary research to see whether any public evaluations of the system had already been carried out.

When typing “evaluation of google assistant's interpreter mode” into Google’s search bar, we get 3’140’000 results22, but none of the first 5 pages show any academic research. Typing the same query into Google Scholar’s search bar yields 20’600 results23, with no relevant articles in the first 2 pages. We concluded that no academic research has yet been done on Google Assistant’s Interpreter Mode. This could be because it is a novel piece of software, released only in 2019.

In contrast, when we typed “evaluation of google translate” into Google’s search bar, the search yielded 107’000’000 results24, a considerably larger number than the one yielded by the first query. This time, academic research papers appeared even on the first page, indicating that the evaluation of Google Translate’s performance is an active and dynamic research field.

Switching to Google Scholar, the same query yields 872’000 results25, a considerably higher number than the one we obtained for the Interpreter Mode query. Looking at the first page of results, we noticed that many of them evaluated the accuracy of the tool, with most of the tests being conducted in a medical setting. The first result, for example, was titled “Use of Google Translate in medical communication: evaluation of accuracy” (Patil & Davies, 2014).

Due to the lack of data and research regarding Google Assistant’s Interpreter Mode, we think that the present work is important and may inspire more research on the subject.

22 Search carried out on October 30, 2020.

23 Search carried out on October 30, 2020.

24 Search carried out on October 30, 2020.

25 Search carried out on October 30, 2020.


2.5. Conclusion

In this chapter, we explored the theoretical aspects of this work: speech-to-speech MT and its components (Automatic Speech Recognition, MT, and Text-to-Speech Synthesis). Then, we looked at MT in journalism and, finally, we discussed Voice Assistants and Interpreter Mode.

In the next chapter, we will take a more practical approach by presenting the experiment conducted for the purposes of this study.


3. The experiment

3.1. Objective and research question

Note: in Chapters 3 and 4, P1, P2 and P3 refer both to the participant in question and to their corresponding interview; when we say P1, we also mean Interview 1.

After defining the theoretical aspects of this work in Chapter 2, we will now move on to a more practical chapter. By completing this work, we aim to answer the following general research question:

➢ Can Google Assistant’s Interpreter Mode (IM) replace a human Interpreter in the context of a journalistic interview?

To provide an answer to our general research question, we needed to break it down into the following six secondary questions:

1. Do IM’s intended users find it efficient and easy to use?

2. Does IM produce fluent and adequate translations?

3. Is IM’s translation quality affected by the type/complexity of the source text?

4. Is there a difference in performance depending on the source language?

5. Can IM keep track of the context during a conversation like a human interpreter?

6. Does the user’s Modern Standard Arabic (MSA) proficiency level affect IM’s performance?

The experiment that we will discuss in the present chapter will help us answer the questions listed above. In a nutshell, the experiment consists in mimicking a short and controlled journalistic interview about the Coronavirus Pandemic between an English-speaking interviewer and three different Arabic-speaking Tunisian interviewees who identify Arabic as their mother tongue. The data obtained from each interview is then analyzed in order to provide answers to all our research questions, in the form of a conclusion.

In this chapter, we will first take a look at the protocol (3.2) that we followed to carry out this experiment. We will begin by enumerating the steps that we took to complete the experiment in full (3.2.1). Then, we will look at the interview questions (3.2.2) and explain the different groups that they belong to, before delving deeper into how our three participants were chosen (3.2.3) and describing the tools that we used during the experiment (3.2.4). The second part of this chapter will focus on our evaluation methods (3.3), before closing with a section that provides insight into the preliminary test that we conducted (3.4).

3.2. Experiment protocol

Steps

Carrying out experiments in a real-life setting (i.e., a real journalistic interview) with real journalists and interviewees can be difficult for different reasons: it is costly as well as logistically and ethically challenging. To avoid real-life field evaluation, we will be carrying out scenario-based experiments (Rosson & Carroll, 2002) instead; this method should allow us to reproduce a controlled, real-world-like setting under realistic conditions.

The following list summarizes the steps we took to conduct this experiment. They will be discussed further in upcoming sections.

Note: because of the Covid-19 health crisis, we provided hand sanitizer to all participants and disinfected every piece of equipment used throughout the experiment after each use.

I. Finding Participants:

1. We look for three participants who will play the role of interviewees throughout this experiment.

2. We give a general overview of our work to interested individuals and obtain preliminary verbal consent to participate.

3. We agree on a time and place to carry out the experiment.

II. The Interview:

1. The participant arrives at the previously agreed-upon location of the interview.

2. We invite the participant to thoroughly read and sign the consent form (Annex 1).

3. The participant answers a survey (Annex 2) that provides general and demographic information about them.

4. The participant fills out the English proficiency form (Annex 3).

5. We give information about the flow of the experiment to the participant and explain the key steps.

6. We set up the phone that will record the interview in video format (see 3.2.4).

7. We use an iPad to show the participant Google’s introductory video to using IM.

8. We ask the participant to confirm that they are ready to launch IM and begin the experiment.

9. We give the participant noise-cancelling headphones and begin the experiment (see 3.2.4).

10. Once the experiment is complete, we give the participant the System Usability Scale (SUS) questionnaire (see 3.3.2) and they fill it out on the spot.

11. The participant returns the filled-out SUS.

12. End of experiment.

III. Data Collection:

1. We use the screen recording and the recorded video (if necessary) to manually transcribe the interview (3.2.4).

2. Once the transcriptions are ready, the participant evaluates Fluency and Adequacy as described in section (3.3.1).

IV. Observations and conclusions:

1. We observe the obtained data.

2. We draw conclusions.

3. We provide answers to our secondary research questions.

4. We provide answers to our general research question.

The experiment took place on three different days, one for each of the three participants. This was due to the logistical challenges imposed by the Coronavirus Pandemic and the social distancing protocols that had to be observed at all times. For each participant, we reserved a one-hour time slot, which we estimated would be more than enough to carry out the experiment in its entirety.

Questions

To carry out this experiment, we opted for a pre-designed journalistic interview about the Coronavirus Pandemic that consists of ten questions. The interview is completely fictional and will not be published in a journalistic context, but having a factual conversation about a current hot topic may help the participants draw their answers from real-life experience if they choose to.

The questions are categorized into five groups according to the type of difficulty that they pose to Interpreter Mode. A summary of the questions, their category and their description/aim can be seen in Table 1:


Group 1
Questions:
Q1. Hi, what is your name?
Q2. Could you please specify your gender?
Q3. Where are you from?
Description/Aim: Introductory questions aimed at testing whether the system recognizes basic speech. The answers will provide insight into how the tool deals with non-occidental names, and Q2 will help to see whether the tool can keep track of the participant’s gender.

Group 2
Questions:
Q1. Have you been following the latest updates regarding the Coronavirus Pandemic? (this question explicitly contains the phrase “Coronavirus Pandemic”)
Q2. Have you been tested for it? (the pronoun “it” replaces the phrase “Coronavirus Pandemic”)
Description/Aim: Yes/no questions that will provide insight into the success rate, in terms of the quality of the output translation, for longer sentences; the answers themselves (yes/no) are not relevant. The transition from Q1 to Q2 will assess the tool’s ability to keep track of context, i.e. to identify that “it” refers to the “Coronavirus Pandemic”.

Group 3
Questions:
Q1. How did you first learn about the Coronavirus Pandemic?
Q2. How serious did you think it was when it first hit the news?
Q3. How do you feel about the global impact this illness has had?
Description/Aim: Complex questions aimed at testing the tool’s ability to deal with both long questions and long answers. The interviewee will be instructed to keep their answers as short as one sentence.

Group 4
Question:
Q1. How have you been coping with this global situation?
Description/Aim: Open question aimed at testing the tool’s ability to handle a long answer that could be unrelated to the discussed matter (the Coronavirus Pandemic, in this case).

Group 5
Question:
Q1. Thank you for your time.
Description/Aim: Closing statement.

Table 1: summary of the interview questions, grouped by type, with their description and/or aim
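For reference, the interview script itself can be encoded as a small data structure. The sketch below is merely one convenient way of organizing the ten questions and initializing the transcription table; the names used (INTERVIEW_SCRIPT, transcription_rows) are our own and were not needed to run IM itself.

    # Interview script: (group, question) pairs in the order they are asked.
    INTERVIEW_SCRIPT = [
        (1, "Hi, what is your name?"),
        (1, "Could you please specify your gender?"),
        (1, "Where are you from?"),
        (2, "Have you been following the latest updates regarding the Coronavirus Pandemic?"),
        (2, "Have you been tested for it?"),
        (3, "How did you first learn about the Coronavirus Pandemic?"),
        (3, "How serious did you think it was when it first hit the news?"),
        (3, "How do you feel about the global impact this illness has had?"),
        (4, "How have you been coping with this global situation?"),
        (5, "Thank you for your time."),
    ]

    # One transcription row per question; answers are filled in manually from
    # the recordings, with "(n/a)" for questions that fail twice (see the
    # interview rules below).
    transcription_rows = [
        {"group": group, "question": question, "answer": None}
        for group, question in INTERVIEW_SCRIPT
    ]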

To add an element of spontaneity to the interview, we did not share the interview questions with the participants in advance. Before beginning the interview, each participant was informed of the following:


- The questions would be asked one by one, allowing the interviewee time to answer.
- In case of an unclear translation or any other type of confusion, the participant must reply “I do not understand” in Arabic.
- If the participant replies “I do not understand”, the interview question is repeated one more time to ensure that the problem did not result from a connection issue.
- If the repeated question fails as well, we move on to the next question and the entry for that question is transcribed as (n/a); this retry rule is formalized in the sketch after this list.
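The retry rule above can be written down as a small decision procedure. The sketch below is only an illustrative formalization of the protocol we followed by hand; the names ask_question, get_answer, InterviewEntry and MAX_ATTEMPTS are our own and do not correspond to any feature of IM.

    from dataclasses import dataclass
    from typing import Callable, Optional

    MAX_ATTEMPTS = 2  # the initial question plus one repetition

    @dataclass
    class InterviewEntry:
        question: str
        answer: Optional[str]  # None is transcribed as "(n/a)"
        attempts: int

    def ask_question(question: str, get_answer: Callable[[str], str]) -> InterviewEntry:
        # get_answer stands for the human-in-the-loop step: it returns the
        # participant's answer as relayed by IM, or the literal reply
        # "I do not understand" when the translation was unclear.
        for attempt in range(1, MAX_ATTEMPTS + 1):
            answer = get_answer(question)
            if answer.strip().lower() != "i do not understand":
                return InterviewEntry(question, answer, attempt)
        # Both attempts failed: the question is recorded as unanswered.
        return InterviewEntry(question, None, MAX_ATTEMPTS)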

Participants

In this experiment, we played the role of the journalist, which means that we prepared and asked the questions (in English) and guided the overall flow of the interview. We also needed participants to act as interviewees. The number of participants and the languages spoken during the interview are summarized in Table 2:

Interviewer
Number: 1
Language used during the interview: English

Interviewees
Number: 3
Language used during the interview: Arabic

Table 2: number and spoken languages of participants

To find participants for the experiment, we turned to our personal circle of friends and acquaintances. More than three people expressed interest and willingness to participate, but we decided to limit the number of participants to three in order to simplify the process of data collection.

The participants were not chosen arbitrarily. All of them are native Arabic speakers; however, their relationship with Modern Standard Arabic (MSA) differs. We asked each participant to describe their relationship with MSA and to self-assess their degree of MSA proficiency. Table 3 summarizes each participant’s response:


Participant P1 (self-proclaimed MSA level: Medium): P1 is a biologist at the University of Geneva and does not effectively use MSA at work, for their studies or in daily life, but regularly reads books written in MSA.

Participant P2 (self-proclaimed MSA level: Proficient): P2 is a translation student at the Faculty of Translation and Interpreting. They use MSA daily, for their studies as well as for work.

Participant P3 (self-proclaimed MSA level: Poor): P3 was born and raised in Tunisia, a North African country whose official language is Arabic as per Article 1 of the 2014 Tunisian constitution (Tunisia, 2014). P3 is a self-proclaimed native Arabic speaker but only speaks the Tunisian dialect and reports having almost no proficiency in MSA.

Table 3: summary of the participants' self-proclaimed MSA levels


The participants in this experiment do not only play the role of interviewees; they also evaluate the fluency and adequacy of the translations produced by the system. For this reason, we decided that the participants needed to be proficient in English. This choice introduced a complication into the experiment: bilingual participants may compromise its accuracy, because they can understand the questions asked in English and may rely on what they hear from the interviewer instead of on IM’s translation. To prevent this, we equipped every participant with fully isolating noise-cancelling headphones so that they could hear only IM’s translations (see section 3.2.4). Unlike MSA proficiency, the participants’ English proficiency had to be verified, because their evaluations needed to be relevant and informed. To this end, we used the self-assessment grid (Annex 3) provided by the Common European Framework of Reference for Languages (CEFR)26. This grid assesses an individual’s proficiency in three categories:

- Understanding
- Speaking
- Writing

According to this grid, a user who identifies as having a C1 proficiency level in the “Understanding” category “can understand extended speech even when it is not clearly structured and when relationships are only implied and not signal[l]ed explicitly. I can understand television program[me]s and films without too much effort” (Self-Assessment Grid - Table 2 (CEFR 3.3), n.d.). We decided that the C1 level matched the requirements of this experiment, and so each participant needed to self-identify as having a C1 proficiency level in at least the “Understanding” category.

For more data regarding all participants, please refer to the annexed demographic questionnaires (Annex 2).

26 https://www.coe.int/en/web/common-european-framework-reference-languages/home


Tools

To conduct this experiment, we needed a reliable and recent smartphone equipped with the Google Assistant application. Even though Google Assistant is primarily intended for use on Android-powered devices, it is also available for download on iPhones, which run iOS. Table 4 summarizes which phone we intended to use with each participant:

Participant P1: Android (Samsung Note 10+)
Participant P2: iOS (iPhone 11 Pro)
Participant P3: Android (Samsung Note 10+)

Table 4: operating systems intended for use with each participant

As shown in Table 4, we intended to use an Android phone with P3, but IM kept crashing upon activation during the experiment. We could not determine what caused this malfunction, but when we switched to iOS, the crashing stopped.

As a result, we updated Table 4, which became Table 5:

Participant P1: Android (Samsung Note 10+)
Participant P2: iOS (iPhone 11 Pro)
Participant P3: iOS (iPhone 11 Pro)

Table 5: updated version of Table 4

Figure 13 and Figure 14 provide the specifications as well as illustrations of the two smartphones we used during this experiment, a Samsung Note 10+ and an iPhone 11 Pro:
