
4. Experiment results

4.3. System Usability

As seen in section 3.3.2, the SUS is a Likert scale. It is composed of ten items to which participants express their degree of agreement or disagreement, from "strongly agree" to "strongly disagree". The SUS gives "a global view of subjective assessments of usability" (Jordan et al., 1996, Chapter 21).

We calculated each participant's usability score using the System Usability Scale (SUS) presented in section 3.3.2.

According to Figure 19 (meaning of different SUS scores), a good SUS score lies between 68 and 80.3, and anything above 80.3 is considered "excellent". A score between 51 and 68, however, is considered "poor".

For each of the items discussed below, a value of 1 represents "strongly disagree" and a value of 5 represents "strongly agree".
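For reference, the scoring behind these raw answers can be written out as a short Python sketch. It follows the standard SUS computation (Jordan et al., 1996, Chapter 21): odd-numbered (positively worded) items contribute their answer minus 1, even-numbered (negatively worded) items contribute 5 minus their answer, and the sum of contributions is multiplied by 2.5 to give a score out of 100. The function name sus_score and the example answers are ours for illustration only; they are not taken from our participants' data.

    def sus_score(answers):
        """Compute a SUS score (0-100) from ten Likert answers (1-5).

        Odd-numbered items are positively worded: contribution = answer - 1.
        Even-numbered items are negatively worded: contribution = 5 - answer.
        The sum of the ten contributions (0-40) is scaled by 2.5.
        """
        if len(answers) != 10:
            raise ValueError("The SUS requires exactly ten answers")
        total = 0
        for i, answer in enumerate(answers, start=1):
            total += (answer - 1) if i % 2 == 1 else (5 - answer)
        return total * 2.5

    # Illustrative (hypothetical) set of answers, not a participant's real data
    print(sus_score([3, 2, 4, 1, 3, 2, 4, 2, 3, 1]))  # prints 72.5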

After each experiment session with a participant, the SUS was sent to them by e-mail and they filled it out before concluding the experiment.

The final SUS scores can be seen in Table 6:

Participant    SUS score
P1             65
P2             60
P3             32.5
Average        52.5

Table 6: Final SUS scores given by each participant

P1 and P2 gave IM very close scores, with P1 giving the highest score of the three (65). P3, who is not very proficient in MSA (see Table 3), gave the lowest score of the three (32.5). These results are not conclusive, but they suggest that a person's level of MSA may affect their opinion of the usability of Interpreter Mode.

All three scores are below 68, and the average score given to Interpreter Mode by the three participants is 52.5, which is not only below the suggested score that allows a tool to "pass" but is also considered "poor".
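As a quick check on this aggregate, the sketch below averages the three totals from Table 6 and maps each score onto the bands cited above (above 80.3 "excellent", 68 to 80.3 "good", 51 to 68 "poor"). The band() helper and its label for scores under 51 are our own shorthand, since Figure 19 is not reproduced here.

    # SUS totals from Table 6
    scores = {"P1": 65, "P2": 60, "P3": 32.5}

    def band(score):
        """Map a SUS score onto the interpretation bands discussed above."""
        if score > 80.3:
            return "excellent"
        if score >= 68:
            return "good"
        if score >= 51:
            return "poor"
        return "below the poor band"  # Figure 19's own label for this range is not quoted here

    average = sum(scores.values()) / len(scores)
    print(round(average, 1), band(average))     # 52.5 poor
    for participant, score in scores.items():
        print(participant, score, band(score))  # P1 and P2 fall in "poor"; P3 falls below it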

The following tables contain detailed results and remarks for each of the 10 SUS items.

Item 1: I think that I would like to use Interpreter Mode frequently

Average score (out of 5): 1.6

Table 7: Answers chosen by participants for item 1 of the SUS

Since IM is aimed at the average user, we think that item 1 of the SUS is particularly important.

Participants were somewhat in agreement on this item. Keeping in mind that our scale ranges from 1 (strongly disagree) to 5 (strongly agree), the average given by the three participants is 1.6, a below-average score indicating that the participants would likely not use IM frequently (Table 7).

Item 2: I found Interpreter Mode unnecessarily complex

Participant    Answer chosen
P1             1
P2             2
P3             3
Average score (out of 5): 2

Table 8: Answers chosen by participants for item 2 of the SUS

The second item shows no consensus among the participants: each gave a different answer, with values ranging from 1 to 3, the latter being the neutral value of the scale. This suggests that the participants did not find IM particularly complex overall, but that the experience varied from one individual to another. Again, we notice that P3's answer indicates that they found the tool more unnecessarily complex than the other participants did, which leads us to think that their MSA level might have affected this score (Table 8).

Item 3: I thought Interpreter Mode was easy to use

Participant    Answer chosen
P1             4
P2             3
P3             2
Average score (out of 5): 3

Table 9: Answers chosen by participants for item 3 of the SUS

The third item, which asks the participants whether they thought Interpreter Mode was easy to use, also showed no consensus: each participant gave a different answer, with values ranging from 2 to 4. The average obtained was 3, the middle of the scale, meaning that as a group the participants neither agree nor disagree that Interpreter Mode was easy to use (Table 9).

Item 4: I think that I would need assistance to be able to use Interpreter Mode

Participant    Answer chosen
P1             1
P2             1
P3             4
Average score (out of 5): 2

Table 10: Answers chosen by participants for item 4 of the SUS

For item 4, the results show agreement between two of the three participants, with P3 holding an almost opposite opinion. The table shows that P1 and P2, who are proficient and somewhat proficient in MSA respectively, agree that they would not need assistance to use IM, while P3, who is not at all proficient in MSA, gave an almost extreme opposite answer to this item (4/5). This again suggests that the MSA level might have affected P3's perception of the tool's usability (Table 10).

Item 5: I found the various functions in Interpreter Mode were well integrated

Participant    Answer chosen
P1             3
P2             3
P3             3
Average score (out of 5): 3

Table 11: Answers chosen by participants for item 5 of the SUS

For this item, all three participants were in full agreement and chose the middle option (3) as shown in Figure 21 below:

Figure 21: screenshot showing the SUS item 5 results

The middle option means that the user neither agrees nor disagrees, which in our study is the shared opinion of all three participants regarding whether they found the various functions in Interpreter Mode well integrated.

Item 6: I thought there was too much inconsistency in Interpreter Mode

Participant    Answer chosen
P1             4
P2             4
P3             5
Average score (out of 5): 4.3

Table 12: Answers chosen by participants for item 6 of the SUS

Item 6 showed an almost unanimous opinion among the three participants. The results show that P1 and P2 almost strongly agree that there was too much inconsistency in IM, while P3 chose the extreme option (strongly agree = 5), which suggests that all the participants believe the tool performed poorly overall (Table 12).

Item 7: I would imagine that most people would learn to use Interpreter Mode very quickly

Participant    Answer chosen
P1             4
P2             3
P3             3
Average score (out of 5): 3.3

Table 13: Answers chosen by participants for item 7 of the SUS

For item 7, while P1 gave a different answer from P2 and P3 (who gave the same answer), the average again indicates that, as a group, the participants neither agree nor disagree (Table 13).

Item 8: I found Interpreter Mode very cumbersome to use

Participant    Answer chosen
P1             2
P2             4
P3             3
Average score (out of 5): 3

Table 14: Answers chosen by participants for item 8 of the SUS

For item 8, the results show disagreement between the three participants. Table 14 shows that they chose different values ranging from 2 to 4, but the average score is, yet again, 3, the middle, "neutral" value.

Item 9: I felt very confident using Interpreter Mode

Participant    Answer chosen
P1             3
P2             5
P3             2
Average score (out of 5): 3.3

Table 15: Answers chosen by participants for item 9 of the SUS

For item 9, once more, the results show disagreement between the three participants. P1 chose the middle value (3), which reflects a neutral or uncertain attitude towards the item. P2 and P3 expressed almost extreme and firm opinions, with P2 choosing the value 5, which means "strongly agree", and P3 choosing the value 2, which is close to "strongly disagree". We believe that the MSA level had an impact on how this item was answered: the more proficient in MSA the participant is, the more confident they felt using IM (Table 15).

Item 10: I needed to learn a lot of things before I could get going with Interpreter Mode

Participant    Answer chosen
P1             1
P2             1
P3             4
Average score (out of 5): 2

Table 16: Answers chosen by participants for item 10 of the SUS

The tenth and last item shows agreement between two of the three participants (P1 and P2), with P3 holding an almost opposite opinion. P1 and P2, who are proficient and somewhat proficient in MSA respectively, agree that they would not need to learn a lot of things before using IM, while P3, who is not at all proficient in MSA, gave an almost extreme opposite reply to this item. We believe that the MSA level could affect how the user perceived the tool's usability (Table 16).

Looking at all the results above, we can see that Interpreter Mode did not do well in the SUS test and that the participants did not find it very user-friendly. P1 and P2 gave it scores within the "poor" band (between 51 and 68), while P3's score fell below that band altogether, which resulted in the tool obtaining a poor overall score.

After looking at the general usability of IM, we will now look into the specifics of its performance as a translation tool.
