8. Future Work and Possible Extensions

8.2 Analysis requiring a change in experiment design

In this section, we discuss additional studies that would require a change to the design of the experiment protocol.

Task completion times

Task completion time is often used as a measure of usability, and can also be applied when comparing modality combinations [42]. We did not analyze task completion times in our work because we observed that users did not consistently signal the end of one task and the start of the next, despite being explicitly asked to do so both in the experiment instructions and by the experimenter. Although task completion times could in principle be extracted from the recorded video data after the experiment, the resulting measurements were not fine-grained enough to support scientific conclusions about differences in task completion times across the modality combinations. However, Cohen and Oviatt [16] suggest that user preferences for modalities may be influenced by time-to-input rather than by overall performance measures. It would therefore be worthwhile to develop a new evaluation protocol that enforces the indication of task completion more strictly, and thus allows an investigation of whether such influences play a role in the results presented here.

Increased experiment length

In section 6.5, where we discussed the evolution of modality use, we saw that user behaviour with different modalities changed over time; in particular, language use decreased over time while pointing use increased. Due to time and resource constraints, we were not able to run experiments long enough to reveal the point at which the interaction between modalities stabilizes. While we believe that the results presented in this thesis are a valid indicator of the direction that modality use will follow, longer studies confirming these results would be welcome. Moreover, studies over a longer period of time would give users in-context practice with the various modalities, and in particular with voice; Bell et al. [36], Le Bigot, Jamet, and Rouet [53] and Sturm et al. [123] suggest that such practice can influence user behaviour.

Interaction patterns

In our work, we have not examined the nature of the interaction patterns that users adopt for problem solving (that is, the order in which modalities are chosen and the concrete tasks for which they are chosen). We made this choice for three reasons. The first is that such an analysis speaks more to the evaluation of the Archivus system itself than to how modalities are used to access information through the system. The second is that a flexibly multimodal interface allows too much variability in how modalities can be combined. The third is that there were many different ways in which users could answer a single question; for example, the date on which a meeting took place could be found using the predefined criteria buttons, by rearranging books in the bookcase, or by opening the meeting book.

Combined with all of the modalities that could be used at each step of the interaction, this variety led to a combinatorial explosion of possible patterns. Nevertheless, Sturm et al. [123] found clear differences in the interaction patterns adopted by different users, so it would be interesting to see whether the same holds for interaction with the Archivus system, and how those patterns might influence modality choice. Experiments of this kind would, however, have to be restricted to a limited set of questions and to single modalities or modality pairs in order to keep the analysis manageable.

Task type and modality choice correlations

The design of the experiment presented in this thesis did not allow a sufficiently detailed evaluation of whether the choice of a modality or modality combination for a task correlates with the type of task. It would be worthwhile to explore this issue in more depth in future studies, as we suspect that such correlations exist. In addition to the types of analysis described in section 6.9, we would also follow the work of Tricot [107], which suggests examining whether there are differences between finding information that is explicit in a text and finding information that has to be inferred. The question set used in the Archivus experiments was not designed to distinguish between these two cases, and the vast majority of the questions involved finding information that was explicit either in the data or in the interface elements themselves.

Browsing vs. searching tasks

The Archivus system allows users either to search the multimedia data in the database or simply to browse it, and it would be interesting to determine whether interaction with different modalities changes depending on which of these activities the user is engaged in. However, the questions used in the experiments described in this thesis were not designed to elicit specifically searching or browsing behaviour. Although answering most questions required both searching and browsing, some questions could be answered using only one of the two. Moreover, there was considerable variability in how users mixed searching and browsing to answer specific questions. Consequently, it was impossible to determine whether any differences in modality use were correlated with either activity in particular. To test for such a correlation, we would need to design an experiment in which users were required either to only search or to only browse, using each of the different modalities.

Cognitive factors

Neither the author's expertise nor the available resources allowed for an investigation into the cognitive factors that might affect modality choice in this work. Le Bigot et al. [51], for example, show that there is a relationship between the modality chosen for an interaction and the cognitive costs involved in planning the interaction and in solving particular tasks. Similarly, Grasso [40] shows that interfaces that require the use of speech impose higher memory demands than those that do not. It would be interesting to determine the extent to which such factors affect the use of multimedia and multimodal systems such as Archivus, and whether the simultaneous availability of different input modalities reduces these costs.

Further inspired by the work of Le Bigot et al. [51], who found that ‘voice recognition errors resulted in an increase in stress and mental load which, in turn, led to an increase in the number of voice recognition errors’, it would be interesting to investigate whether the same holds true in the context of Archivus, whether it also applies to errors made with other input modalities, and if so, whether it does so to the same degree.

Experimental vs. real-use studies

Karlgren [18] notes that user behaviour differs between experimental studies and real-use studies, and that subjects in experimental studies tend to try harder ‘both because of curiosity and the novelty of the situation and to perform well in a situation where they are observed’. Unfortunately, real-use studies in the multimedia meeting browsing and retrieval domain are currently impossible, since no institutions regularly record and annotate meetings in the way such studies would require. Nevertheless, it would be useful to know whether a real-use situation would significantly alter the use of modalities in the meeting browsing and retrieval context.