Smartphones: evidence-based user-interface design

EHRLER, Frédéric, et al.

EHRLER, Frédéric, et al . Smartphones: evidence-based user-interface design. Studies in Health Technology and Informatics , 2013, vol. 192, p. 57-61

PMID : 23920515

DOI : 10.3233/978-1-61499-289-9-57

Available at:

http://archive-ouverte.unige.ch/unige:32609

Smartphones: Evidence-Based User-Interface Design

Frédéric Ehrler^a, Magali Walesa^b, Evelyne Sarrey^c, Christian Lovis^a

^a University Hospitals of Geneva, Division of Medical Information Sciences, Geneva, Switzerland

^b University of Geneva, School of Medicine, Geneva, Switzerland

^c University Hospitals of Geneva, Direction of Nursing, Geneva, Switzerland

Abstract

Smartphones have become increasingly popular among every segment of the population. Caregivers do not want to miss out on this evolution and express interest in using mobile devices to perform their everyday care. This tendency has been well understood by many software providers who have produced many medical applications for smartphones. Before going a step further and developing tools to manage Clinical Information System data on handheld devices, it is wise to ask ourselves whether these new tools are well adapted to the healthcare environment. Indeed, some studies have raised concerns regarding the efficiency of these handheld devices to input medical data, especially with the induced errors. In this paper, we look to adopt a rigorous approach to acquire evidence about these concerns through a prospective study. In order to get this evidence, the study compares several input interfaces in the context of recording vital signs on mobile devices. We would like to discover not only which interface is the most efficient, but also which one is the least prone to errors.

Keywords:

Computers, Handheld; Medical Informatics; Evidence-Based Health Care.

Introduction

Smartphones have become increasingly popular. Their ubiquity and their ability to access any source of information at any time make these devices an extension of ourselves. Many clinicians are early adopters of new technologies and are eager to benefit from such tools in their everyday work [1]. Caregivers are interested in this technology not only for its appeal as a luxury item, but also because they are aware of the numerous advantages brought by mobile devices [2]. If correctly designed, developed, and introduced, handheld tools can allow caregivers to spend more time at the patient's bedside, which improves their relationship with patients and simplifies their workflow [3]. Software providers have understood this trend and offer a growing number of applications dedicated to healthcare. Applications such as calculators, literature references, and diagnostic aids are already broadly used by clinicians [4]. However, fully integrated solutions to manage clinical data in hospitals are still missing. This situation may soon change, given the strong interest of stakeholders in integrating handheld tools into the daily workflow of caregivers in the clinical environment.

The benefits of introducing mobile devices into the care workflow seem very attractive. However, before going further in the development of dedicated applications, it is wise to assess whether deploying such a solution also introduces risks. Indeed, some studies have shown that gathering information through mobile devices can lead to an increase in errors due to the reduced screen size and the unusual method of interaction [5]. One of the reasons for this failure is the lack of care taken in building mobile application interfaces. Mobile interfaces have specific characteristics, such as a reduced display size and a unique interaction paradigm, that must be taken into account [6]. Instead of relying on preconceived ideas and common sense, we wanted to extract formal evidence to discover the properties of an efficient phenotype. For this purpose, we have set up a usability test that evaluates and compares several interfaces to record vital signs.

A tool has been developed and tested by about one hundred nurses. Vital signs were selected as the data recorded during the experiment for several reasons. These data are frequently recorded at the patient's bedside and need to be accurate [7]. Currently, in most hospitals, the measures are recorded on a piece of paper and then transcribed into the clinical information system (CIS). This process is a waste of time and a potential source of errors, as it implies re-transcription of the data.

Materials and Methods

The aim of the experiment is to acquire evidence about the quality of data recorded through interfaces on mobile devices. For this purpose, we have set up a usability test [8]. The test records user performance while entering vital signs on handheld devices. It is completely embedded in a program and does not require any external intervention. Users are led through several screens where they record vital sign measures through different interfaces. During the experiment, performance measures such as time and error rate are recorded. These measures allow the various models to be compared. Developing a very robust program was a strong requirement: it was imperative that all the desired values were properly recorded and that participants were not annoyed by any bug. As a result, the development required one year of work by the author.

The vital signs

In our experimental setting, we focused on three types of vital signs: pulse, temperature, and respiratory rate. These signs were chosen for their different characteristics: each has a specific range of values and requires a different level of precision. For instance, in our experiment, the pulse can take 140 different values ranging from 30 to 170. The temperature can take 50 values from 36.0 to 41.0. Finally, the respiratory rate has only 17 possible values ranging from 3 to 20. We expect these characteristics to influence the quality of data entry and to help us discover whether an interface works better with certain vital signs.

C.U. Lehmann et al. (Eds.)

This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License.

doi:10.3233/978-1-61499-289-9-57

Table 1 - Selected vital signs

Sign              Value range   Decimals   Nb values
Pulse             30-170        0          140
Temperature       36-41         1          50
Respiratory rate  3-20          0          17
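As a quick check of Table 1, the reported number of values for each sign equals the width of its range divided by the entry step (1 for integers, 0.1 for one decimal). The sketch below reproduces this arithmetic. The function name and the use of exact fractions are illustrative choices and are not taken from the study software. Counting both endpoints of each range would give one additional value per sign.

```python
from fractions import Fraction

def nb_values(low, high, decimals):
    """Range width divided by the entry step (10**-decimals)."""
    step = Fraction(1, 10 ** decimals)
    return int((Fraction(high) - Fraction(low)) / step)

# (low, high, decimals) as listed in Table 1
table1 = {
    "Pulse": (30, 170, 0),
    "Temperature": (36, 41, 1),
    "Respiratory rate": (3, 20, 0),
}
counts = {sign: nb_values(*spec) for sign, spec in table1.items()}
```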

The interfaces

It is straightforward to enter a number with a physical keyboard. The task becomes more problematic when done on modern smartphones. Mobile devices usually lack a physical keyboard and inputs are often handled through tactile interactions on a screen. The simplest and most widely accepted solution to perform this tactile interaction is to emulate a virtual keypad on the screen. However, other solutions exist.

The numeric keypad

Figure 1 - Numeric Keypad

The numeric keypad (Figure 1) is the most popular model to input numbers. People are usually very familiar with this model, as it is not only used on smartphones, but also on telephones, calculators, and many other devices.

The numeric stepper

Figure 2 - Numeric stepper

The stepper (Figure 2) is also a popular model and is quite simple to use. A digit is increased by pressing “+” and decreased by pressing “-”. Each digit of a number must be modified independently until the desired value is reached.

The numeric wheeler

Figure 3 - Numeric wheeler

The wheeler (Figure 3) is similar to the stepper but is a continuous version of it. The central wheel starts to turn as soon as the user swipes the screen downward or upward. The scrolling speed depends on the energy imparted by the user's movement. The wheel continues scrolling until all the energy is dissipated or until the user stops the wheel with a finger. Compared to the previous model, the wheeler allows faster skimming through a large range of numbers.
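The inertial behaviour of the wheel can be illustrated with a simple friction model. This is only a sketch under stated assumptions: the velocity unit (values per tick), the friction factor, and the stopping threshold are hypothetical constants and do not come from the tool used in the study. Stopping the wheel with a finger is not modelled.

```python
def wheel_scroll(start, velocity, low, high, friction=0.9, min_speed=0.05):
    """Simulate the wheel after a swipe and return the value it settles on.

    velocity is in values per tick (its sign gives the direction);
    friction and min_speed are illustrative constants.
    """
    position = float(start)
    while abs(velocity) >= min_speed:
        position += velocity
        velocity *= friction              # energy dissipates at each tick
        if position <= low or position >= high:
            position = min(max(position, low), high)
            break                         # the wheel stops at the end of the range
    return round(position)
```

With this model, a stronger swipe travels further before the wheel comes to rest, which is what makes the wheeler faster than the stepper over large ranges.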

The circle model

Figure 4 - Circle model

The circle model (Figure 4) is a brand new model in which the user can select a number without lifting their finger from the screen. It was inspired by the swipe keyboard that is available on some smartphones. To select a digit, the user swipes their finger over the required digit and validates it in the central area. The process is repeated until the number is complete. To validate the selection, the user lifts their finger from the screen.

The column model

Figure 5 - Column model

The column model (Figure 5) was conceived around the idea of entering a measure directly on a graph. In this model, the user swipes their finger upward or downward on the screen to select the desired number. The number is selected when the user lifts their finger from the screen.

The character recognition model

Figure 6 - Character recognition model

The character recognition model (Figure 6) relies on a technique that has been widely used to convert handwritten documents into a digitally interpretable form. The most efficient recognition systems rely on machine learning algorithms and can be adapted to specific writing styles [9]. However, in our experiment, it was not possible to learn the users’ style before each experiment. Instead, we relied on a simple algorithm that decomposes the digit entered by the user into a sequence of segments and compares it to a database containing digit models [10]. The most likely digit is chosen based on an edit distance computed over Freeman chain codes [11].
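The principle of this kind of recognition can be sketched as follows: a stroke is turned into a Freeman 8-direction chain code, and the closest template is selected with a Levenshtein edit distance. This is a minimal illustration, not the algorithm used in the study; the function names and the three single-stroke templates are hypothetical.

```python
import math

def chain_code(points):
    """Freeman 8-direction code of a stroke given as (x, y) points.

    Direction 0 points along +x and codes increase counter-clockwise
    in 45-degree steps (y is taken to grow upward here).
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue                      # skip repeated samples
        angle = math.atan2(y1 - y0, x1 - x0)
        code = round(angle / (math.pi / 4)) % 8
        # collapse runs so that segment length does not matter
        if not codes or codes[-1] != code:
            codes.append(code)
    return codes

def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def recognize(points, templates):
    """Return the template label whose code is closest to the stroke."""
    code = chain_code(points)
    return min(templates, key=lambda label: edit_distance(code, templates[label]))

# Hypothetical templates, for illustration only.
TEMPLATES = {"1": [6], "7": [0, 5], "2": [0, 5, 0]}

seven = [(0, 10), (5, 10), (10, 10), (7, 5), (4, 0)]
guess = recognize(seven, TEMPLATES)
```

Because the templates are generic and not adapted to each writer, a stroke drawn in an unusual order or direction produces a different code and may be matched to the wrong digit.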

The experiment

The participants

Participants were recruited randomly from several wards of the University Hospitals of Geneva (HUG). In order to obtain significant results, we decided to recruit at least one hundred nurses.

Recorded information

In the experiment, we compare the interfaces based on several indicators. Because vital signs are recorded so frequently in the nursing workflow, the time taken to record a measure is an obvious concern. A difference of a few seconds in one manipulation can quickly lead to a significant loss of productivity when the task is repeated hundreds of times. Another critical indicator is the number of errors, as quality care cannot be delivered without accurate information.

We are also interested in recording some demographic characteristics in order to assess any confounding effect. For instance, we expect some older users to be less adept with modern technology [12]. Therefore, we record the users’ year of birth, function in the hospital, familiarity with smartphones, and familiarity with vital sign recording.

The experimental process

Users are guided through the experiment via a series of steps as described below:

1. Presentation of the experiment: Users are informed about the task to be performed. They are instructed to hold the device in one hand during the experiment. This is required to simulate realistic use of the tool at the bedside.

2. User questionnaire: Users fill in a questionnaire to record their demographic information and familiarity with handheld devices.

3. User experiment: The user experiment itself is divided into several sub-steps. During this phase, users are introduced to and trained on all the models in a random order and are then evaluated on their performance. The training stage is only available the first time a new model appears, and the evaluation is only presented after the last use of a model.

4. Model ranking: Once all the tests are complete, users are asked to rank the models based on their preferences.

Once the experiment is completed, all of the data are automatically recorded on the smartphone.
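The bookkeeping of step 3 can be sketched as a randomized trial schedule in which the first appearance of each model triggers the training stage and its last appearance triggers the subjective evaluation. The sketch assumes that each model is used once with each of the three vital signs; this pairing, the model labels, and the function name are assumptions made for illustration.

```python
import random

MODELS = ["keypad", "stepper", "wheeler", "circle", "column", "recognition"]
SIGNS = ["pulse", "temperature", "respiratory rate"]

def build_schedule(models=MODELS, signs=SIGNS, seed=None):
    """Return a shuffled list of trials with training/evaluation flags."""
    rng = random.Random(seed)
    trials = [{"model": m, "sign": s} for m in models for s in signs]
    rng.shuffle(trials)
    first, last = {}, {}
    for index, trial in enumerate(trials):
        first.setdefault(trial["model"], index)   # first use of the model
        last[trial["model"]] = index              # overwritten until the last use
    for index, trial in enumerate(trials):
        trial["training"] = first[trial["model"]] == index
        trial["evaluation"] = last[trial["model"]] == index
    return trials

schedule = build_schedule(seed=1)
```

Passing a seed makes a given participant's order reproducible, which helps when a session has to be inspected during data cleaning.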

The evaluation

Every parameter of the experiment is recorded to make the evaluation possible. Besides the objective evaluation based on recorded indicators, we also ask users to make their own subjective evaluation of the models. This user evaluation is based on four criteria extracted from standard usability criteria [13].

1. Efficiency: The efficiency of measurement entry

2. Learning: The difficulty experienced in learning to master the model

3. Satisfaction: The overall satisfaction with the model

4. Precision: The comfort level with entering measurements precisely

Users rate these four criteria each time they use a model for the last time. For each criterion, users choose among five levels of satisfaction, ranging from 1 (very satisfied) to 5 (unsatisfied).

Results

Once the whole experiment was set up, six months were spent in different wards enrolling a sufficient number of participants. The results were collected in a common repository for cleaning and analysis. To clean the data, we removed the small number of tests that failed due to improper manipulation by the user. We also fixed a few recording inaccuracies caused by unexpected interruptions.

Population

In total, 93 nurses participated in the test and 87 remained after cleaning the data. By analyzing the questionnaires filled in by the participants, we extracted some interesting statistics. There were not enough participants to create a smooth age distribution, but there were enough to group the users into four age classes of similar size (<36, 36-42, 43-48, >48 years old). These four classes are used to analyze the influence of the participants' age on the other results. One parameter influenced by age is smartphone ownership, as shown in Figure 7.

Figure 7 - Possession of smartphone by age class
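The grouping into the four age classes can be written as a small helper. The questionnaire records the year of birth, so the age has to be derived from a reference year first; that step is left out here because the reference year is not stated. The function name is illustrative.

```python
def age_class(age):
    """Map an age in years to one of the four classes used in the analysis."""
    if age < 36:
        return "<36"
    if age <= 42:
        return "36-42"
    if age <= 48:
        return "43-48"
    return ">48"
```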

Training

The first time a participant encounters a new model, they are asked to enter two measures to become familiar with the model before starting the real test. This training stage continues until the user succeeds in entering the two required measures. Once the two measures are entered, the user can choose to continue the training. In the following graph (Figure 8), we present the time taken to enter the two measures, the number of errors committed during the process, and the number of measures entered in the process. All the measures are presented as a relative percentage of the summed value.
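Expressing each indicator as a percentage of its sum over all models puts time, errors, and number of entries on a common scale. The sketch below shows this normalization; the numbers are made up for illustration and are not results from the study.

```python
def relative_percentages(values):
    """Express each model's value as a percentage of the sum over all models."""
    total = sum(values.values())
    if total == 0:
        return {model: 0.0 for model in values}
    return {model: 100.0 * v / total for model, v in values.items()}

# Made-up training times in seconds, for illustration only.
training_time = {"keypad": 10.0, "stepper": 20.0, "wheeler": 30.0, "circle": 40.0}
shares = relative_percentages(training_time)
```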

Figure 8 - Time and errors during training phase

We clearly see that the number of errors is much higher for the column, circle, and character recognition models. The level of error is usually correlated with the training time. The character recognition model behaves differently: its relative training time is much greater than its relative number of errors.

Experiment

The first measure of interest is the average time taken to enter the measures.

Figure 9 - Average time for each model

Figure 9 clearly shows that the numeric keypad is, by far, the most efficient model, whereas the column and character recognition models perform poorly.

Figure 10 - Accuracy for each model

Regarding accuracy (Figure 10), there is not much difference between the keypad, stepper, and wheeler. The character recognition model is less accurate than the other models.

User evaluation

Users provide a subjective evaluation of each model along four axes: learning, precision, efficiency, and satisfaction.

Figure 11 - Learning by user

Figure 11 shows the users’ assessment of the difficulty in learning to use each model. The x-axis represents how easy the model is to learn (1 = easiest, 6 = hardest). As expected, familiar models are more easily mastered by users. The interesting point is the difficulty users had in mastering the circle model, which relies on an innovative manipulation concept.

Ranking

At the end of the whole experiment, users have to rank the different models.

Figure 12 - User overall ranking

The results (Figure 12) show that the numeric keypad is clearly ranked as the favorite model by most users. The stepper takes second place. It is less clear whether the wheeler or the circle model takes third place, but the wheeler seems to have a small advantage over the circle model. Finally, the column and character recognition models come last and did not seem to satisfy the users.

Discussion

Dealing with the missing values

Users can choose, whenever they want, to skip recording a value. There is no simple solution for handling these missing values. We could choose to ignore them. However, based on the hypothesis that users skip values when they are bored with a model or have difficulty entering a digit, these missing values should reflect poor performance. As the right strategy for dealing with these missing values was not obvious, we decided to remove the measures where results are missing (Figure 13).

Figure 13 - Percentage of missing values per model
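The chosen strategy, removing skipped measures while still reporting how often each model was skipped, can be sketched as follows. The record layout (a `time` of `None` for a skipped value) and the sample data are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict

def summarize(records):
    """Per model: share of skipped entries and mean time of the remaining ones."""
    times = defaultdict(list)
    skipped = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["model"]] += 1
        if rec["time"] is None:           # the user skipped this value
            skipped[rec["model"]] += 1
        else:
            times[rec["model"]].append(rec["time"])
    summary = {}
    for model in total:
        kept = times[model]
        summary[model] = {
            "missing_pct": 100.0 * skipped[model] / total[model],
            "mean_time": sum(kept) / len(kept) if kept else None,
        }
    return summary

# Made-up records, for illustration only.
records = [
    {"model": "keypad", "time": 3.0},
    {"model": "keypad", "time": 5.0},
    {"model": "column", "time": 9.0},
    {"model": "column", "time": None},
]
summary = summarize(records)
```

Reporting the percentage of missing values next to the averages keeps visible the information that would otherwise be lost by dropping the skipped measures.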

The failure of the digital character recognition model

The character recognition results were significantly worse than those of the other models. Some users took more time to master the model, it produced a greater number of errors, and it received a poor evaluation. The results might have been better with a more advanced recognition algorithm [9]. However, this is unlikely because the experimental setting constrained us to generic algorithms that do not adapt to the users’ preferences. Therefore, these poor results are not surprising.

Younger users are more familiar with smartphones

We have seen in Figure 7 that smartphones have better penetration among younger users. Age also influences performance, as younger users take less time overall than older users (Figure 14).

Figure 14 - Average experiment time given age class

User preferences reflect their real performance on the model

The users’ subjective evaluation is closely correlated with the parameters recorded during the experiment (Figure 11, Figure 12). For instance, the numeric keypad has the best results in terms of both time taken and user preference. This suggests that users are not swayed by fancy but inefficient interfaces and thus can be trusted to make good choices [14].

Conclusion

Developers and hospital stakeholders express great enthusiasm about the deployment of mobile technology in the healthcare environment. Unfortunately, a lack of care in the use of such technology is one of the reasons that related projects have sometimes failed. Too often, development choices have been made based on developers’ beliefs and did not match the constraints of the clinical environment.

In this study, we aimed to compare several mobile interfaces to extract evidence about the most efficient model to record vital signs. This task was chosen because it is performed constantly by nurses at the patient's bedside and needs to be done accurately to monitor the patient's health. We proposed six different interfaces, of which some were well known, while others were more innovative. The results clearly show that the majority of users feel more comfortable and perform better with the simpler models. This is an important signal to application developers, who often favor fancy and eye-catching solutions over simple but efficient ones.

References

[1] Garritty C, El Emam K. Who’s Using PDAs? Estimates of PDA Use by Health Care Providers: A Systematic Review of Surveys. J Med Internet Res. 2006;8(2).

[2] Wu RC, Straus SE. Evidence for handheld electronic medical records in improving care: a systematic review. BMC Med Inform Decis Mak. 2006;6:26.

[3] Kjeldskov J, Skov MB. Exploring context-awareness for ubiquitous computing in the healthcare domain. Pers Ubiquit Comput. 2007;11(7):549-62.

[4] Mosa ASM, Yoo I, Sheets L. A Systematic Review of Healthcare Applications for Smartphones. BMC Med Inform Decis Mak. 2012;12:67.

[5] Haller G, Haller D, Courvoisier D, Lovis C. Handheld vs. Laptop Computers for Electronic Data Collection in Clinical Research: A Crossover Randomized Trial. J Am Med Inform Assoc. 2009;16(5):651-60.

[6] Ehrler F, Walesa M, Sarrey E, Wipfli R, Lovis C. INCA - Individual Nomad Clinical Assistant - supporting nurses with mobile devices. Stud Health Technol Inform. 2012;180:1079-83.

[7] Gearing P, Olney C, Davis K, Lozano D, Smith L, Friedman B. Enhancing patient safety through electronic medical record documentation of vital signs. J Healthc Inf Manag. 2006;20(4):40-5.

[8] Rubin J, Chisnell D. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. John Wiley & Sons; 2011.

[9] Liu C-L, Nakashima K, Sako H, Fujisawa H. Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognit. 2003;36:2271-85.

[10] Chan K-F, Yeung D-Y. Recognizing on-line handwritten alphanumeric characters through flexible structural matching. Pattern Recognit. 1999;32:1099-114.

[11] Bernard M, Fromont E, Habard A, Sebban M. Handwritten Digit Recognition using Edit Distance-Based KNN. Teaching Machine Learning Workshop; 2012; Edinburgh, Scotland.

[12] Wright P, Bartram C, Rogers N, Emslie H, Evans J, Wilson B, et al. Text entry on handheld computers by older users. Ergonomics. 2000;43(6):702-16.

[13] Lobo D, Kaskaloglu K, Kim C, Herbert S. Web Usability Guidelines For Smartphones: A Synergic Approach. IJIEE. 2011;1(1):33-7.

[14] Kaikkonen A, Kekäläinen A, Cankar M, Kallio T, Kankainen A. Usability Testing of Mobile Applications: A Comparison between Laboratory and Field Testing. Journal of Usability Studies. 2005;1(1):4-17.

Address for correspondence

Frédéric Ehrler
Univ. Hospitals of Geneva, Div. of Medical Information Sciences
4, rue Gabrielle-Perret-Gentil, 1211 Geneva 4, Switzerland
[email protected]