
From Direct manipulation to Gestures



HAL Id: tel-01557524

https://tel.archives-ouvertes.fr/tel-01557524

Submitted on 6 Jul 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


From Direct manipulation to Gestures

Caroline Appert

To cite this version:

Caroline Appert. From Direct manipulation to Gestures: Moving the Expressive Power from the Displays to the Fingers. Human-Computer Interaction [cs.HC]. Paris-Sud XI, 2017. ⟨tel-01557524⟩


UNIVERSITÉ PARIS-SUD

HABILITATION À DIRIGER DES RECHERCHES

presented by

Caroline Appert

Specialty: Computer Science – Human-Computer Interaction

From Direct manipulation to Gestures:

Moving the Expressive Power from the Displays to the Fingers

June 26, 2017

Michel Beaudouin-Lafon, Université Paris-Sud (Examinateur)
Stephen Brewster, University of Glasgow (Rapporteur)
Géry Casiez, Université Lille 1 (Examinateur)
Andy Cockburn, University of Canterbury (Rapporteur)
Jean-Claude Martin, Université Paris-Sud (Examinateur)
Laurence Nigay, Université Grenoble Alpes (Rapporteur)
Shumin Zhai, Google (Examinateur)

Habilitation à Diriger des Recherches prepared at the Laboratoire de Recherche en Informatique of Université Paris-Sud and Inria Saclay – Île-de-France


Acknowledgments

First of all, I would like to warmly thank all my jury members. I am very honored to have had the opportunity to present my work in front of such prominent researchers. I thank my rapporteurs (Stephen Brewster, Andy Cockburn and Laurence Nigay), who took some of their time to write thoughtful reports about my manuscript; and I thank all my jury members for having traveled to attend the defense (or stayed awake late, for Andy), and for the insightful comments and discussions that followed the presentation. Special thanks to Shumin for having traveled from so far.

Thanks to all ILDA members. To Olivier, who has been my office mate and friend for many years. To my PhD students (Hugo Romat, María Jesús Lobo and Rafael Morales), who all made me learn different aspects of what supervising means. To the Master students that I have supervised.

Thanks to many former InSitu members, and especially those who I worked with (Fanis Tsandilas, Stéphane Huot, Olivier Bau, Daniel Spelmezan, Halla Olafsdottir, Wendy Mackay, Jean-Daniel Fekete and, of course, Michel Beaudouin-Lafon).

I also think about all the HCI people who I had fun with on different occasions (Olivier Bau, Nathalie Henry-Riche, Yann Riche, Fanny Chevalier, Stéphane Conversy, Nicolas Roussel, Renaud Blanch, Romain Primet, Mathieu Nancel, Julie Wagner, and many others). It is always great to see you at any possible occasion.

Friends from the administrative staff, Stéphanie Druetta and Alexandra Merlin, thank you!

Finally, many thanks to the ones I love. Kids (Antoine, Baptiste, Maxime and Thomas), I cannot express how lucky I am to have you in my life. Emmanuel... yes, yet another “...”... I love you.


Table of Contents

1 Introduction
   1.1 Point-based Interfaces
   1.2 Gesture-based Interfaces
2 Direct Manipulation
   2.1 View Navigation
      2.1.1 Sigma Lens
      2.1.2 Focus Targeting
      2.1.3 Local Navigation
      2.1.4 Path Following
   2.2 Object Acquisition and Manipulation
      2.2.1 Acquiring objects with high precision
      2.2.2 Moving objects
   2.3 Conclusion
3 Gestures
   3.1 Gesturing for invoking commands
      3.1.1 Stroke shortcuts vs Keyboard shortcuts
      3.1.2 Discovering and learning stroke shortcuts
   3.2 Gesturing on portable devices
      3.2.1 SidePress
      3.2.2 Power-up button
      3.2.3 TilTouch
   3.3 Gesturing with multiple fingers
      3.3.1 Effect of planning on multi-touch grasps
      3.3.2 A design space for multi-touch gestures
4 Perspectives
   4.1 Tangibles for any application and for all users
      4.1.1 Designing tangibles
      4.1.2 Implementing tangibles
      4.1.3 Application domains
   4.2 Gestures and tangibles for multi-display and multi-user environments
      4.2.1 Multi-display environments
      4.2.2 Multi-user environments
Publications
Bibliography
5 Selected Publications
   High-Precision Magnification Lenses
   Reciprocal Drag-and-Drop
   Using Strokes as Command Shortcuts: Cognitive Benefits and Toolkit Support
   Prospective Motor Control on Tabletops: Planning Grasp for Multitouch Interaction


Abstract

Optimizing the bandwidth of the communication channel between users and the system is fundamental for designing efficient interactive systems. Apart from the case of speech-based interfaces that rely on users' natural language, this entails designing an efficient language that users can adopt and that the system can understand. My research has been focusing on studying and optimizing the two following types of languages: interfaces that allow users to trigger actions through the direct manipulation of on-screen objects, and interactive systems that allow users to invoke commands by performing specific movements. Direct manipulation requires encoding most information in the graphical representation, mostly relying on users' ability to recognize visual elements; whereas gesture-based interaction interprets the shape and dynamics of users' movements, mostly relying on users' ability to recall specific movements. This manuscript presents my main research projects about these two types of language, and discusses how we can increase the efficiency of interactive systems that make use of them. When using direct manipulation, achieving a high expressive power and a good level of usability depends on the interface's ability to accommodate large graphical scenes while enabling the easy selection and manipulation of objects in the scene. When using gestures, it depends on the number of different gestures in the system's vocabulary, as well as on the simplicity of those gestures, which should remain easy to learn and execute. I conclude with directions for future work around interaction with tangible objects.

1 Introduction

One of the grand challenges of research in Human-Computer Interaction (HCI) consists in optimizing the bandwidth of the communication channel between users and the system. Apart from the case of speech-based interfaces that rely on users' natural language, this entails designing an efficient language that users can adopt and that the system can understand. What "efficient" means depends on the type of users and the context of use. For example, operators whose goal is to maximize their productivity will need a vocabulary that allows them to invoke a limited set of commands very quickly. In contrast, artists may not consider execution speed as highly critical, and will rather need a large vocabulary that allows them to explore a large design space. In general, good interaction design should make the language easy to learn and manipulate, allowing users to comfortably express their intent to the system with a reasonable speed.

This manuscript focuses on the two types of languages that are respectively used in point-based and gesture-based interfaces. The term point-based interfaces refers to graphical interfaces featuring objects that users can designate and manipulate with a pointing device to invoke commands. In those interfaces, most information is encoded in the graphical representation, and very little in users' movements: when users perform a pointing action, the system only considers the graphical object on which this action ends, ignoring its trajectory or speed. In contrast, gesture-based interfaces associate movements with controls, allowing users to express a message to the system by executing a specific movement. This latter type of interface carries either a part or all of the information in users' movements, which feature varying shapes and dynamics.

Designing interfaces that are metaphors of the physical world is common in HCI. If we think about using the types of languages described above for communicating in the real world, we quickly understand what their potential strengths and limits are. Using a point-based paradigm for communicating would require people to convey concepts and designate things by reaching different objects one after the other. Giving people a rich expressive power thus means that the environment should feature a large number of objects. Also, to make people able to express themselves efficiently, the objects should be easy to reach. However, in an environment that contains many objects, those objects can potentially be very far and/or very small. A gesture-based paradigm does not rely on objects, but rather corresponds to adopting a sign language. The expressive power depends on the number of different signs and how they can express varying things and concepts. Of course, the signs should be easy to memorize and perform to offer an efficient means of expression. However, offering a large set of signs that has enough variability usually requires considering complex signs, implying that learning and manipulating them will require a lot of cognitive and motor resources.


What emerges is a tension between two types of limited resources: physical space, and human cognitive and motor abilities. Point-based interfaces heavily rely on recognition. Their expressive power is a function of the number of graphical primitives that the system can display, and their usability depends on the difficulty of pointing at each graphical primitive. Gesture-based interfaces rather resort to a recall paradigm. Their expressive power is a function of the different movements in the vocabulary of the system, and their usability depends on the difficulty of learning and performing each movement. Going from one type of interface to the other can be seen as transferring the expressive power from the display to users' hands.

This display-hands opposition between point- and gesture-based interaction is, of course, a simplistic characterization of interactions that rely on these paradigms. For example, a swiping gesture for deleting an object on a tactile screen relies on both display and hand. However, this display-hand opposition remains interesting to identify fundamental research questions about the two interaction channels independently. My research for the past ten years has been driven by such questions: How can direct manipulation scale to graphical scenes that are too large to fit on the display? How can users manipulate very small graphical objects? Are users able to learn and perform large sets of gestures? How can we offer a high power of expression without resorting to complex gestures? I believe that addressing these fundamental questions is crucial for designing efficient interactions in today's virtual environments that can feature very small to very large displays, and that can capture a large variety of user movements through multiple sensors.

1.1 Point-based Interfaces

As mentioned above, point-based interfaces may have to show a very large amount of graphical objects on the display. Zoomable interfaces (also called multi-scale interfaces) have this capability as they can present a graphical scene that can be far larger than the display viewport (such as in, e.g., Google Maps). To visualize a given area, users usually resort to navigation techniques such as traditional Pan&Zoom. However, when zoomed in on a given region, users may miss important information from the surrounding context [JF98]. Focus+Context techniques such as fisheye lenses [CM01] offer an alternative by providing in-place magnification of a region without requiring users to zoom the whole representation. However, adoption of Focus+Context techniques may be hindered by both perceptual and motor issues when transitioning between focus and context. These usability problems have driven several of the research projects I have worked on.

We have first investigated how to design transitions that are more efficient than those based solely on spatial deformation, by relying instead on dynamic behavior and translucence. We have proposed a design space for such transitions that we use for creating new lenses, called Sigma lenses [19, 21]. Our empirical study showed that some of these new lenses outperform traditional magnification lenses for focusing on a given area. While well-designed lenses offer a good solution for focusing on objects, they can still suffer from usability issues when interacting with those objects. The typical implementation of Focus+Context techniques makes two representations of the data exist simultaneously at two different scales, with the focus region's location associated with the pointer, meaning that a one-pixel displacement may make the pointer jump by several pixels in the magnified (focus) region. This quantization problem, i.e., the mismatch between visual and motor precision in the magnified region, gets worse with increasing zoom factors, thus limiting the range of applications that could offer magnifying lenses. We have studied this quantization problem and introduced new interaction techniques for selecting with high precision while preserving fast navigation performance [9]. Quantization is also partly responsible for the difficulty of following a route with a magnifying lens, the route having a tendency to "slip off" the side of the lens. We have also designed lenses to make these steering tasks easier [1]. Our RouteLenses automatically adjust their position based on the geometry of a route, making users able to comfortably navigate along paths of interest.

Most point-based interfaces allow users not only to select but also to move elements through drag-and-drop actions, according to the principles of direct manipulation [Shn87]. For example, users pan the view in multi-scale interfaces, they move and edit the geometry of elements in graphics editors, they adjust parameters using controllers such as sliders, or they move and resize windows. While direct manipulation stipulates that actions should be easily reversible, reverting changes made via a drag-and-drop usually entails performing the reciprocal drag-and-drop action. This can be costly, as users have to remember the previous position of the object and put it back precisely where it was. We have worked on identifying the inconsistencies that exist between the different situations where users perform drag-and-drop actions, and on proposing a unifying model, DND−1, that allows users to easily undo and redo drag-and-drop actions in any situation [11]. Our Dwell-and-Spring widget [10] allows users to interact with this model in order to restore any past location of an individual object or of a group of objects.
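To make the general idea concrete, here is a minimal sketch of a per-object position history that can restore any past location after a sequence of drag-and-drops. It is only an illustration of the concept, not the actual DND−1 model or the Dwell-and-Spring implementation; the class and method names (MoveHistory, record_drop, restore) are hypothetical.

```python
# Minimal sketch (not the published DND-1 model): every completed drag-and-drop
# appends the object's previous position to a history, so any past location of
# that object can be offered for restoration later.

class MoveHistory:
    def __init__(self):
        self.past = {}  # object id -> list of (x, y) positions, oldest first

    def record_drop(self, obj_id, old_pos):
        """Call when a drag-and-drop on obj_id completes; old_pos is where it was."""
        self.past.setdefault(obj_id, []).append(old_pos)

    def restore(self, obj_id, index=-1):
        """Return a past position of obj_id (default: the most recent one)."""
        positions = self.past.get(obj_id, [])
        return positions[index] if positions else None

# Usage: after dragging object "w1" away from (120, 80)...
history = MoveHistory()
history.record_drop("w1", (120, 80))
print(history.restore("w1"))  # -> (120, 80): the location a Dwell-and-Spring-like widget could offer
```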

The overall goal of these research projects is to push direct manipulation to its maximum expressive power by making users able to reach any graphical object and manipulate it. The related publications result from collaborative work with Emmanuel Pietriga, Olivier Chapuis, Olivier Bau (who was a PhD student at that time), and two master students, Jessalyn Alvina and María Jesús Lobo (now PhD students). Chapter 2 details these different projects.


1.2 Gesture-based Interfaces

Gesture-based interaction consists in associating a given human gesture with a command in the application. The HCI literature proposes different types of gestures that involve users' hands at different granularities, ranging from micro-movements of finger tips [RLG09] to whole-arm movements [NWP+11]. Some systems also rely on mid-air gestures (e.g., [BBL93]) while others propose gestures that take their meaning relative to a device (e.g., [BIH08]) or to a surface (e.g., [WMW09]). Other dimensions can be identified to structure the potentially infinite design space of gestures. Proposing taxonomies for the use of gestures for interaction has actually attracted HCI researchers' attention (e.g., [Ks05, WMW09]). I worked on such a taxonomy for the specific family of stroke gestures in collaboration with Shumin Zhai et al. [26]. In our integrative review, we discuss the use of stroke gestures along cognitive aspects such as discovery and memorization, and also point at the difficulties of developing robust gesture recognizers. My projects on gesture-based interaction address both system and user aspects: engineering solutions for integrating gestures that are robustly recognized by the system on the one hand, and defining novel vocabularies of gestures that remain simple for users to memorize and perform on the other hand.

I started to work on recognition engines when I was a post-doc in 2007-2008 at IBM Almaden. I designed and developed a toolkit to implement stroke shortcuts in software applications with only a few lines of code, with the motivation that stroke shortcuts should not be more costly to implement than keyboard shortcuts are [13]. This first project was more focused on integrating gestures within traditional graphical interfaces. In terms of recognition engine, it relied on a template-based algorithm that runs once the gesture is complete. Since then, I have worked on incremental gesture recognizers (i.e., recognizing gestures during their execution as opposed to after their execution). I strongly believe that we should move towards this type of recognizer for two main reasons. First, it enables the development of interaction techniques that can guide users during gesture execution, supporting users in the discovery and learning of gestures. Second, with well-designed gesture vocabularies, incremental recognition enables transitions between different gestures for smoothly chaining command invocations and parameter adjustments. I have worked on designing incremental recognition engines for both single-point and multi-touch input. In [3], we present an algorithm for estimating the scale of any partial single-point input in the context of a gesture recognition system. We show how it can be used as a support for implementing OctoPocus [BM08], a visual guide that displays all available gestures in response to partial input. More recently, I have developed an incremental recognizer for a large vocabulary of multi-touch gestures that relies only on the last events in the finger input stream, meaning that users can switch between different gestures without resorting to explicit delimiters [17]. These gestures can thus be used to activate discrete commands as well as to adjust values of continuous parameters.
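To illustrate what incremental, template-based matching over partial input can look like in its simplest form, here is a generic sketch. It is not the recognizer published in [3] or [17] (in particular it ignores scale and position normalization, which is precisely what [3] addresses); the names prefix_distance and best_match, the sampling assumptions, and the templates are all hypothetical.

```python
# Generic sketch of incremental template matching: after every input event, the
# partial stroke is compared against the prefix of each pre-recorded template
# (both assumed to be sampled at the same rate), and the closest one is reported.
import math

def prefix_distance(partial, template):
    """Mean point-to-point distance between a partial stroke and the
    same-length prefix of a template."""
    n = min(len(partial), len(template))
    return sum(math.dist(partial[i], template[i]) for i in range(n)) / n

def best_match(partial, templates):
    """Name of the template whose prefix is closest to the input so far."""
    return min(templates, key=lambda name: prefix_distance(partial, templates[name]))

templates = {
    "right": [(i * 10.0, 0.0) for i in range(20)],   # straight stroke to the right
    "down":  [(0.0, i * 10.0) for i in range(20)],   # straight stroke downwards
}
partial_input = [(0.0, 0.0), (9.0, 1.0), (21.0, 2.0)]  # three input events so far
print(best_match(partial_input, templates))            # -> "right"
```

Calling best_match after each new event is what makes it possible to display guidance (as OctoPocus does) or to let users switch to another gesture mid-execution.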


In 2011-2014, I coordinated an ANR JCJC project (MDGEST), whose core idea consisted of designing large vocabularies of gestures for small tactile surfaces (smartphones and tablets) that remain easy to memorize and perform. This had to be achieved by using gestures that remain simple in their shape, relying instead on other characteristics. Within the context of MDGEST, we have designed several novel vocabularies of gestures that rely, e.g., on additional input channels: tilt [25], pressure [24] and proximity [23]. These vocabularies of gestures remain simple to perform while offering at least as much expressivity as the whole set of existing graphical widgets, and without consuming any screen space. For all these projects, we have designed a set of gesture primitives and we have implemented the associated recognizer, either on a regular smartphone using built-in sensors (e.g., accelerometers and gyroscopes) or on a smartphone that we equipped with extra sensors (pressure or proximity) while taking care to preserve the device's initial form factor. During the last year of the project, we also studied how to augment the expressivity of multi-touch gestures, which we believe are not used to their full potential. In particular, we have shown that the system can analyze how fingers are positioned relative to each other to infer some user intentions at touch time, i.e., before users actually perform the manipulation [18]. For multi-touch input on regular tablets, we have also proposed a vocabulary of gestures that vary along high-level dimensions (such as fingers' movements relative to one another, or the whole gesture's frame of reference) in order to offer a rich power of expression to users, while only relying on simple circular and linear shapes [17].

My research on gestures, presented in Chapter 3, aims at limiting the complexity of the vocabulary in order to offer a high power of expression without increasing the cost of learning and using the language. Users can actually perform an infinite number of different gestures but, to be useful in an interactive system, the gestures must remain easy for users to recall, and recognizable by the system. My publications on gesture-based interaction result from collaborative work with two post-docs (Daniel Spelmezan and Halla Olafsdottir), three permanent researchers from my research team (Emmanuel Pietriga, Olivier Chapuis and Theophanis Tsandilas), as well as four colleagues from all over the world (Shumin Zhai (Google, USA), Per Ola Kristensson (University of St Andrews, Scotland), Tue Haste Andersen (University of Copenhagen, Denmark), and Xiang Cao (Microsoft Research Asia, China)).

2 Direct Manipulation

Pivotal to direct manipulation is the ability to select and move objects in a graphical scene. This usually implies view navigation to adjust the display viewport, target acquisition to grab objects, and potential movements of these objects in the scene. This chapter presents my projects along these three fundamental interaction components: navigation (which I already started to investigate during my PhD [12, 20]), acquisition and movement.

As mentioned in the introduction, multi-scale interfaces can accommodate a large graphical scene featuring numerous objects in a limited display space. In such interfaces, users can pan in the 2D plane as well as move in altitude so as to either get an overview or visualize details [CKB09]. This interaction scheme has become very widespread in today's interfaces, and is especially useful to accommodate rich applications on portable devices. However, without appropriate navigation techniques, even simple tasks such as inspecting a local region or following a path can quickly become cumbersome.

Once users have navigated in the graphical scene to bring objects of interest into the viewport, they must be able to designate them with their pointing device to actually select them. Fitts' law [Fit54] accurately models this task and its associated difficulty in electronic worlds when the representation and the locomotion are similar to what we do in the physical world. However, by making two scales coexist at the same altitude, focus+context representations do not resemble the real world. The simultaneous existence of two scales introduces the quantization problem, i.e., the mismatch between visual and motor precision in the magnified region, and forces us to reconsider what we know regarding target acquisition tasks.

Finally, according to the principles of direct manipulation, a lot of point-based interfaces also require users to move objects through drag-and-drop actions. Being able to change objects' locations increases the expressive power of those interfaces, but it also introduces some complexity related to undoing such moves, e.g., when repairing errors or when exploring different solutions. While a target acquisition (or selection) can usually be easily undone by designating the background, putting an object back exactly where it was is much more difficult.

This chapter presents our work on the different focus+context interaction techniques that we designed to offer an efficient means of navigating multi-scale interfaces and acquiring graphical objects. It then shows how our DND−1 model tackles the problem of reverting movements of objects performed using direct manipulation.


Figure 2.1 : Various transitions between focus and context: (a) step transition causing occlusion (MAGNIFYING GLASS), (b) distorting space (FISHEYE), (c) using gradually increasing translucence (BLENDING).

2.1 View Navigation

Typical pan&zoom techniques are based on a navigation scheme that imposes a sequence of zoom operations (typically performed using the mouse wheel or pinch gestures) and pan operations (usually performed using mouse drags or finger slides) [GBLB+04]. Using pan&zoom, reaching an object that is not visible in the current viewport requires changing the whole display's content, which may be cognitively demanding [CKB09]. Focus+Context techniques offer an alternative by providing in-place magnification of a region without requiring users to zoom into the representation. These techniques have been shown to be useful for navigating complex visual representations such as large trees [LRP95, MGT+03], graphs [GKN05], high-resolution bitmap representations [CLP04], and even graphical user interfaces featuring small controls [RCBBL07]. A focus+context representation allows users to concurrently preserve the context that the display offers and navigate at a zoom factor that is higher than that of the display. Contextual information can guide navigation when, e.g., looking for particular localities in a map of a densely populated region, or when exploring the points of interest along an itinerary.

However, magnifying in place also introduces a transition area that can hinder the performance of focus+context techniques. For instance, simple magnifying glasses (Figure 2.1-a) create occlusion of the immediate context adjacent to the magnified region [RM93]; graphical fisheyes [SB94], also known as distortion lenses (Figure 2.1-b), make it challenging for users both to acquire targets [Gut02] and to follow trajectories. This section presents our extensions to Carpendale's framework for unifying presentation space [CM01], providing interface designers with novel types of magnifying lenses that facilitate focus targeting [19, 21] and path following [1] tasks.


Figure 2.2 : Gaussian distortion lens. The level of detail in the flat-top is increased by a factor of MM = 4.0

2.1.1 Sigma Lens

In [19], we introduce the Sigma Lens framework that defines transitions between focus and context as a combination of dynamic scaling and compositing functions. This framework opens a design space to create a variety of lenses that use transformations other than spatial distortion to achieve smooth transitions between focus and context, and whose properties adapt to the users' actions. We identify lenses in this space that facilitate the task that consists in acquiring an object (focus targeting) and potentially exploring its surroundings (local navigation).

The framework

All constrained magnification lenses featuring a regular shape share the following general properties, no matter how they transition between focus and context (see Figure 2.2):

· RI: the radius of the focus region (a.k.a. the flat-top), which we call the inner radius;

· RO: the radius of the lens at its base, i.e., its extent, which we call the outer radius;

· MM: the magnification factor in the flat-top.

Applying a constrained lens to a representation effectively splits the viewing window into two regions: the context region, which corresponds to the part of the representation that is not affected by the lens, and the lens region, in which the representation is transformed. Since we want the lens to actually provide a more detailed representation of objects in the magnified region, and not merely duplicate pixels from the previous rendering, our framework relies on two buffers of pixels: the context buffer, whose dimensions w × h match those of the final viewing window displayed to the user, and the lens buffer, of dimensions 2·MM·RO × 2·MM·RO.


In our approach, the overall process consists in applying a displacement function to all pixels in the lens buffer that fall into the transition zone: pixels between RI and MM·RO get scaled according to the drop-off function in such a way that they eventually all fit between RI and RO. Pixels of the lens buffer can then be composited with those of the context buffer that fall into the lens region.

Scaling. The standard transformation performed by graphical fisheyes consists in displacing all points in the focus buffer to achieve a smooth transition between focus and context through spatial distortion. This type of transformation can be defined through a drop-off function which models the magnification profile of the lens. The drop-off function is defined as:

$$G_{scale} : d \mapsto s$$

where d is the distance from the center of the lens and s is a scaling factor. Distance d is obtained from an arbitrary distance function D. A Gaussian-like profile is often used to define the drop-off function G_scale, as it provides one of the smoothest visual transitions between focus and context (see Figure 2.2). It can be replaced by other functions (see [CM01, CLP04]).

Compositing. The rendering of a point (x, y) in the final viewing window is controlled by the function R below, where

$$p_{lens} \otimes_{\alpha} p_{context}$$

denotes the pixel resulting from alpha blending a pixel from the lens buffer and another from the context buffer with an alpha value of α. As with scale for distortion lenses, the alpha blending gradient can be defined by a drop-off function that maps a translucence level to a point (x, y) located at a distance d from the lens center:

$$G_{comp} : d \mapsto \alpha$$

where α is an alpha blending value in [0, αFT], αFT being the translucence level used in the flat-top of the lens.

$$
R(x, y) =
\begin{cases}
\left(x_c + \dfrac{x - x_c}{M_M},\; y_c + \dfrac{y - y_c}{M_M}\right) \otimes_{\alpha_{FT}} (x, y) & \forall (x, y) \mid D(x, y) \leq R_I \\[2ex]
\left(x_c + \dfrac{x - x_c}{G_{scale}(D(x, y))},\; y_c + \dfrac{y - y_c}{G_{scale}(D(x, y))}\right) \otimes_{G_{comp}(D(x, y))} (x, y) & \forall (x, y) \mid R_I < D(x, y) < R_O
\end{cases}
$$
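The following is a minimal per-pixel sketch of how the function R above can be evaluated. It is only an interpretation of the formula, not the published implementation: the parameter values are arbitrary, the drop-off functions here are linear (the lenses in Figure 2.2 use a Gaussian-like profile), and detail_px / context_px are hypothetical samplers of the detailed representation (in context coordinates) and of the context buffer.

```python
import math

# Hypothetical lens parameters, for illustration only.
R_I, R_O, MM, ALPHA_FT = 40.0, 100.0, 4.0, 1.0

def g_scale(d):
    """Linear drop-off: scale factor from MM at R_I down to 1 at R_O."""
    return MM - (MM - 1.0) * (d - R_I) / (R_O - R_I)

def g_comp(d):
    """Linear drop-off for translucence: alpha from ALPHA_FT at R_I down to 0 at R_O."""
    return ALPHA_FT * (1.0 - (d - R_I) / (R_O - R_I))

def blend(p_lens, p_context, alpha):
    """Alpha-blend two RGB triples (the ⊗ operator above)."""
    return tuple(alpha * a + (1.0 - alpha) * b for a, b in zip(p_lens, p_context))

def render_pixel(x, y, xc, yc, detail_px, context_px):
    """Color of viewport pixel (x, y) for a lens centered on (xc, yc)."""
    d = math.hypot(x - xc, y - yc)                       # distance function D
    if d <= R_I:                                         # flat-top: full magnification
        src = detail_px(xc + (x - xc) / MM, yc + (y - yc) / MM)
        return blend(src, context_px(x, y), ALPHA_FT)
    if d < R_O:                                          # transition: distort and composite
        s = g_scale(d)
        src = detail_px(xc + (x - xc) / s, yc + (y - yc) / s)
        return blend(src, context_px(x, y), g_comp(d))
    return context_px(x, y)                              # outside the lens: context untouched
```

Sampling the detailed representation at coordinates pulled toward the lens center by 1/MM (or 1/Gscale(d)) and displaying the result at (x, y) is what produces the magnification and the smooth transition.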

Figure 2.3 : SPEED-COUPLED BLENDING lens moving from left to right, with S(t) implemented as a low-pass filter.

Figure 2.4 : HOVERING Lens.

Speed-coupling. In addition to the transition functions Gscale and Gcomp, the Sigma Lens framework allows lens properties such as magnification factor, radius or flat-top opacity to vary over time. The first example of a lens to make use of dynamic properties was Gutwin's SPEED-COUPLED FLATTENING lens [Gut02], which uses the lens' dynamics (velocity and acceleration) to automatically control magnification. By canceling distortion during focus targeting, SPEED-COUPLED FLATTENING lenses improve the usability of distortion lenses. Basically, MM decreases toward 1.0 as the speed of the lens (operated by the user) increases, therefore flattening the lens into the context, and increases back to its original value as the lens comes to a full stop. Such behavior can easily be implemented by defining a time-based function S(t) that returns a numerical value depending on the velocity and acceleration of the lens over time. The function is set to return a real value in [0.0, 1.0]. Making a lens parameter speed-dependent is then easily achieved by simply multiplying that parameter by the value of S(t).
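As a hedged illustration of this idea, the sketch below derives S(t) from the lens' speed with a simple exponential low-pass filter (as in the caption of Figure 2.3) and couples two lens parameters to it. The class name SpeedCoupling and the constants v_max and smoothing are illustrative, not those of the published lenses.

```python
class SpeedCoupling:
    """Turns the lens' instantaneous speed into S(t) in [0.0, 1.0], smoothed with
    an exponential low-pass filter (0 = lens at rest, 1 = lens moving fast)."""
    def __init__(self, v_max=500.0, smoothing=0.1):
        self.v_max = v_max          # speed (px/s) considered "fast" (illustrative)
        self.smoothing = smoothing  # low-pass filter coefficient
        self.s = 0.0

    def update(self, speed_px_per_s):
        raw = min(1.0, speed_px_per_s / self.v_max)
        self.s += self.smoothing * (raw - self.s)   # low-pass filter step
        return self.s

def coupled_magnification(mm_max, s):
    """SPEED-COUPLED FLATTENING-like behavior: magnification drops toward 1 as speed rises."""
    return mm_max - (mm_max - 1.0) * s

def coupled_flat_top_alpha(alpha_ft, s):
    """SPEED-COUPLED BLENDING-like behavior: the flat-top fades out as speed rises."""
    return alpha_ft * (1.0 - s)
```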

Instantiating Specific Lenses

In our approach to the implementation of the Sigma Lens framework, various constrained lenses are obtained easily, only by defining the functions Gscale, Gcomp, and S(t). We have implemented some examples of transitions with static lenses that rely on scaling (Figure 2.1-b) or on compositing (Figure 2.1-c), and with dynamic lenses that implement a speed-dependent behavior in terms of scaling (such as Gutwin's SPEED-COUPLED FLATTENING lens) or compositing (our SPEED-COUPLED BLENDING lens illustrated in Figure 2.3). We have also implemented more complex transitions as with, e.g., the HOVERING lens (Figure 2.4) that relies on both scaling and compositing, with a dynamic behavior for both its transparency level and flat-top size. We conducted a series of experiments to assess the pros and cons of the different dimensions that the Sigma Lens framework features, as detailed next.

2.1.2 Focus Targeting

We first compared the focus targeting performance and limits of the five following lenses: a plain MAGNIFYING GLASS, a simple distortion lens (FISHEYE), and BLENDING, SPEED-COUPLED FLATTENING, SPEED-COUPLED BLENDING. We considered five different magnification factors (MM). Higher magnification factors make the task increasingly difficult: (i) the transition area becomes harder to understand as it must integrate a larger part of the world in the same rendering area, and (ii) it becomes harder to precisely position the target in the flat-top of the lens, the latter being controlled in the motor space of the context window. To test the limits of each lens, we included factors up to 14x. Our experiment was a 5 × 5 within-participant design: each participant had to perform several trials using each of the five lenses with five different magnification factors (MM ∈ {2, 4, 6, 10, 14}). A trial in our experiment consisted of a series of focus targeting tasks in every direction, as recommended by the ISO9241-9 standard [ISO00]. All details about our experimental design and statistical analyses are reported in [19]. Figure 2.5 summarizes our main results.

Interestingly, FISHEYE and BLENDING do not significantly differ in their performance. We initially thought that translucence could improve user performance by eliminating the drawbacks of space-based transitions. Transitioning through space indeed introduces distortion that makes objects move away from the approaching lens focus before moving toward it very fast, making focus targeting difficult [Gut02]. But BLENDING does not overcome this problem, as it introduces a new one: the high cognitive effort required to comprehend transitions based on gradually increasing translucence which, as opposed to distortion-based transitions, do not rely on a familiar physical metaphor.

We expected speed-based lenses (SPEED-COUPLED FLATTENING and SPEED-COUPLED BLENDING) to outperform their static versions (FISHEYE and MAGNIFYING GLASS). Each focus targeting task can be divided into two phases: in the first phase, the user moves the lens quickly to reach the target's vicinity, while in the second phase, she moves it slowly to precisely position the target in the focus. In the first phase, the user is not interested in, and can actually be distracted by, information provided in the focus region since she is trying to reach a distant object in the context as quickly as possible. By smoothly and automatically neutralizing the focus and transition regions during this phase, and then restoring them, speed-based lenses should help the user. Our results did actually support that this is the case for SPEED-COUPLED BLENDING and MAGNIFYING GLASS: smoothly neutralizing and restoring the focus of a MAGNIFYING GLASS by making it translucent does improve performance. However, our participants were not significantly faster with SPEED-COUPLED FLATTENING than with FISHEYE. This was especially surprising since the study conducted in [Gut02] showed a significant improvement in users' performance with SPEED-COUPLED FLATTENING. We think this inconsistency is probably due to implementation differences: we implemented SPEED-COUPLED FLATTENING as a constrained lens while it was implemented as a full-screen lens by Gutwin. In full-screen lenses, distortion affects the whole representation, which thus benefits more from the neutralization effect than constrained lenses that only affect a limited area.

Figure 2.5 : Mean completion time (in ms) per Technique × MM condition (SCB: SPEED-COUPLED BLENDING, SCF: SPEED-COUPLED FLATTENING, FL: FISHEYE, BL: BLENDING, MG: MAGNIFYING GLASS).

2.1.3 Local Navigation

We then further investigated the performance of the two dynamic lenses, SPEED-COUPLED BLENDING and SPEED-COUPLED FLATTENING, by considering a more realistic task where (1) the graphical scene is more complex, and thus potentially causes legibility issues when using distortion or transparency, and (2) the object to explore does not fully fit into the lens' flat-top, forcing local navigation, which may be difficult for users to perform with lenses that dynamically change. We conducted two experiments illustrated in Figure 2.6, based on two different types of representation: a network (vector graphics) for Experiment Expgraph (Bg = graph), and a high-resolution satellite map (bitmap) for Experiment Expmap (Bg = map). In both cases, participants are instructed to memorize a word as they will have to search for it in the representation. Once this target word is memorized, participants put the cursor on a red square (20 × 20 pixels) located at the center of the screen and press the space bar to start the trial. Words (including distractor words) appear successively in the same locations as the circular targets did in our focus targeting experiment discussed earlier. However, here, a word can never be fully displayed in the flat-top, forcing participants to perform local navigation. When they recognize the target word, participants press the space bar. We count an error if they press the space bar while the lens is over a distractor word. Here again, to compare lenses both in usual and extreme conditions, we use two magnification factors (MM ∈ {8, 12}). Font size is set to 42 pts (at context scale) for MM = 8 and 28 pts for MM = 12, so that the lens' flat-top can display at most 6 letters at full magnification. We use two word lengths to test the effect of the amount of local navigation on lens performance (LabLength ∈ {8, 12}). Finally, we consider two levels of Opacity as we were hypothesizing that background and focus might be perceptually interpreted as one illegible image if contrast is not strong enough when making use of translucence. Opacity was not included as a factor in Experiment Expgraph because sharp edges displayed on a uniform background are strongly contrasted.

Figure 2.6 : Local navigation tasks in (a) a graph (labels displayed in black) and (b) in a map (labels displayed in yellow over a black background).

Our results revealed that, in terms of completion time, participants were faster using SPEED-COUPLED BLENDING than SPEED-COUPLED FLATTENING. However, this difference was not statistically significant. Differences in accuracy were stronger, with participants being more accurate using SPEED-COUPLED BLENDING than SPEED-COUPLED FLATTENING. Furthermore, differences between lenses in terms of accuracy increased with the magnification factor. In addition, lenses seem to be unequally affected by word length: the comparative gain of SPEED-COUPLED BLENDING over SPEED-COUPLED FLATTENING regarding accuracy is greater for longer words, tending to show that SPEED-COUPLED BLENDING better supports local navigation than SPEED-COUPLED FLATTENING does. This latter effect, observed only in Experiment Expmap, reinforces our intuition that lens usability is affected by the type of representation. We also observed that SPEED-COUPLED FLATTENING is more penalized by background type than SPEED-COUPLED BLENDING. While we were expecting usability problems due to the use of transparency, especially with complex representations such as maps, SPEED-COUPLED FLATTENING was actually more affected by the background type than SPEED-COUPLED BLENDING was. We were even more surprised to observe that participants were more strongly affected by label opacity with SPEED-COUPLED FLATTENING than with SPEED-COUPLED BLENDING. In summary, speed-coupled translucence does not have a negative impact on local navigation in our experiment. The SPEED-COUPLED BLENDING lens thus appears as a very efficient technique for navigating even complex scenes that feature a low level of contrast between elements.

Figure 2.7 : Following an itinerary. (a) Conventional lens: the user overshoots at a right turn in Harrisburg, losing the route, which falls in the distorted region. (b) RouteLens: the route's attraction compensates for the overshoot; the lens remains closer to the route, which remains in focus.

2.1.4 Path Following

Speed-based behaviors rely on the hypothesis that users do not seek information at a detailed level when moving a magnifying lens. While this is typically the case in focus targeting tasks, when users want to reach a distant graphical object as fast as possible, this hypothesis does not hold for path following tasks such as inspecting an itinerary. Focus+context techniques are conceptually well-suited to inspecting itineraries by allowing users to see the entire route at once, and perform magnified steering [GS03] to navigate along the path and explore locally-bounded regions of interest. Navigation based on magnified steering has been shown to outperform regular pan&zoom for large steering tasks [GS03]. Yet, this task remains a challenging one for users, in part because paths have a tendency to "slip off" the side of the lens.

In order to make it easier for users to follow a route, we have designed RouteLens, a new content-aware technique that automatically adjusts the lens' position based on the geometry of the path that users steer through, so as to keep the lens on track in case of overshoot (Figure 2.7). RouteLens makes it easier for users to follow a route, yet does not constrain movements too strictly. The lens is more or less strongly attracted to the path depending on its distance to it, and users remain free to move the lens away from it to explore more distant areas. RouteLenses decouple the lens' position from the cursor's position to give users the impression that the lens is attracted by the route. This separation between the motor and the visual space is similar to what Semantic Pointing [BGBL04] does when enlarging targets of interest only in motor space while leaving their visual counterparts unchanged.

When using RouteLens, all route segments whose distance to the system cursor is less than ∆ apply an attraction force to the lens. The lens' position L is computed as a function of the system cursor's position C by using a weighted mean over all attracting route segments:

$$L = C + d_{min} \cdot \frac{\sum_{i=1}^{n} w_i \cdot A_i}{\sum_{i=1}^{n} w_i}$$

where Ai is the force vector that route segment i applies at position C to attract the lens (see below) and dmin is the distance between the cursor and the closest route segment.

To ensure continuous lens movements when a route segment starts or stops having an influence on the lens, wi is set to ∆ − dc,i, where dc,i is the distance between the cursor and route segment i.

For a given route segment, the attraction vector is computed as:

$$A = \alpha(d_c) \cdot \frac{R_c - C}{d_c}$$

where Rc is the point on the route closest to the cursor, and dc the distance between the cursor and the route segment. α is a power function of dc that parameterizes the force vector a route segment applies to the lens:

$$\alpha(d_c) = \begin{cases} \left(1 - \dfrac{d_c}{\Delta}\right)^p & \text{if } d_c \leq \Delta \\[1ex] 0 & \text{otherwise} \end{cases}$$
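The sketch below transcribes these equations directly, assuming routes are given as lists of segments (pairs of endpoints); the function names, the point-to-segment projection helper, and the default values of delta and p are illustrative rather than taken from the published implementation.

```python
import math

def closest_point_on_segment(p, a, b):
    """Project point p onto segment [a, b]."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return a
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return (ax + t * dx, ay + t * dy)

def route_lens_position(cursor, segments, delta=100.0, p=2):
    """Lens position L = C + d_min * weighted mean of the attraction vectors A_i."""
    attractions = []                 # (weight w_i, force vector A_i) pairs
    d_min = float("inf")
    for a, b in segments:
        r_c = closest_point_on_segment(cursor, a, b)     # R_c for this segment
        d_c = math.dist(cursor, r_c)
        d_min = min(d_min, d_c)
        if 0 < d_c < delta:
            alpha = (1.0 - d_c / delta) ** p             # α(d_c)
            A = ((r_c[0] - cursor[0]) / d_c * alpha,     # A = α(d_c) · (R_c − C)/d_c
                 (r_c[1] - cursor[1]) / d_c * alpha)
            attractions.append((delta - d_c, A))         # w_i = ∆ − d_c,i
    if not attractions:
        return cursor                                    # no segment close enough
    w_sum = sum(w for w, _ in attractions)
    ax = sum(w * A[0] for w, A in attractions) / w_sum
    ay = sum(w * A[1] for w, A in attractions) / w_sum
    return (cursor[0] + d_min * ax, cursor[1] + d_min * ay)
```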

When steering along a magnified route, users want to minimize the distance dl between the lens' center and the route. In Accot & Zhai's steering law [AZ97], dl represents the movement's variability along the tunnel centered on the route, i.e., the tunnel's width. The law stipulates that the larger the variability, the easier the movement. Figure 2.8 shows how RouteLens makes steering easier than a regular fisheye lens does, by allowing for a wider variability in user-controlled cursor movements. To keep a regular lens at a distance dl from the route, users have to keep the cursor at a distance dc = dl. With a RouteLens, this distance can be larger: dc = dl + dc · α(dc).


Figure 2.8 : Position of the cursor (grey line), and of the RouteLens (black line), that is vertically attracted (p = 2) by the route (bold blue line). The dashed black line shows the positions of the RouteLens when p = 6. In this figure, ∆ is equal to the lens’ flat-top diameter in motor space, and the black (resp. grey) circles show the part of the context displayed in the lens’ flat-top.

We ran a study comparing conventional fisheye lenses (RegularLens) with fisheye lenses augmented with the attraction mechanism described above (RouteLens). While RouteLenses only affect the motor behavior of lenses, and can thus easily be combined with any type of graphical magnification lens [19][PPCP12], we considered conventional fisheye lenses as a baseline both to isolate the benefits of the attraction mechanism and to avoid a too lengthy experiment. Our experimental task consisted in following a route with a lens, always keeping the route visible in the flat-top. The experiment was a 2×2×2×4 within-subjects design with factors TECH, ANGLE, DISTRACTOR, and DIR. TECH was the primary factor, with two values: RegularLens and RouteLens. ANGLE and DISTRACTOR were secondary factors that defined characteristics of the route. ANGLE (Acute = π/4 and Obtuse = 3π/4) defined the angle between two route segments. DISTRACTOR defined the presence or absence of distractor routes, that also attract the cursor. When DISTRACTOR = With, additional (grey) routes were added at each turn of the (black) target route. DIR defined the direction of steering: left-to-right, right-to-left, top-to-bottom, or bottom-to-top. This factor was introduced for ecological reasons.

As expected, RouteLens' attraction effect made participants steer along the route with a movement that exhibits less variability. The average distance from the lens' center to the route was significantly lower for RouteLens than for RegularLens: 13.3 ±0.9 pixels for RouteLens versus 33.3 ±1.7 pixels for RegularLens (expressed with respect to the flat-top's coordinate system). Also, RouteLens was significantly faster than RegularLens, a difference of ∼15%. Interestingly, the presence of distractors did not negatively affect RouteLens' performance. We even observed a comparative improvement of completion time for RouteLens over RegularLens. This may be due to the specific route layout we considered: a distractor route in the middle of the turn applies additional force vectors, resulting in a stronger global attraction towards the route at the end of the turn. Finally, qualitative feedback revealed that participants hardly noticed a difference between the two lenses, and that overall they expressed a preference for RouteLens.


2.2 Object Acquisition and Manipulation

Point-based interfaces rely on selections and movements of graphical objects. The Focus+Context techniques discussed above make it possible to implement graphical interfaces that feature a very large number of graphical objects and that users can still efficiently navigate. However, while navigation can be an end in itself in, e.g., information visualization tasks, it is often only a means to make some objects visible before actually interacting with them through selection and drag-and-drop operations. This section presents our work on facilitating selection, and on providing a more flexible model for drag-and-drop interactions.

2.2.1 Acquiring objects with high precision

The quantization problem

Early implementations of magnification techniques only magnified the pixels of the context by duplicating them without adding more detail, thus severely limiting the range of useful magnification factors (up to 4x). Newer implementations, in Carpendale's original framework [CLP04] or in the Sigma Lens extension introduced earlier, do provide more detail as magnification increases. Theoretically, this means that any magnification factor can be applied, if relevant data is available. In practice, this is not the case, as another problem arises that gets worse as magnification increases: quantization.

Lenses are most often coupled with the cursor and centered on it. The cursor, and thus the lens, are operated at context scale. This allows for fast repositioning of the lens in the information space, since moving the input device by one unit makes the lens move by one pixel at context scale. However, this also implies that when moving the input device by one unit (dot), the representation in the magnified region is offset by MM pixels, where MM is the focus' magnification factor. This means that only one pixel in every MM pixels can fall below the cursor in the magnified region. In other words, some pixels are unreachable: visual space has been enlarged in the focus region but motor space has not. Objects can thus be difficult or even impossible to select, even if their visual size is above what is usually considered a small target (less than 5 pixels). The square representing Arlington station in Figure 2.9-(Left) is 9 pixels wide, yet its motor size is only 1 pixel. Figure 2.9-(Right) illustrates this quantization problem with a space-scale diagram [FB95]: the center of the lens can only be located on a pixel in the focus window that is aligned – on the same ray in the space-scale diagram – with a pixel in the context window. The space-scale diagram shows that the problem gets worse as magnification increases.
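A tiny, purely illustrative check of this arithmetic (MM and the number of cursor steps below are arbitrary):

```python
# Each input-device unit offsets the magnified content by MM focus pixels, so
# only one focus pixel out of every MM can ever fall under the cursor.
MM = 12
reachable_offsets = [step * MM for step in range(5)]
print(reachable_offsets)   # -> [0, 12, 24, 36, 48]: the 11 pixels in between are unreachable
```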

High-precision Lenses

In [9], we introduce several strategies that decouple the cursor from the lens' center in order to resolve the mismatch between visual and motor space precision in the focus region. Our techniques' design aims at making it possible to perform both fast navigation for focus targeting and high-precision selection in the focus region in a seamless manner.

Figure 2.9 : Focus+Context techniques and the quantization problem. (Left) Moving the lens by one unit of the input device South and East makes the cursor jump several pixels in the detailed representation (magnification factor MM = 12); map of the Boston area (source: OpenStreetMap.org). (Right) Space-scale diagram of possible locations for the lens center (each ray corresponds to one pixel in context space).

· Key is a simple mode-switching technique. It uses two control modes: a context speed mode and a focus speed mode. It requires an additional input channel to perform the mode switch, for instance using a modifier key such as SHIFT. Users can then navigate large distances at context speed, where one input device unit is mapped to one context pixel, i.e., MM focus pixels, and perform precise adjustments at focus speed, where one input device unit corresponds to one focus pixel.

· Speed is inspired by techniques featuring speed-dependent properties (e.g., [CLP09, Gut02, IH00]). We map the precision of the lens control to the input device's speed with a continuous function, relying on the assumption that a high speed is used to navigate large distances while a low speed is more characteristic of a precise adjustment (as observed for classical pointing [Bal04]).

· Ring is inspired by Tracking menus [FKP+03]. With this technique, the cursor can freely move within the flat-top at focus scale, thus enabling pixel-precise pointing in the magnified region. When the cursor comes into contact with the flat-top's border, it pulls the lens at context speed, enabling fast repositioning of the lens in the information space (a minimal sketch of this coupling is given after this list).
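The sketch below illustrates the cursor/lens coupling behind Ring only; it is hypothetical, not the published implementation, and it leaves out the mapping of device units to context vs. focus scale.

```python
import math

def ring_update(lens_center, cursor, dx, dy, flat_top_radius):
    """Apply one motion event (dx, dy): the cursor moves freely (pixel-precise)
    inside the flat-top; when it reaches the border, it drags the lens along."""
    cx, cy = cursor[0] + dx, cursor[1] + dy              # tentative cursor position
    off_x, off_y = cx - lens_center[0], cy - lens_center[1]
    d = math.hypot(off_x, off_y)
    if d <= flat_top_radius:
        return lens_center, (cx, cy)                     # pointing inside the flat-top
    # Cursor hit the border: translate the lens by the overshoot, keeping the
    # cursor on the flat-top's border in the direction of movement.
    overshoot = d - flat_top_radius
    ux, uy = off_x / d, off_y / d
    new_center = (lens_center[0] + ux * overshoot, lens_center[1] + uy * overshoot)
    new_cursor = (new_center[0] + ux * flat_top_radius, new_center[1] + uy * flat_top_radius)
    return new_center, new_cursor
```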

A pointing task with a lens is typically divided into two main phases: (i) focus targeting, which consists in putting a given target inside the flat-top of the lens (Figure 2.10-(a) and (b)), and (ii) cursor pointing, to precisely position the cursor over the target (Figure 2.10-(b) and (c)).


Figure 2.10 : Acquiring a target with a lens: focus targeting from (a) to (b), and cursor pointing from (b) to (c).

The focus targeting task has an index of difficulty of about:

$$ID_{FT} = \log_2\left(1 + \frac{D_c}{W_{FT_c} - W_c}\right)$$

where WFTc and Wc are the respective sizes of the flat-top and the target in context pixels, and Dc is the distance to the target in context pixels as well¹. This formula clearly shows that difficulty increases as distance increases, as the size of the flat-top decreases, and as the size of the target decreases. The size of the flat-top in context pixels is directly related to the magnification factor of the lens, MM. Indeed, the size of the flat-top is fixed in terms of focus pixels, so the higher MM, the smaller the size of the magnified area in context pixels.
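As a purely illustrative example (the values are not taken from the experiments), a target of width Wc = 4 context pixels, a flat-top of WFTc = 20 context pixels, and a distance Dc = 500 context pixels give:

$$ID_{FT} = \log_2\left(1 + \frac{500}{20 - 4}\right) = \log_2(32.25) \approx 5.0$$

Doubling the magnification factor halves the flat-top's size in context pixels (here, WFTc = 10), raising the index of difficulty to log2(1 + 500/6) ≈ 6.4: the same selection becomes harder even though the target and the distance have not changed.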

The final cursor pointing task mainly depends on the area of the target in focus space that intersects the flat-top after the focus targeting task. The larger this area, the easier the cursor pointing task. We can at least consider the best case, i.e., when the target is fully contained in the flat-top. In this case, the difficulty of the cursor pointing task can be assessed by the ratio Df / Wf, where Df is the distance between the cursor and the target, and Wf is the motor size of the target when magnified in the flat-top. The distance Df is small, i.e., smaller than the flat-top's diameter, so we assume that the difficulty of the cursor pointing task is mainly caused by the value of Wf. For regular lenses, the value of Wf is actually the size of the target at context scale because the target is only visually magnified. With our lenses, however, since pixel-precise selections are possible, Wf is the magnified size of the target (at focus scale).

We conducted two experiments that involve pointing tasks. In the first experiment, we considered tasks with an average level of difficulty in order to test whether any of our three techniques degrades performance when compared with regular lenses (Reg). In the second experiment, we asked participants to perform tasks with a very high level of difficulty, which involve targets smaller than one pixel wide at context scale and which regular lenses do not support.

¹ IDFT is the exact index of difficulty when the target must be fully contained in the flat-top.


Figure 2.11 : First experiment. (a) Movement time per TECH × WC. (b) Movement time per TECH × MM. The lower part of each bar represents focus targeting time, the upper part cursor pointing time.

Figure 2.11 illustrates the main results for our first experiment. As expected, regular lenses (Reg) performed worse than the three other techniques. This is likely because, as we explain above, the target’s motor size is in context pixels for Reg whereas it is in focus pixels for Key, Speed and Ring. The comparative differences between the three other techniques are quite small. The only significant difference is actually between Key and Ring, with Key being faster than Ring.

More interestingly, the interaction effect TECH × MM on movement time suggests that Ring suffers more than the other techniques from an increasing magnification factor. A closer look reveals that the time for performing the focus targeting phase is proportionally longer for Ring. This is probably due to the cost of repairing overshoot errors during this phase: changes in direction are costly with Ring since the user first has to move the cursor to the opposite side of the flat-top before being able to pull the lens in the opposite direction.

The interaction effect TECH × WC on movement time is also interesting as it highlights that the differences really matter for small targets (WC = 1 and WC = 3). Key, Speed and Ring are significantly faster than Reg only for WC = 1 and WC = 3. The difference is not significant for WC = 5; in the latter case, only Speed is significantly faster than Reg. Moreover, Ring is faster than Key for WC = 1, while Speed is not. These results suggest that Ring is particularly efficient for very small targets and that Speed is more appropriate for larger ones.

Our first experiment thus supports the claim that, in comparison with regular lenses, our precision lenses improve user experience when pointing at small targets. Our second experiment aimed at assessing the comparative performance of those precision lenses in extreme cases that regular lenses cannot support: very small target sizes (less than one pixel at context scale) and high magnification factors. We used the same experimental setup, but discarded the Reg technique as it is not capable of achieving sub-pixel pointing tasks, and considered targets with a size in focus pixels (WF) ∈ {3, 5, 7}.


[Figure 2.12: bar charts of movement time in ms for the Speed, Key and Ring techniques, per MM (magnification) ∈ {8, 12}]

Figure 2.12 : Movement time per TECH×MM. The lower part of each bar represents focus targeting time, the upper part cursor pointing time.

In this second experiment, Key and Ring were faster than Speed, but only for MM=12; these differences are not significant for MM=8. This large difference at MM=12 is due to a sharp increase of focus targeting time (FTT) for Speed. Comments from participants confirm that the speed-dependent control of motor precision is too hard when the difference between context scale and focus scale is too high, resulting in abrupt transitions.

As in our first experiment, we observe that the focus targeting performance of Ring degrades as MM increases. However, good cursor pointing performance compensates for it, resulting in good overall task completion time. During the cursor pointing phase, Ring is stationary; only the cursor moves inside a static flat-top. This is not the case for Key and Speed, for which high-precision cursor pointing is achieved through a combination of cursor movement and flat-top offset. As a result, the control-display gain is divided by MM for Key and Speed, resulting in a loss of precision that makes the pointing task more difficult with those two latter techniques than with Ring.
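
The following sketch makes this difference concrete. It is only an illustrative model of the statement above, with hypothetical function and parameter names and a baseline gain of 1; it is not the actual implementation of the lenses:

    # Effective control-display gain during the cursor pointing phase, as described
    # above: Ring keeps the baseline gain because only the cursor moves inside a
    # static flat-top, whereas Key and Speed see their gain divided by MM.
    def cursor_pointing_gain(technique: str, mm: float, base_gain: float = 1.0) -> float:
        if technique == "Ring":
            return base_gain
        if technique in ("Key", "Speed"):
            return base_gain / mm
        raise ValueError(f"unknown technique: {technique}")

    # At MM=12, Key and Speed operate at 1/12 of the baseline gain during this
    # phase, which is what makes their cursor pointing harder than Ring's.
    print(cursor_pointing_gain("Ring", 12))   # 1.0
    print(cursor_pointing_gain("Key", 12))    # 0.0833...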

To summarize, when pushed to extreme conditions, the Speed lens becomes significantly slower than the other precision lenses while Ring remains as fast as Key without requiring an additional input channel for mode switching.

Finally, we designed a family of four high-precision lenses by combining our mechanisms for solving the quantization problem with visual behaviors from the Sigma Lens framework. We chose the two Sigma Lens visual designs that we observed as the most efficient ones (SPEED-COUPLED BLENDING – abbreviated Blend, and SPEED-COUPLED FLATTENING – abbreviated Flat), and we combined them with either speed-dependent motor precision (Speed) or cursor-in-flat-top motor precision (Ring). Key was discarded because it proved awkward to combine explicit mode switching with speed-dependent visual properties.

Speed + Flat: This lens behaves like the original Speed design, except that the magnification factor decreases toward 1 as speed increases. The main advantage is that distortion no longer hinders focus targeting. Additionally, flattening provides indirect visual feedback about the lens' precision in motor space: it operates in context space when flattened, in focus space when not flattened.

(34)

Object Acquisition and Manipulation 33

Ring + Flat: This lens behaves like the original Ring design, with the magnification factor varying as above. As a consequence, the flat-top shrinks to a much smaller size, thus making course corrections during focus targeting easier since the cursor is still restricted to that area. As above, distortion is canceled during focus targeting.

Ring + Blend: This distortion-free lens behaves like the original Ring design, except that the restricted area in which the cursor can evolve (the flat-top) is larger. As speed increases, the flat-top fades out, thus revealing the context during the focus targeting phase. An inner circle fades in, representing the region that will actually be magnified in the flat-top if the lens stops moving. The cursor is restricted to that smaller area, making course corrections less costly.

Speed + Blend: This lens behaves like the original Speed design without any distortion. As above, the flat-top fades out as speed increases and fades back in as speed decreases. Again, the larger flat-top reduces the focus targeting task’s index of difficulty. In a way similar to Speed + Flat, blending provides indirect visual feedback about the lens’ precision in motor space: it operates in context space when transparent, in focus space when opaque.
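
The speed-coupled behaviors shared by these four designs can be sketched as follows. The linear ramp and the speed thresholds are assumptions made for the example; the actual transition functions used by the Sigma Lens framework are not given in the text above:

    # Hypothetical speed-coupled flattening and blending, for illustration only.
    def _ramp(speed: float, v_min: float = 100.0, v_max: float = 400.0) -> float:
        """0 when the lens moves slowly, 1 when it moves fast (made-up thresholds, in px/s)."""
        return max(0.0, min(1.0, (speed - v_min) / (v_max - v_min)))

    def flattened_magnification(speed: float, mm_max: float) -> float:
        """SPEED-COUPLED FLATTENING: the magnification factor decreases toward 1
        as speed increases, so distortion no longer hinders focus targeting."""
        return mm_max + _ramp(speed) * (1.0 - mm_max)

    def flat_top_opacity(speed: float) -> float:
        """SPEED-COUPLED BLENDING: the flat-top fades out as speed increases
        and fades back in as the lens slows down."""
        return 1.0 - _ramp(speed)

    # The lens operates in focus space when magnified/opaque (slow movements)
    # and in context space when flattened/transparent (fast movements).
    print(flattened_magnification(50.0, mm_max=8))    # 8.0 (slow: full magnification)
    print(flattened_magnification(500.0, mm_max=8))   # 1.0 (fast: flattened)
    print(flat_top_opacity(500.0))                    # 0.0 (fast: fully transparent)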

In a final experiment comparing those four hybrid lenses to the static Ring and Speed designs from our first two experiments, our participants saw their pointing performance further improved by the visual designs from the Sigma Lens framework. This supports the claim that the gains from our high-precision mechanisms can be combined with the gains from our advanced visual designs.

2.2.2 Moving objects

The projects presented above facilitate object acquisition, which is the most basic and frequent action that users perform in point-based interfaces. The complementary key component to point-based interaction is object displacement, which is usually enabled through drag-and-drop actions in graphical applications both on desktop computers and touch-sensitive surfaces. Drag-and-drop is used to pan the view in multi-scale interfaces, to move and edit the geometry of elements in graphics editors, to adjust parameters using controllers such as sliders, or to move and resize windows.

Objects manipulated via drag-and-drop often have to be restored to one of their previous positions. For instance, a user will carefully lay out windows on his desktop but will then temporarily move or resize one of them to access content hidden behind it, such as an icon or another window of lesser importance that was left in the background; he will then want to restore the foreground window to its earlier configuration. The reader of a document will scroll down to an appendix or check a reference, and will then want to come back to the section he was reading. Current systems do not enable users to easily restore windows or viewports to their earlier configuration; users have to manually reposition and resize the corresponding objects. Such actions can be costly. From a motor perspective, the cost of repairing a drag-and-drop manipulation can be higher than that of the original manipulation, depending on how precisely the object has to be positioned. This is especially true for touch-based interfaces, which can make precise manipulations challenging [SRC05]. The cost can also be high from a cognitive perspective, as users may have difficulty remembering what the previous state of a particular object was [KLDK08].

[Figure 2.13: successive snapshots of desktop icons (Misc, Docs, Pictures) showing the spring handle and its active area]

Figure 2.13 : The Dwell-and-Spring technique (DS). A red circular handle pops up close to the cursor when the user presses the mouse button and remains still for 500ms (i.e., dwells) over an icon. Releasing the mouse button while the cursor is over the spring handle will undo the last move of this icon.

We studied typical situations where users have to perform a reciprocal drag-and-drop in order to restore objects to their past locations. We first designed the Dwell-and-Spring interaction technique [10] that allows users to undo movements of individual objects according to a simple linear undo model. We then introduced the DND−1 model [11] that can handle all past locations of individual objects and groups of objects, which led us to redesign Dwell-and-Spring in order to allow users to navigate objects’ histories and perform the reciprocal drag-and-drop action of interest.

Reciprocal Drag-and-drop: Simple cases

Situations that call for reciprocal drag-and-drop can be simple: for instance, putting a window back to its last location or reverting it to its previous size.

The basic Dwell-and-Spring technique, as described in [10], readily applies to all simple cases of reciprocal drag-and-drop. Figure 2.13 illustrates it on a very simple case, where an icon gets restored to its last position. A red circular handle pops up close to the cursor when the user presses the mouse button and remains still for 500ms (i.e., dwells) over the icon. Bringing the cursor or finger onto this handle will make a spring appear, showing where the center of the icon will end up if the user releases the mouse button or lifts his finger over the spring handle. If the user dwells without having initiated any movement, the spring shows the last move that was applied to the icon. If the user has already initiated a drag-and-drop, the spring proposes the reciprocal drag-and-drop for the current move. The user can either move over the spring handle and select it, activating the spring and thus bringing the object back to its previous location; or he can discard the widget by moving out of the active area.
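
The following sketch summarizes this behavior for the simple case where the user dwells without having initiated a movement. It is an illustrative model only: the per-object record, the event names and the drawing stubs are assumptions made for the example, not the actual implementation.

    DWELL_TIME_MS = 500  # dwell delay before the spring handle pops up

    def show_handle_near_cursor():  # placeholder for the actual drawing code
        pass

    def hide_handle():
        pass

    class DwellAndSpring:
        """Per-object linear model: each object only remembers its previous position."""

        def __init__(self):
            self.position = {}   # object id -> current position (x, y)
            self.previous = {}   # object id -> position before the last move
            self.handle_shown = False

        def on_dwell(self, obj):
            # Button pressed and cursor kept still for DWELL_TIME_MS over obj:
            # pop up the handle proposing the reciprocal of the last move.
            if obj in self.previous:
                self.handle_shown = True
                show_handle_near_cursor()

        def on_release(self, obj, over_handle, drop_position):
            if self.handle_shown and over_handle and obj in self.previous:
                # Spring selected: bring the object back to its previous position.
                self.position[obj], self.previous[obj] = (
                    self.previous[obj], self.position[obj])
            elif drop_position != self.position.get(obj):
                # Regular drag-and-drop: commit the move and remember the old position.
                if obj in self.position:
                    self.previous[obj] = self.position[obj]
                self.position[obj] = drop_position
            self.handle_shown = False
            hide_handle()

A dwell during an ongoing drag would instead propose the reciprocal of the current move, a case this sketch deliberately leaves out.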

As illustrated in Figure 2.14, this version of Dwell-and-Spring supports various cases of reciprocal drag-and-drop: manipulating icons on the desktop, navigating documents using a scrollbar or with a swipe gesture on a touch-sensitive surface, moving and resizing windows, or any other action where the spring's actions are equivalent to what the user would manually do to revert to the original state, like moving a slider knob or a manipulation handle. In this original version, Dwell-and-Spring is only able to revert the current or the last drag-and-drop, as it is only keeping track of the previous location of each object, based on a per-object linear undo model.

[Figure 2.14: view scrolling, value selection, caret positioning, window resizing, vector scaling/rotating, window stacking, view panning]

Figure 2.14 : Examples of frequent drag-and-drop actions that may call for reciprocal drag-and-drop actions.

We conducted an experiment to capture what users typically do in situations where they want to revert a drag-and-drop. We also wanted to evaluate whether the spring metaphor implemented in Dwell-and-Spring is a viable alternative or not. The experiment contained two parts: (1) an interactive in-situ questionnaire to gather data about users' habits when reverting drag-and-drop actions in different contexts of use (view navigation, window management, vector graphics editing, etc.), and (2) a formal experiment to evaluate how easy it is to discover and understand Dwell-and-Spring, and how often users would actually resort to it once discovered.

First, our general observation about users' habits was that they always repair their direct manipulation errors manually, except when the direct manipulation acts at the functional level of the corresponding application (e.g., moving a shape back where it was with the undo command in a vector-graphics editor). Second, when the environment proposes Dwell-and-Spring for reverting moves, one third of users spontaneously tried to make use of it. Demonstrating the technique even a single time was sufficient for users to understand and adopt it. Finally, our quantitative analysis highlighted the speed-accuracy trade-off of using Dwell-and-Spring: while it may be a bit slower in some cases, Dwell-and-Spring accurately cancels or undoes any direct manipulation, which can be a significant advantage for precise positioning.

Figure 2.15 : Exploring different office layout alternatives on a floor plan. (a) Placing a cupboard in the SW corner. (b) When moving the cupboard to the SE corner, it is difficult to access it when the door is open. (c) Cupboard back to the SW corner. (d) Cupboard in the NE corner. The heater is partially occluded. (e) Cupboard almost centered along the S wall. (f) Adding a desk in the NE corner, composed of two tables and a chair. The heater is partially occluded. (g) Cupboard back in the SE corner to free space for the desk in the SW corner. (h) Desk in the SW corner. (i) Changing the relative placement of the desk elements. (j) Desk back in the NE corner with the new relative layout between the two tables and the chair.

Reciprocal Drag-and-drop: Advanced cases

Situations calling for reciprocal drag-and-drop can be much more elaborate than simply restoring an object's last location: for instance, putting back a group of shapes to an earlier position on the drawing canvas after having manipulated other shapes, while preserving the new relative position that was given to the shapes in the group after they were initially moved away. From a user perspective, such graphical layout tasks are often part of an exploratory process. For instance, Figure 2.15 illustrates a scenario in which a person rearranges furniture in an office and tests alternative layouts. The software allows her to explore different arrangements by selecting and moving either a single piece of furniture, or multiple pieces together. Direct manipulation strongly contributes to making such exploratory design activities easy. But effectively supporting users also entails enabling them to easily revert back to past states from which to try other design options. Most graphical editing software provides an undo command to restore a past state of the entire document but, unfortunately, the underlying undo model is usually a global linear one that does not keep track of branches in the history of manipulations. Such a basic undo mechanism has two strong limitations, as detailed below.


[Figure 2.16: an object is moved through displacements A to J; stored history {A, B, C, D, E, F, G, H, I, J}; presented history: 1 step back {-J}, 2 steps back {-J, -I}, 3 steps back {-J, -G}, 4 steps back {-J, -G, -F}, 5 steps back {-J, -G, -F, -E}, 6 steps back {-J, -G, -F, -E, -D}, 7 steps back {-J, -G, -A}]

Figure 2.16 : DND−1 stores all repositioning actions applied to an object, including those performed via a reciprocal drag-and-drop (D, F and I, shown as dashed black lines). It presents the shortest path to all past locations.

The first limitation is that, as soon as the user performs new manipulations after an undo, the undone states become inaccessible [Ber94, YMK13]. In Figure 2.15, the user moves the cupboard (a-b) but then undoes this move (b-c) when she realizes that this location might not be so convenient because of its proximity to the door. Later, after having considered the different constraints (window, heater, additional furniture), she finally decides that putting the cupboard behind the door (as in (b)) is the best option. She wants to revert it to this location, but as she has moved it to other locations (c-d-e) after her undo operation (b-c), she is no longer able to get back to this configuration other than by manually moving it back there.

The second limitation comes from the lack of integration of object selection mechanisms with the history of direct manipulations. In Figure 2.15, the user moves the two tables and the chair that make up her workstation (g-h), and then changes their relative layout, thus breaking the previous multiple selection (i). Because there can only be one single active selection at a time, testing a location of the workstation that has already been explored (f), but with the new relative layout made in (i-j), requires selecting all its elements again and manually dragging-and-dropping them in the right place. Some graphical editors feature a command to group objects together. But this makes the exploratory design process much more cumbersome, as groupings have to be anticipated and created explicitly. In addition, groupings set persistent links between objects, which impede single-object editing operations.

All Past Locations of an Object

Applications that support undo typically store the history of actions as a tree whose nodes are the different states of the application. Performing an operation means adding a novel child state to the current node. Undoing an operation means getting back to the parent node. The linear undo model that most applications propose only supports one single active path. All nodes outside this path are inaccessible via undo. For instance, in Figure 2.16, the user moves the icon three times successively (displacements A, B then C), reverts C, and then moves the icon again by E. At this point, she can no longer recover the position the icon had after displacement C, since this one no longer belongs to the active path.
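
A per-object displacement log makes the difference with this linear model concrete. The sketch below only illustrates the idea of keeping every repositioning action, including undo moves; it is not the DND−1 model itself, which additionally handles groups of objects and presents the shortest path to each past location:

    # Minimal per-object displacement log, for illustration only.
    class ObjectHistory:
        def __init__(self, origin=(0, 0)):
            self.origin = origin
            self.moves = []  # every displacement, including those that undo earlier ones

        def apply(self, dx, dy):
            self.moves.append((dx, dy))

        def current_position(self):
            x, y = self.origin
            for dx, dy in self.moves:
                x, y = x + dx, y + dy
            return (x, y)

        def position_k_steps_back(self, k):
            """Past location reached by cancelling the last k displacements."""
            x, y = self.current_position()
            for dx, dy in self.moves[len(self.moves) - k:]:
                x, y = x - dx, y - dy
            return (x, y)

    # Replaying the example above with made-up displacement values: the undo of C
    # is stored as just another move, so the position the icon had after C remains
    # reachable even after displacement E.
    h = ObjectHistory()
    h.apply(10, 0)    # A
    h.apply(0, 5)     # B
    h.apply(3, 3)     # C
    h.apply(-3, -3)   # undo of C
    h.apply(7, 0)     # E
    print(h.position_k_steps_back(2))   # (13, 8): the position reached after C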
