
Editing Reality Made Easy

by James Keat Hobin

S.B., C.S. M.I.T., 2016

Submitted to the

Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

Massachusetts Institute of Technology

June 2017

© 2017 James K. Hobin. Released under CC BY-SA 4.0.

The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Electrical Engineering and Computer Science, May 26, 2017

Certified by: Pattie Maes, Professor of Media Arts and Sciences, Thesis Supervisor, May 26, 2017

Accepted by: Christopher Terman, Chairman, Master of Engineering Thesis Committee, May 26, 2017


Editing Reality Made Easy

by James Keat Hobin

Submitted to the Department of Electrical Engineering and Computer Science on May 26, 2017, in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

The Reality Editor is a system based around the concept that physical objects may serve augmented reality web interfaces while communicating information about their state to the local environment. Its combination of an augmented reality web browser with an object messaging system leads to high usability and simplicity.

In my contribution to the project, I reduce the difficulty of creating and controlling compelling augmented reality Internet of Things experiences using the Reality Editor. I simplify the creation process through three areas of work: creating reusable user interface components, designing an in-browser editor, and implementing a drag-and-drop in-app editor. My work on enhancing control takes two forms: developing an object memory system and integrating networked cameras. Finally, I improve the Reality Editor as a whole through a series of efforts targeting its documentation, performance, and reliability.

1 Introduction

The Reality Editor is a general system for the control and coordination of connected devices. It accomplishes this goal through the use of four major components: the decomposition of objects into values, connections between these values, augmented reality interfaces, and logic crafting.

Reality Editor System Diagram

The key concept behind the functionality of the Reality Editor is Bi-directional Augmented Reality. The user both consumes and controls their objects through augmented reality. This diverges from current work in Augmented Reality which usually focuses on consumption of information over control. In related work, I highlight tools that focus on the two sides of Bi-directional Augmented Reality. On one side, there are tools that create compelling augmented reality content for consumption by users. On the other, there are Internet of Things solutions for controlling connected devices.

Editor App User Interface

The Editor is an iOS app in which the user interacts with objects seen through their camera. Listed from the top, the buttons on the side are the UI, node, pocket, settings, and freeze buttons. The UI and node buttons allow the user to switch between the User Interface View and Node View of their objects. These views are described in more detail in Subsection 1.3. The pocket button stores items for the user. The items relevant to my contribution are memories and user interface elements, covered in Section 8 and Section 7, respectively. The settings button opens a page where designers, developers, and other power users can control the internal workings of the Reality Editor. The freeze button pauses the updating of the app's camera and tracking, which freezes the positions of objects on the screen. This prevents the accidental movement of objects, which was identified as a major pain point for users who can have shaky hands, tap the phone a bit too forcefully, or simply tire from trying to hold the phone pointing steadily in one direction. Whether the user is in UI view or node view, the Reality Editor forwards the user's actions to the object servers, where they can change the physical world.

Decomposition of a LEGO Robot

1.1 Decomposing Objects Into Values

In the Reality Editor, each object begins as the sum of its parts. A light becomes a brightness value, a radio becomes a tuning knob, volume knob, and speaker output, and a robot becomes each of its component sensors and actuators. Through this decomposition, the system allows users to reason about the properties of their objects. Notably, the representation as values is both simple and universal. Instead of learning a new abstraction for each manufacturer or for each object, the user need only grasp the concept that each object is equivalent to a collection of values.
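As a concrete illustration, an object's decomposition can be pictured as a plain collection of named values. The sketch below is hypothetical and does not reproduce the Reality Editor's actual data format; it only shows the "object = collection of values" idea.

```javascript
// Hypothetical sketch: a lamp decomposed into named values.
// The real Reality Editor object format differs.
const lamp = {
  name: 'desk-lamp',
  values: {
    brightness: { value: 0.0, range: [0, 1] },              // dimmable output
    colorTemperature: { value: 2700, range: [2000, 6500] }, // in kelvin
    powerSwitch: { value: false }                           // on/off state
  }
};

// Reasoning about the object reduces to reading and writing its values.
function setBrightness(object, level) {
  object.values.brightness.value = Math.min(1, Math.max(0, level));
}

setBrightness(lamp, 0.75);
console.log(lamp.values.brightness.value); // 0.75
```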

1.2 Connections Between Objects' Values

Sample Connection Between Button and Light

Once an object is decomposed into component values, the next step is for those components to interact. A light and a switch as individual, isolated components accomplish nothing. When the value of the switch is connected to the brightness of the light, it becomes possible for a user to control the light with the switch.

These connections are simple unidirectional conversations where the value of one component dictates the value of another. However, their expressiveness grows exponentially with the number of available objects. While a single switch and light is approximately what an electrician could accomplish, the Reality Editor can create dynamically changing interactions between an unlimited number of objects.
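A unidirectional connection can be sketched as nothing more than forwarding one value into another whenever it changes. The code below is an illustrative approximation, not the Reality Editor's actual messaging protocol.

```javascript
// Hypothetical sketch of a unidirectional connection:
// whenever the source value changes, push it into the destination.
function connect(sourceObject, sourceKey, destObject, destKey) {
  return (newValue) => {
    sourceObject.values[sourceKey].value = newValue;
    // The destination simply receives whatever the source produced.
    destObject.values[destKey].value = newValue;
  };
}

const lightSwitch = { values: { on: { value: 0 } } };
const light = { values: { brightness: { value: 0 } } };

// "Draw" the connection from the switch's value to the light's brightness.
const onSwitchChange = connect(lightSwitch, 'on', light, 'brightness');

onSwitchChange(1); // the user flips the switch
console.log(light.values.brightness.value); // 1
```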

1.3 Augmented Reality Interfaces

The previous layers of the Reality Editor are its plumbing. Augmented reality interfaces are how the Reality Editor allows users to reroute and control this plumbing. This implements the overarching concept of bi-directional augmented reality by providing the opportunities for consumption of object information and control of object interfaces.

There are two main types of interfaces within the system. The first is the node and connection view which maps to rerouting the system’s pipes. The second is the user interfaces view which allows control of the objects directly.

Node View of a Light

The node view shows the structure of information flow through the user's local system. By visualizing connections and the values flowing through them, the Reality Editor allows the user to see at a glance the behavior of their objects. They may also delete and create connections by using the touchscreen to draw a line between any two nodes. In the above image, a user has drawn a connection between the button and brightness nodes of a light. This connection means that pressing the button on the light will turn it on and off.

User Interfaces View of the Same Light

The user interfaces view provides object-specific elements. This user interface generally corresponds to the object's components which support modification. For example, the light above exposes a slider element to control its brightness. The circular slider is a simple way to control the light without requiring an electrician to install a dimmer switch. The augmented reality nature of these interfaces provides the ability for creators to employ logical affordances in their designs. An example of this ability is an interface for the Nest thermostat echoing the rotational temperature control and reducing user confusion.

Logic Crafting Board

1.4 Logic Crafting

Direct connections between objects allow for simple interactions like light switches and remote speakers. However, these connections are direct linear relations. To provide more complex functionality, the Reality Editor needs a method for introducing nonlinearity and abstraction. Logic Crafting is a way of assembling "logic" (modular blocks encoding behaviors) to create nonlinear interactions between objects.
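One way to picture a logic block is as a small function inserted between two connections. The threshold block below is an invented example of the kind of nonlinear behavior such blocks encode, not an actual Reality Editor block.

```javascript
// Hypothetical logic block: a threshold that turns a continuous
// sensor reading into an on/off decision, introducing nonlinearity
// that a direct (linear) connection cannot express.
function thresholdBlock(cutoff) {
  return (input) => (input >= cutoff ? 1 : 0);
}

// Compose blocks between a light sensor and a lamp:
// only switch the lamp on when ambient light drops below 0.3.
const invert = (x) => 1 - x;
const darkEnough = thresholdBlock(0.7);

function onAmbientLightReading(reading) {
  const lampState = darkEnough(invert(reading));
  console.log('lamp should be', lampState ? 'on' : 'off');
}

onAmbientLightReading(0.1); // dark room -> lamp on
onAmbientLightReading(0.9); // bright room -> lamp off
```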

2 Motivation: Editing Reality Made Easy

In previous work, the Reality Editor has had issues with approachability and legibility. On the other hand, related work in the fields of augmented reality and Internet of Things control tends to be user-friendly but pigeonholed into providing either passive consumption or unwieldy interaction. Studying previous incarnations of the Reality Editor and current offerings of commercial and academic groups allows me to learn from their decisions.

My contribution centers on lowering the difficulty of creating augmented reality Internet of Things experiences with the Reality Editor. I pursued this goal through five major areas of work. The first is the development of reusable components for creating augmented reality interfaces. The second is the creation of a user-friendly in-browser editor that could be used to develop object experiences without installation of any additional software. The third is a drag-and-drop in-app editor which was a final step towards simplifying the creation process. I then worked on the sources of friction for users attempting to interact with their objects. This began with the design and creation of the object memory system, which provides a straightforward abstraction for remote control of objects. The memory system then led to my next avenue of exploration, where I developed a method for integrating remote cameras to provide dynamic feedback during object teleoperation. Finally, I improved the documentation and reliability of the Reality Editor, decreasing new user frustration and confusion.

3 Previous Work

In its first iteration, the system that would become the Reality Editor was called Smarter Objects. This iteration had all of the basic features present in the current Reality Editor, but suffered from several design flaws that hampered its usability and readiness for change. The first flaw was the choice of language, C++. C++ excelled in creating a complex yet bug-free system, but added a high barrier to entry for potential collaborators. While many people can create impressive experiences by copy-pasting JavaScript code into web pages, attempting to apply the same approach in C++ generally leads to memory corruption and crashes. Despite the choice of language, Smarter Objects was a powerful and stable implementation of the ideas that would become the Reality Editor. It successfully showcased connections between objects, dynamic augmented reality interfaces, and basic node-based programming.

Smarter Objects System Diagram

Smarter Objects centralized the representation of connections between objects, simulating all of them on one server. This server would take the results of its simulation and tell each object in the system what state it should take. The server had complete control over the interactions between objects and could automatically adapt between data sources on the fly. For example, it could take a radio output connected to a light's color and generate a reasonable visualization of a song's beat. The server could also detect and simulate circular sets of connections by iterating until the values of the system reached a steady state. The centralized design also had a lower worst-case message count than the Reality Editor's peer-to-peer design. Because the centralized server coordinates the entire system, it sends at most one message per object in the space. On the other hand, each object in the Reality Editor can send messages to every other object, producing a number of messages equal to the number of objects squared in the worst case. While there were benefits to the centrality of the Smarter Objects server, this centrality meant that if the server were shut down, every single connected object would cease to function.

Radio User Interface

A second difference was the way that interfaces were implemented. In Smarter Objects each interface was coded individually in C++ using OpenFrameworks. This allowed a high degree of control and customizability. For example, the radio's user interface involved a subtle 3D effect that is nearly impossible to recreate in the Reality Editor's web-based approach. However, the process of creating interfaces in Smarter Objects took significantly longer, requiring lengthy recompiling steps, difficult debugging, and low-level graphics programming.

Another difference is the implementation of the memory system, which is covered later in Section 8.

Programming in Smarter Objects

Smarter Objects also allowed users to modify the behavior of the system through programming. It did so by allowing users to drag and drop "virtual objects" into the space. These virtual objects implemented simple functions like delaying, switching, and scaling values. This allows the behavior of the system to be easily visible at a glance, unlike the Reality Editor's logic crafting. On the other hand, having every node be visible becomes a significant drawback in larger systems. The key limitation of this approach was that it lacked any way of providing an order to input values. For example, a division block could never exist in Smarter Objects because it would never be clear which value should be the numerator and which the denominator. This approach also had the flaw that was the nail in the coffin for continuing development of Smarter Objects: it was written in C++ running off of a centralized server. There was no way forward to scale Smarter Objects into the extensible, accessible, and decentralized platform of the Reality Editor.

4 Related Work

There are two major areas of work related to the goals of my contribution to the Reality Editor. The first area is tools for designing experiences. These tools’ purposes range from content creation to Internet of Things app prototyping. Their goal of being simple to use unites them. My work on the Reality Editor shares this goal. The second area of work is tools for remote operation of connected devices. My contribution of the memory system is an attempt to pursue this goal.

4.1 Experience Design

The experience design tools each have slight differences of focus that make them relevant to my goals with the Reality Editor. Commercial solutions include the component-based but centralized Layar, the powerful game engine Unity, and the AR platform ZapWorks. There are also mature tools from academia in Georgia Tech’s Designer’s Augmented Reality Toolkit, their Scratch extension AR SPOT, and their AR browser Argon.

Layar; Layar Widgets

Layar is the classic corporate solution for creating AR experiences. It provides a WYSIWYG editor where users on a computer can drag and drop UI components to be associated with a visual marker (Blippar Group). One of these components is an embedded HTML component that allows the insertion of an arbitrary iframe into the designed space. This is surprisingly similar to how the Reality Editor implements its components. While my contribution does away with Layar's requirement for a computer, Layar does have some tricks up its sleeve. One killer feature is its ability to generate native app intents. While the Reality Editor supports every feature of the web, its users remain isolated within the Reality Editor app. A user of Layar who presses an app-intent-generating button will be directed to another app on their phone in a way similar to following a web link.

Architecturally, Layar is very centralized, with all of the drawbacks of Smarter Objects's design. Each interface is served from the Layar Server, which forwards requests to a less-centralized set of web services. This centralization allows Layar to provide brand-friendly analytics but means that if the Layar Server is down, every single connected experience is unavailable.

Augmented Reality Content Creation in Unity (Edgaras Art)

The second interesting corporation-backed tool is Unity. Unity began life as a game engine but can easily render AR scenes through libraries like Vuforia and ARToolKit (Unity Technologies). The key difference of Unity is that it uses a proprietary scene format. While the Reality Editor produces HTML, Unity compiles to either a native app or generated JavaScript. Unity is unlike any other creation software surveyed and requires many hours of training before a user can create content with it. As a reward for mastery, Unity does provide a significant amount of control and power, but the time investment is much higher than working from preexisting knowledge.

ZapWorks

The final proprietary tool surveyed is ZapWorks. Unlike Layar and Unity, ZapWorks is a complete AR platform including ZapWorks Designer, Studio, and Widgets (Zappar Ltd.). Designer is similar to Layar and does not provide any significant insights. Studio is also reminiscent of Unity and relatively uninteresting. ZapWorks Widgets is where this work becomes valuable for consideration. Widgets enables a user of the system to select a group of UI components which are then automatically laid out around a marker. This is the one case in all of the related work where the studied solution requires around the same number of steps to create a compelling experience as the Reality Editor. However, there is relatively little customization that the user can apply to the finished experience after deciding which widgets are present. A final characteristic of ZapWorks is that it requires trackable images to include "ZapCodes", a proprietary mark incorporating the ZapWorks logo. This limitation allows it to circumvent a classic problem in image-based augmented reality where multiple people may choose to augment the same type of object in colliding ways. However, it prevents ZapWorks from having the same flexibility as the other augmented reality tools.

Designer’s Augmented Reality Toolkit

(18)

the Designer’s Augmented Reality Toolkit, provides a unique take on the problems that Layar, Unity, and ZapWorks attempt to solve (MacIntyre 2004). Like those tools, DART offers a WYSIWYG interface, although DART’s is based on Macromedia Director instead of being created from scratch. DART offers standard drag-and-dropping of components as well as basic programmatic scripting of interactions. The most divergent part of DART’s design is the offering of basic property-triggered events. For example, moving the object further from the camera could trigger a change in the object’s augment reality interface’s color. The configuration of this event-based control is based on a few clicks in a graphical user interface, not advanced programming.

AR SPOT

AR SPOT is a similar extension-based tool for authoring augmented reality experiences on top of Scratch's programming functionality. Scratch's simple style is designed from the ground up to be easy to learn, especially for children and adults without programming experience (Maloney 2004). A user can pick up Scratch quickly and then easily prototype an AR scene by incorporating the building blocks introduced by the researchers. AR SPOT also makes a novel contribution to the addition of controls into the space. Because it lacks any ability to connect to the physical world other than markers, it allows users to designate markers as rotatable knobs. While this is an extension of reading markers' positions, it is much more streamlined than any approach the other tools take.

The same team at Georgia Tech produced the augmented reality browser Argon (MacIntyre 2015). Argon shares the Reality Editor's basis in web standards, providing a way to turn any webpage into an augmented reality experience. It additionally integrates with three.js as well as standard browser geolocation, going beyond simple static 2D content. The use of geolocation is unique to Argon and is a valuable insight into future integrations to add to the Reality Editor. However, Argon lacks in two areas where the Reality Editor excels. The first is that it is strictly designed for consumption of content. This is theoretically surmountable by a user with enough gumption, but Argon does not assist them. It is also a non-factor if the user does not wish to influence their surroundings. The second factor is the more important: Argon does not provide any way for users to discover the available interfaces around them. Instead, a user can only enter an experience by opening a webpage. This is a significant barrier to using Argon fluidly. The Reality Editor has been designed with approachability and discoverability at its heart. My contribution further improves the process of opening interfaces. By focusing on the differences in opinion between Argon and the Reality Editor, I can emphasize the Reality Editor's strengths while shoring up its weaknesses.

All of these tools for creating experiences lack any interface to the physical world. They only allow the consumption of information through screens. This is one direction of the term "bi-directional augmented reality." The other direction is tools that allow users to control the physical world.

4.2 Remote Control

The second category of tools allows users to control the physical world. None of these tools afford an augmented reality interface but they nevertheless have novel characteristics whose study benefited my contribution to the Reality Editor. I consider three tools here: NOODL, Kasa, and Google Assistant. While the tools vary in implementation, they all allow the user to control their smart objects remotely, providing a contrast to my approach with the memory system.

NOODL

NOODL is a prototyping tool for creating experiences with connected devices (Topplab AB). It is very similar to Layar, Unity, and other WYSIWYG design tools, but has a focus on controlling instead of augmenting reality. Users of NOODL can drag and drop components into the space, then wire up interactions between them. For example, a smart watch color widget can be connected to a light to provide a simple remote control of the light's color. This connection is similar to the Reality Editor's connections but requires the user to be actively engaged with the NOODL desktop app. The most compelling part of NOODL is that it allows the user to exert a high degree of control over the display of their controls, potentially allowing them to create an intuitive interaction like the Reality Editor's memory system.

Kasa

Kasa is TP-Link's app for controlling their Wi-Fi-connected smart light bulbs, plugs, and switches (TP-Link Technologies Co. Ltd.). It is a very standard app with a design comparable to nearly every other competitor in its space, e.g. Philips Hue. The user can set up and configure their devices while controlling the properties of each device on a one-by-one basis. For example, a light can be discovered, named, and switched on through the Kasa app. In the context of my contribution, this app is largely an example of design patterns to avoid. A user of this app can easily forget the difference between "Kitchen Light 1" and "Kitchen Light 2" when they want to turn on a specific light. They also cannot determine what the effects of turning on "Switch 1" might be. Finally, if the user wants to turn on a specific set of lights simultaneously, they must first log in to their TP-Link account, select each light individually, then determine the status they want each light to have. This spotlights a problem at the heart of nearly every Internet of Things app: the reliance on a third party. Nearly every advanced function of the Kasa app requires the user's data to pass through TP-Link. This is especially chilling for the remote control function, where TP-Link could easily determine the location and habits of the user through when they choose to operate the devices in their home.

A Conversation with Google Assistant

Google Assistant represents a different take on the centralized Internet of Things controller (Google). It still suffers from the same drawbacks as Kasa: the user must be logged in and share all of their data with Google. However, its interface is very different. Instead of poking through an app, the user can speak to the Assistant in somewhat natural language. For example, "Ok Google, turn off the living room lights" will correctly turn off all lights that the user decided were part of the living room. The quality of Google Assistant is determined by the degree to which the user's setup fits into Google's framework. If the user has clearly delineated rooms and only ever wants to control those rooms or individual lights, Google Assistant will function perfectly. It breaks down when the user forgets a room's name or wants to precisely control a variety of lights not in the same room. This is symptomatic of the general drawbacks of systems that try to guess on behalf of the user. In my contribution to the Reality Editor and in the design of it as a whole, a concerted attempt is made to allow the user to decide which tasks they want to accomplish. The Reality Editor is like a Leatherman multi-tool, while Google Assistant is like a shiny flathead screwdriver.

5 Reusable Components

One of the main benefits of the Reality Editor being based on web technology is that a prospective designer only needs to know HTML and JavaScript. However, programming is not a perfect tool for all people. The creation of experiences can be prohibitively difficult for those without degrees in computer science or significant training. To tame this difficulty curve, I created a series of reusable components for the Reality Editor.

The components I created have a direct real-world equivalent: LEGO blocks. Anyone, regardless of skill, can assemble a large model in LEGO or a compelling interface with the reusable components. The implementation of the components was designed from the ground up with simplicity, consistency, and web standards in mind.

The simplicity of the components is their first advantage over the lower-level JavaScript application programming interface (API). While the JavaScript API provides a large amount of functionality with an even higher amount of customizability, it requires a large amount of skill in programming. A way to indirectly measure the complexity of using the original interface is the number of lines of code required to accomplish a given task. Using components instead of the existing JavaScript API reduces the total lines of code significantly. This improvement is possible by optimizing for the most popular use cases as determined in workshops and other interactions with users. For example, a single component can take care of reading and writing one of an object's values. This component can then be integrated with a UI-specific component to create an automatically updated slider with direct feedback.

Baseline Code

Component-based Code
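The original listings referenced by the captions above are not reproduced in this text. The hedged sketch below only suggests the kind of line-count reduction being described; the helper functions, element name, and attributes are invented for illustration and are not the Reality Editor's real API.

```javascript
// Assumed low-level helpers standing in for the JavaScript API the text
// describes; the real Reality Editor API is not reproduced here.
function sendValueToObject(objectName, valueName, value) {
  console.log(`send ${objectName}.${valueName} =`, value);
}
function subscribeToObjectValue(objectName, valueName, callback) {
  setTimeout(() => callback(0.5), 100); // pretend the object reported 0.5
}

// "Baseline" style: every interface hand-wires DOM events to the API.
const slider = document.createElement('input');
slider.type = 'range';
document.body.appendChild(slider);
slider.addEventListener('input', () => {
  sendValueToObject('lamp', 'brightness', Number(slider.value) / 100);
});
subscribeToObjectValue('lamp', 'brightness', (value) => {
  slider.value = String(value * 100);
});

// "Component-based" style: the same behavior collapses to one tag,
// because the shared wiring above lives inside the component.
//   <value-slider object="lamp" value="brightness"></value-slider>
```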

The second advantage of component-based design is consistency. In the same way that components involve optimization for popular use cases, they also provide consistency of design and implementation. The visual benefit is that all of the default components use the same aesthetic language as the Reality Editor itself. This allows users to create interfaces that are intuitive by default. The benefit of consistent implementations is that every component shares the same underlying code. This well-tested and mostly bug-free code provides a smaller surface for users to accidentally produce incorrect behavior. Most uses of the components do not even need to incorporate custom code. By concentrating shared functionality into a single module, that module can be subjected to more intense scrutiny. Additionally, when the Reality Editor's protocol or behavior changes, these modifications only need to be reflected in one location instead of requiring every user to rewrite their interfaces.

User Interface Components with Robot

The final advantage is in the implementation of the components. I employed Polymer, which implements the Web Components standard, an open standard currently in use by Mozilla, Google, and others for developing modular user interfaces. Following the standard allows the Reality Editor's components to interoperate with any other Web Components available through communities like Bower, NPM, or WebComponents.org. This greatly multiplies the system's LEGO-like flexibility.
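The standard Custom Elements API that Polymer builds on is enough to show the flavor of such a component. The element name and its attribute below are illustrative inventions, not the Reality Editor's actual components.

```javascript
// Minimal standard Web Component (no Polymer required) sketching how a
// reusable slider element could encapsulate its own markup and events.
class ValueSlider extends HTMLElement {
  connectedCallback() {
    const root = this.attachShadow({ mode: 'open' });
    root.innerHTML = `
      <label>${this.getAttribute('label') || 'value'}</label>
      <input type="range" min="0" max="100" value="0">
    `;
    root.querySelector('input').addEventListener('input', (event) => {
      // Re-emit a simple custom event that host code can listen for.
      this.dispatchEvent(new CustomEvent('value-changed', {
        detail: Number(event.target.value) / 100
      }));
    });
  }
}
customElements.define('value-slider', ValueSlider);

// Usage in an interface page: <value-slider label="Brightness"></value-slider>
```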

However, there are some drawbacks to the isolated component model. The first is that embedded user interfaces must communicate with the Reality Editor using the iframe postMessage API. This requires components to be very conscious of how they attempt to talk to the Reality Editor. Additionally, the Reality Editor must be able to understand and handle messages from any user's components, which requires careful coordination when adding new functions or deprecating old ones. This isolation does have the benefit of greatly reducing the attack surface available to malicious components. The second drawback is also related to the use of iframes. If a user touches an iframe in Safari, any further touches they perform will be locked into that iframe until they completely stop touching the screen. I worked around this by completely virtualizing the touch event system at the level of the Reality Editor. Instead of touching the iframe, the user is always touching an invisible element that calculates synthetic touch events based on the augmented reality transformations applied to each interface's iframe.

Communication with Native UI Element; Communication with Component-based UI Element
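Communication across the iframe boundary relies on the standard window.postMessage API. The message shape below is a hypothetical example rather than the Reality Editor's real protocol.

```javascript
// Inside a component's iframe: ask the parent editor to write a value.
// The message fields ("type", "node", "value") are invented for this sketch.
window.parent.postMessage(JSON.stringify({
  type: 'writeValue',
  node: 'brightness',
  value: 0.75
}), '*');

// Inside the Reality Editor page: validate and handle component messages.
window.addEventListener('message', (event) => {
  let msg;
  try {
    msg = JSON.parse(event.data);
  } catch (err) {
    return; // ignore anything that is not well-formed JSON
  }
  if (msg.type === 'writeValue') {
    console.log('component requested', msg.node, '=', msg.value);
  }
});
```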

6 In-browser Editor

The first pain point for users attempting to create new interfaces for objects is the difficulty of editing HTML and JavaScript files. Usually this requires a two-step process where the user must open a specialized code editor and then locate the files relevant to an object. If the user has never edited code before, this becomes an even more difficult and lengthy process, as they must first select and install a trustworthy code editor. My work in this area greatly simplifies this process. It allows a user of the Reality Editor to modify their objects' interfaces directly in their browser without installing any third-party software.

In-browser Editor User Interface

In-browser Editor System Diagram

This in-browser editor is implemented using a Reality-Editor-specific file transfer API and Microsoft's Monaco editor. The API allows for asynchronously updating object interface files with user-provided data. The in-browser editor automatically retrieves the current file contents from Monaco to save the user's modifications to the server. Monaco is a full-featured browser equivalent of Microsoft's Visual Studio Code editor. Users do not lose any functionality of a separate native code editor when they elect to use the web interface. Additionally, because the editor knows that every file it opens is for the Reality Editor, it is possible to perform context-specific autocompletion and code snippet recommendation. The edited files can also be verified with domain-specific static analysis to ensure that the user does not encounter unexpected behavior from their objects.
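A sketch of the save path: pull the current text out of a Monaco editor instance and POST it to an object server endpoint. Only the Monaco calls (monaco.editor.create, editor.getValue) are the library's real API; the endpoint path and payload format are assumptions standing in for the Reality-Editor-specific file transfer API.

```javascript
// Assumes the Monaco editor library is already loaded and the page
// contains a <div id="editor"></div> container.
const editor = monaco.editor.create(document.getElementById('editor'), {
  value: '<h1>Hello, object</h1>',
  language: 'html'
});

// Hypothetical save routine: the URL and payload shape are invented.
async function saveInterfaceFile(objectName, fileName) {
  const contents = editor.getValue(); // current text in the editor
  const response = await fetch(`/object/${objectName}/files/${fileName}`, {
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body: contents
  });
  if (!response.ok) {
    throw new Error(`Save failed: ${response.status}`);
  }
}

saveInterfaceFile('lamp', 'index.html').catch(console.error);
```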

Implementing an in-browser editor frees users to focus on the design of their interface instead of the mundanities of installing text editors and locating files.

7 In-App Drag-and-Drop Editor

Despite the simplification afforded by the in-browser editor, writing code is still a hurdle for many less technically-minded users. To simplify the Reality Editor authorship process even further, I developed a streamlined drag-and-drop tool for designing object interfaces within the Reality Editor app itself.

The implementation of this project was greatly simplified by applying the library of standardized user interface components I created. Because each component is a stand-alone entity by design, the drag-and-drop functionality merely instantiates the component in an embedded iframe, providing security and isolation in only around four lines of code. The difficulty of the implementation was then reduced to providing an API for attaching the dropped interfaces to objects.
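The "around four lines of code" can be pictured roughly as creating an iframe and pointing it at the component's page; the URL scheme and sandboxing choice below are assumptions made for this sketch.

```javascript
// Roughly the shape of instantiating a dropped component in an iframe.
// The component URL format is an assumption, not the real layout on disk.
function instantiateComponent(componentName) {
  const frame = document.createElement('iframe');
  frame.src = `/components/${componentName}/index.html`;
  frame.setAttribute('sandbox', 'allow-scripts'); // isolate untrusted code
  document.body.appendChild(frame);
  return frame;
}

instantiateComponent('value-slider');
```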

The design of the interaction itself was relatively straightforward. A user selects a component to add from a small palette, whereupon they can drag it into augmented reality space. The dragging mechanism requires standard 3D projection to take place so that the user interface component is positioned correctly relative to a tracked object. If there is no tracked object, the creation of components is prohibited, as there would be no marker from which to determine the component's position. Once the UI component exists in space, the user can drag and drop it again to move it around.

Editor with Component Palette Visible

Editor Upload Process

Together, these pieces streamline interface creation. A designer using the system can look at an object and instantly drag and drop whichever components they desire to create a compelling experience. After they add the components, the same designer can wire up the nodes of the user interface to create a novel system of interactions.

8 Memory System

Memory System Displaying Saved State of Robot

The next major improvement I made to the Reality Editor system is the addition of a method for remote control. The Reality Editor excelled at in-person control, where the user is able to look at all the objects they want to manipulate. However, once the user exited the usable range of the Vuforia marker tracking system, they would be unable to affect their objects in any way. This range is limited not only by the line-of-sight of the camera, but also by the distance at which the camera can recognize enough visual detail for Vuforia to function. Therefore, I designed a novel system for teleoperation of the Reality Editor system.

This system employs user-generated still images of objects to provide a visual representation of the remotely operated object. In the above image, a memory of the robot is displayed on the phone. Images of available objects are stored in a bar for easy access as seen below. Additionally, connections to objects with associated memories allow entering the connected object’s memory so that any series of connections is traceable from afar.

Memory Bar Storing Light and Robot Memories

The memory system underwent several redesigns and increases of scope before it settled into its current state. The initial implementation of the memory system was present in the Smarter Objects system as an app-based feature. In the Reality Editor, it was reimplemented using decentralized storage. Finally, the concept of memory pointers was introduced to allow the tracing of connections in memories.

During the initial implementation of the memory system I prioritized simplicity over universality. Smarter Objects had a centralized design centered around using only one device to interact with objects hosted by one server. This iteration kept all of the data of the memory system in the mobile device’s storage instead of on the object server. This critical design choice meant that the system did not support sharing memories between users. However, it also meant that there was no need to write potentially buggy C++ networking code.

Initial Memory System Diagram

With the advent of the Reality Editor system, there was a dramatic shift in design from a centralized C++ server to a series of decentralized Node.js servers. There was also a greater emphasis on cooperation and coexistence of multiple users. This gave me the opportunity to redesign the memory system with decentralization and multiple users at its heart. Each object now stores its own associated memory data, letting anyone who has been in range of an object retrieve its memory for later remote control. The initial implementation of the upload process was fully synchronous, capturing an image from the camera, encoding it as a JPEG, and sending it to the server over HTTP. However, this process would leave the app unable to respond to the user for a short time. This was less than ideal, so I moved as much of the work as possible off the main thread. The final implementation of the uploader quickly sends a thumbnail to the local editor for immediate feedback, then performs the encode and upload in the background. The measured improvement was a decrease from a 57 millisecond process to one that was only 9 milliseconds.
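One way to realize the "thumbnail first, upload later" split in a browser context is sketched below: draw a small thumbnail synchronously for immediate feedback, then let canvas.toBlob perform the JPEG encode asynchronously before uploading. The endpoint, sizes, and UI hook are assumptions, not the Reality Editor's actual implementation.

```javascript
// videoElement is assumed to be the camera feed already shown in the app.
function captureMemory(videoElement, objectName) {
  // 1. Cheap, synchronous thumbnail so the UI can respond immediately.
  const thumb = document.createElement('canvas');
  thumb.width = 64;
  thumb.height = 48;
  thumb.getContext('2d').drawImage(videoElement, 0, 0, 64, 48);
  showThumbnailInMemoryBar(thumb); // assumed UI hook

  // 2. Full-resolution encode and upload happen off the critical path.
  const full = document.createElement('canvas');
  full.width = videoElement.videoWidth;
  full.height = videoElement.videoHeight;
  full.getContext('2d').drawImage(videoElement, 0, 0);
  full.toBlob((jpeg) => {
    fetch(`/object/${objectName}/memory`, { method: 'POST', body: jpeg })
      .catch(console.error);
  }, 'image/jpeg', 0.8); // toBlob encodes asynchronously
}

// Placeholder for the editor-side feedback described in the text.
function showThumbnailInMemoryBar(canvas) {
  document.body.appendChild(canvas);
}
```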

Decentralized Memory System Diagram

The chief benefit of the decentralized design is that the system can expose memories of any local object, even if the user has never seen it. This allowed me to solve one of the longest-standing issues with connecting objects, which was present in both Smarter Objects and the Reality Editor. The problem was that connections to off-screen objects had no clear representation. The existing stopgap solution was to draw a line to the edge of the screen. My previous work in Smarter Objects attempted to use the inertial data from the phone to point as directly as possible to the off-screen object, but this was still vague and difficult to understand, especially when there were multiple off-screen objects. Applying the memory system led to an understandable and clear solution with a minimum of abstraction. Instead of drawing connections to the approximated off-screen position of an object, the Reality Editor draws the latest memory of the object, showing clearly where values are coming from and going to. These "memory pointers" are automatically laid out using a force simulation so that users can see all of an object's pointers at once. The pointers also function the same way as memories in the memory bar, allowing the user to look at their object by tapping on the image or drawing a connection into the image. I determined from user testing that there was an issue with this improvement where users could accidentally enter an unrelated memory if it were in the way of their desired connection between two points. Requiring the user to hover over a memory for a short delay, while also changing the appearance of the memory pointer to denote that it was about to take over the screen, eliminated this source of user error. In the image below, the robot has connections to two lights, both of which are off-screen. To allow the user to quickly examine the connected lights, the Reality Editor displays a memory pointer to each light.

A View of the Robot with Memory Pointers Showing Connections to Two Lights

Another type of visual feedback that the memory system makes possible is the memory web. The memory web is intended for large-scale deployments of the Reality Editor where the system is not easily inspected by walking up to each object or by cycling through memories. It provides a bird's-eye view of every object in a system by applying the lessons I learned from the memory pointer visualization. It takes pointers representing off-screen objects to the next level and replaces every object with a pointer. This web of memory pointers and connections is then arranged using a simple force layout. Once again, the user can enter memories to inspect the individual objects in the current system. Users equipped with this layout can easily answer questions about which light switch is controlling the most lights, whether any motors are disconnected, or any other large-scale query that previously would have required individually inspecting each object to answer.
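If a library such as d3-force were used for this layout, arranging memory pointers (or the memory web) would look roughly like the sketch below. The node and link data are invented, and the Reality Editor's real layout code may differ.

```javascript
// Minimal d3-force sketch (assumes d3 v4+ is loaded in the page as `d3`).
// Nodes stand in for object memories; links for connections between them.
const nodes = [
  { id: 'robot' }, { id: 'light-1' }, { id: 'light-2' }
];
const links = [
  { source: 'robot', target: 'light-1' },
  { source: 'robot', target: 'light-2' }
];

const simulation = d3.forceSimulation(nodes)
  .force('link', d3.forceLink(links).id(d => d.id).distance(120))
  .force('charge', d3.forceManyBody().strength(-200)) // spread pointers apart
  .force('center', d3.forceCenter(innerWidth / 2, innerHeight / 2));

simulation.on('tick', () => {
  // In the real app each memory image would be repositioned here;
  // this sketch just logs the computed coordinates.
  nodes.forEach(n => console.log(n.id, Math.round(n.x), Math.round(n.y)));
});
```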

Memory Web

9 Camera User Experience

While the memory system works well as a visualization of the initial state of the object, it fails to reflect the evolution of the object over time. For example, controlling a light remotely through a memory does not let the user see if the room around the light is properly lit by their modifications. Discovering a proper way to enable users to view real-time feedback required several iterations and the introduction of a new type of object to the Reality Editor.


The first iteration worked entirely within the bounds of the existing Reality Editor system. The concept was to create user interfaces that suggest the side effects of users’ actions. For example, a light interface would represent the current color and intensity of the light as a color swatch. However, this broke down with more complex interactions. A motor’s interface could draw a spinning animation, but there was no way of mapping the effects of the motor’s actuation into the memory image. This approach did have the benefit of requiring no changes to the existing system, but a more comprehensive solution was available.

Camera Used in Prototypes

This solution was to add a controllable, network-connected camera to the space. A user could then control this camera with the Reality Editor to view the effects of each of their adjustments. The camera only needed to be located in a place where it could see all of the other objects. Because the Reality Editor is based on standard web technology, it was relatively simple to embed the video stream from the camera, a task that would have taken months if implemented using the pure C++ of Smarter Objects.


Initial Camera User Interface

Camera User Interface with Presets

The main flaw of the camera system was that it required the user to painstakingly position the camera to look directly at the object they wanted to inspect. If the user were attempting to manipulate objects at either end of a room, it would devolve into rotating the camera back and forth. The solution to this problem was to employ a feature built into most network-connected cameras: preset positions. The camera would know the exact positions of each of the objects in the space. This way the user would only need to press a button to look at a specific object. An additional benefit was that the preset position was accurate enough that the interface of the object could be drawn over its camera image. This simulated the effect of the Reality Editor's tracking system despite the camera not supporting it directly. While the entire setup is vulnerable to object movement or reconfiguration, it shows the level of ease of interaction made possible by the extensibility of the Reality Editor.

10 User Experience Improvements

The final part of my contribution to the Reality Editor project is a series of overarching improvements. The first goal was to improve the onboarding process for new users. My second focus was improving the performance of the Reality Editor, especially on more resource-constrained mobile devices. I next implemented a universal approach for drag-and-drop interactions in the Reality Editor, greatly reducing code duplication. Finally, I applied my knowledge of static analysis to as much of the code as possible to improve its quality and consistency. This included building off of my previous research in static analysis to produce a novel tool for identifying problematic code patterns.

The improvements for novice users took three main forms: labeling, documentation, and removing surprises. Labeling is a simple change that has great effects on legibility. Throughout the Reality Editor, there were several places where simply adding a bit of explanation would greatly improve a process. For example, a blank text box next to a button saying "Create Object" has no identifiable purpose. When this text box is labeled "New Object Name", a user no longer has to wonder what the purpose of the text box could be. Documentation is in some ways the next step of labeling. The code of the Reality Editor often lacked appropriate explanations of its functionality and effects. A new contributor to the project could be daunted by having to read several hundred lines of code to understand when and where a particular method was being invoked. Widely adopting a standardized format of documentation leads to far-ranging improvements that are not just limited to easing the journey of a beginner. The Reality Editor employs JSDoc, a documentation format which adds static type hints to otherwise dynamically typed JavaScript. This allows static analysis to pick up a variety of common bugs. An additional benefit is that there are tools for automatically generating a human-readable guide to the code's interfaces and methods. Removing surprises is one of the more abstract parts of this contribution. Surprises in software occur whenever the expected and actual behavior of the system differ. These can quickly frustrate users and lead to painful debugging sessions. For example, the Reality Editor used to inject the scripts required for controlling objects only into files named "index.html". If a user named their file anything else or tried to have an interface with multiple HTML files, their interface would fail to control any object. It was a simple fix with no overarching drawbacks to inject the scripts into all HTML files. Labeling, documenting, and fixing surprises came together to form one of the most important steps in making the Reality Editor more beginner-friendly.
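A short example of the JSDoc style described above; the function itself is invented for illustration.

```javascript
/**
 * Converts a raw sensor reading into the 0..1 range used by node values.
 * The @param/@returns annotations give static analysis enough type
 * information to flag callers that pass the wrong kinds of arguments.
 *
 * @param {number} reading - Raw value reported by the sensor.
 * @param {number} min - Smallest reading the sensor can produce.
 * @param {number} max - Largest reading the sensor can produce.
 * @returns {number} Normalized value clamped to the range [0, 1].
 */
function normalizeReading(reading, min, max) {
  const scaled = (reading - min) / (max - min);
  return Math.min(1, Math.max(0, scaled));
}
```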

For a user to enjoy their time with the Reality Editor, it has to be able to react to their inputs in a timely manner. The standard in user experience research is that if a system takes less than one tenth of a second to respond to an input it will be perceived as instantaneous (Miller). Additionally, any movement that updates below 45 frames per second can be perceived as choppy by most of the population (Humes). This becomes an even harder problem because the Reality Editor has to juggle responding to users, rendering smooth animations, and performing augmented reality marker tracking on performance-limited mobile hardware. Throughout my work on the Reality Editor, I took time to measure the effects my changes had on the performance of the system.


Benchmarking is a difficult problem with many potential pitfalls, including premature optimization, where misguided attempts at making an experience more performant can result in underwhelming gains in speed while obfuscating code. An example of a significant performance win is the case of the slow pocket. Every time the pocket was open, the app would slow to a crawl. After a significant amount of narrowing down the cause, it was found that the time to render a frame was dominated by graphics, not by code. This meant there was not an easy algorithmic bottleneck to fix. Instead, the problem lay in the indiscriminate use of a gigantic blur filter. A blur filter takes time proportional to the screen area times the filter area. Removing this blur filter removed the performance problem and greatly improved the experience of using the pocket. There were also opportunities for preemptive optimization, though these required great care to not become premature. One example is the choice of rendering target for the Web Components. There were doubts as to whether to use SVG or Canvas to draw the dynamic user interfaces of the Reality Editor components. An incorrect choice would have performance ramifications on every experience created with the Reality Editor, so it was of paramount importance to get it right the first time. By creating a small benchmark I was able to conclusively show that there was no appreciable difference between SVG and Canvas: SVG would take 30.7 milliseconds to render a complex scene while Canvas would take 27.5. While a bit of an anticlimax, this allowed me to proceed with an SVG-based implementation with confidence. Consistently considering the performance impacts of changes to the Reality Editor means that the current app runs at a smooth sixty frames per second on standard mobile hardware.
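A micro-benchmark of the kind described can be as simple as timing repeated draws with performance.now(). The scene below (a field of circles drawn to a Canvas) is a stand-in for the real component-rendering test; the original SVG and Canvas scenes are not reproduced here.

```javascript
// Toy benchmark measuring Canvas drawing time for a field of circles.
function benchmarkCanvas(circleCount) {
  const canvas = document.createElement('canvas');
  canvas.width = 800;
  canvas.height = 600;
  const ctx = canvas.getContext('2d');

  const start = performance.now();
  for (let i = 0; i < circleCount; i++) {
    ctx.beginPath();
    ctx.arc(Math.random() * 800, Math.random() * 600, 10, 0, 2 * Math.PI);
    ctx.fill();
  }
  return performance.now() - start; // elapsed milliseconds
}

console.log('canvas render took', benchmarkCanvas(5000).toFixed(1), 'ms');
```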

One benefit came from making code more abstract and generic. I was able to implement a universal framework for drag-and-drop interaction by applying software engineering principles. Before I started this part of my contribution, there were three different approaches used in three different parts of the codebase. Nodes, logic nodes, and UI elements each had unique characteristics in how they approached being touched and moved. After the consolidation, every single implementation uses the same basic interface.


A problem unique to enabling this code reuse was the iframe security barrier present with the WebComponent-based UI elements. However, a simple translation script allows them to use the same privileged editor code as the two types of nodes. Eliminating competing implementations is not a glamorous new feature, but it reduced the number of potential bugs threefold.

Shelob Visualization of Early Code

Updated Shelob Visualization

While documentation and refactoring are both good ways to reduce the prevalence of bugs in code, they require a significant investment of programmer time. Static analysis tools analyze code without running it (hence "static"), identifying bugs before they can frustrate users. There were two major tools that I applied to the Reality Editor. The first, ESLint, is a publicly available utility for identifying potentially troublesome JavaScript. With the Reality Editor, ESLint usually discovered mistyped identifiers or missing commas. However, it missed opportunities for improving the code by eliminating overuse of globals. ESLint only gives the option to allow or disallow use of a global, not to identify whether the use of globals within a project is excessive. I wrote the tool Shelob to calculate the use of globals in JavaScript codebases. Its companion tool, Torech, visualizes Shelob's data to allow users to make educated decisions about modularization and design. Shelob's data is a graph whose nodes represent files and whose links represent references to global variables. In early versions of the Reality Editor it showcased a pandemic of globals. An updated version of the same diagram shows a much sparser graph with more careful use of globals.
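Shelob itself is not reproduced here, but the general idea of mapping files to the globals they reference can be approximated with a crude Node.js scan like the one below. The global names are assumed for illustration, and a real tool would parse the JavaScript and track scope instead of using regular expressions.

```javascript
// Crude approximation of a file-to-globals map (Node.js).
// A real analysis (like Shelob) would build an AST and resolve scope;
// this sketch just searches each file for a known list of global names.
const fs = require('fs');
const path = require('path');

const KNOWN_GLOBALS = ['globalStates', 'objects', 'overlayDiv']; // assumed names

function globalsUsedIn(filePath) {
  const source = fs.readFileSync(filePath, 'utf8');
  return KNOWN_GLOBALS.filter((name) =>
    new RegExp(`\\b${name}\\b`).test(source));
}

function buildGraph(dir) {
  const graph = {};
  for (const file of fs.readdirSync(dir)) {
    if (path.extname(file) === '.js') {
      graph[file] = globalsUsedIn(path.join(dir, file));
    }
  }
  return graph; // { "file.js": ["globalStates", ...], ... }
}

console.log(buildGraph('./src'));
```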

11 Conclusion

The Reality Editor is designed to make common tasks centered around controlling connected objects as simple as possible. Looking at two objects to draw a line between them is relatively straightforward and works well. However, there were four main areas where users of the Reality Editor would encounter significant difficulty. The first is the creation of new interfaces and the addition of new objects to their space. This process used to require extensive web design experience in addition to deep knowledge of the Reality Editor's internals. In my contribution, I solved this problem through a multi-pronged approach. By adding Polymer-based Web Components, I set the groundwork for a series of improvements to the knowledge barrier. First, I was able to add an in-browser editor to reduce friction for users with computers. I then implemented a drag-and-drop in-app editor so that users with only the app could still create compelling experiences. The second goal is the control of objects at a distance. There was no way for users to easily manage connections between objects outside of visual range of each other or to interact with objects remotely. The memory system fills this need while learning from the inefficiencies of Kasa, Google Assistant, and other apps. It uses objects' visual identities to provide a simple, easily understandable tool for remote control. The third area to improve was the visualization of the effects users have on objects. As a consequence of enabling remote interaction, users could have manipulated their objects without any knowledge of the effects they would have. To address this, I integrated a standard networked camera with the Reality Editor. Through an extensive series of prototypes, I settled on a design which allows users to see the effects of their actions without sacrificing simplicity or power. The final target was increasing the general approachability and usability of the Reality Editor. My first contribution here was to improve the beginner user experience through clarifying labels, increasing documentation, and removing inconsistent behavior. I also measured several critical components of the Reality Editor system to ensure that the app would run at acceptable frame rates on all platforms. I was able to significantly improve code quality through static analysis. When conventional static analysis tools failed to generate the data I needed, I created my own tool, Shelob, to continue improvements. Finally, I greatly reduced code complexity and user confusion by standardizing a method for drag-and-drop interaction within the Reality Editor. With my contribution, a novice user of the Reality Editor can easily create enjoyable augmented reality experiences that have a real impact on the physical world.


Works Cited

Blippar Group. (2017). Layar: Augmented Reality, Interactive Print. https://www.layar.com/.

Edgaras Art. (2015). Augmented Reality Tutorial No. 14: Augmented Reality using Unity3D and Vuforia. https://www.youtube.com/watch?v=qfxqfdtxyVA.

Google. (2017). Google Assistant: Your own personal Google. https://assistant.google.com/.

Humes, L., Busey, T., Craig, J., and Kewley-Port, D. (2009). The effects of age on sensory thresholds and temporal gap detection in hearing, vision, and touch. Attention, Perception, & Psychophysics, 71(4): 860–871.

MacIntyre, B., Gandy, M., Dow, S., and Bolter, J. D. (2004). DART: A Toolkit for Rapid Design Exploration of Augmented Reality Experiences. Proc. 17th Annual ACM Symposium on User Interface Software and Technology, 197–206.

MacIntyre, B., et al. (2015). The Argon Project: AR with Web Technology. http://argon.gatech.edu/.

Maloney, J., Burd, L., Kafai, Y., Rusk, N., Silverman, B., and Resnick, M. (2004). Scratch: A Sneak Preview. Second International Conference on Creating, Connecting, and Collaborating through Computing, Kyoto, Japan, 104–109.

Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Spring Joint Computer Conference, Vol. 33, 267–277.

Radu, I., and MacIntyre, B. (2009). Augmented-reality scratch: a children's authoring environment for augmented-reality experiences. Proc. 8th International Conference on Interaction Design and Children, 210–213.

Topplab AB. (2016). Noodl. http://www.getnoodl.com/.

TP-Link Technologies Co. Ltd. (2017). Kasa. http://www.tp-link.com/us/home-networking/smart-home/kasa.html.

Unity Technologies. (2017). Unity: Game Engine. https://unity3d.com/.

Zappar Ltd. (2017). ZapWorks: Create Amazing Augmented Reality Experiences. https://zap.works/.
