3. Perspective Taking Applied to Motion Planning
Safety on the 2D Grid
The first criterion, the “safety criterion”, focuses on ensuring the safety of the human by controlling the robot’s approach distance (the farther the better). In some cases, however, such as close interaction (e.g. handing over an object), the robot has to approach the person with whom it interacts. The distance between the robot and the human is therefore neither uniform nor fixed, and depends on the interaction task. The feeling of safety is highly dependent on the human’s personality, his physical capabilities and his actual situation; for example, safety differs highly in a sitting position compared to standing. When the human is sitting, his mobility is reduced and he tends to have a low tolerance to the robot getting close. On the contrary, when standing up he has a higher mobility, thus allowing the robot to come closer. These properties are handled in the current system by a “safety grid”. This grid contains a human-centered Gaussian cost distribution: each coordinate (x, y) in the grid holds a cost that decreases with the distance to the human. If the distance D(x_i, y_j) from the human to one point of the grid is greater than the distance D(x_k, y_l) to another point, then Cost(x_k, y_l) > Cost(x_i, y_j). Since safety concerns lose their importance when the robot is far away from the human, the cost keeps decreasing with distance until, beyond some maximal distance, it becomes negligible.
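The grid described above can be sketched as follows. This is a minimal illustration, not the original system: the Gaussian width, cell size and cut-off distance are assumptions.

```python
import numpy as np

def safety_cost_grid(human_xy, grid_shape, cell_size=0.1,
                     sigma=1.0, max_dist=3.0):
    """Human-centered Gaussian cost grid: cost is highest next to the
    human and decays with distance, becoming negligible (here: zero)
    beyond max_dist. All parameter values are illustrative."""
    hx, hy = human_xy
    xs = np.arange(grid_shape[0]) * cell_size
    ys = np.arange(grid_shape[1]) * cell_size
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    d = np.hypot(X - hx, Y - hy)             # distance of each cell to the human
    cost = np.exp(-d**2 / (2 * sigma**2))    # Gaussian, decreasing with distance
    cost[d > max_dist] = 0.0                 # safety concern vanishes far away
    return cost

grid = safety_cost_grid(human_xy=(1.0, 1.0), grid_shape=(40, 40))
```

Cells closer to the human carry a strictly higher cost than farther ones, matching the ordering property stated above.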
II. GEOMETRIC TOOLS
Attention sharing requires the psychological notions of perspective taking and mental rotation to be taken into account in the robot’s reasoning. As mentioned in the previous section, perspective taking is the general notion of adopting another person’s point of view to acquire an accurate representation of that person’s knowledge. In the context of this paper, we are interested in visual perspective taking, where the robot places itself at the human’s viewpoint to determine what he is actually seeing.
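A minimal 2D stand-in for this visibility reasoning is a field-of-view test from the human’s head pose. A real system would additionally render the scene from that viewpoint to handle occlusions; the 120° aperture below is an illustrative assumption.

```python
import math

def in_field_of_view(head_xy, head_yaw, obj_xy, fov_deg=120.0):
    """Coarse 2D visibility test: is the object within the human's
    horizontal field of view? Ignores occlusion and distance; the
    aperture is an assumption, not a measured value."""
    dx, dy = obj_xy[0] - head_xy[0], obj_xy[1] - head_xy[1]
    bearing = math.atan2(dy, dx)
    # signed angular difference wrapped to [-pi, pi]
    diff = (bearing - head_yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(fov_deg / 2)
```

For a human at the origin facing along +x, an object straight ahead passes the test while one directly behind fails it.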
A management of mutual belief for human-robot interaction
Aurélie Clodic†, Maxime Ransan†, Rachid Alami†, Vincent Montreuil†‡
Abstract— Human-robot collaborative task achievement requires the robot to reason not only about its current beliefs but also about those of its human partner. In this paper, we introduce a framework to manage shared knowledge for a robotic system. We first define which beliefs should be taken into account; we then explain how to establish them using communication schemes. Several examples illustrate the purpose of belief management, including a real experiment demonstrating a “give object” task between the Jido robotic platform and a human.
One of the long-term strengths of research in artificial intelligence has been the development of reasoning systems that can exploit expert knowledge in well-defined task domains. A non-trivial problem in this domain is getting information coded in the knowledge representation. For example, as in human development, the acquisition of knowledge at one level requires the consolidation of knowledge from a lower level. How is accumulated experience structured so as to allow the individual to apply this structured knowledge to new situations? The current research investigates how a robotic system that interacts with humans can acquire knowledge that can be formalized automatically, forming the expert knowledge that can be used for reasoning. Through physical interaction with a human, the iCub robot acquires experience about spatial locations. Once consolidated, this knowledge can be used in further acquisition of experience concerning the preconditions and consequences of actions. Finally, this knowledge can be translated into rules that can be used for reasoning and planning in novel problem solving situations. We demonstrate how multiple levels of knowledge acquisition are organized, based on experience in interaction with humans, in two distinct problem solving domains. In the more complex domain, we demonstrate how the robot can learn the rules of the Tower of Hanoi and solve novel instances of the problem, without ever having seen a complete solution. This research illustrates how real world knowledge can be acquired by robots for use in AI planning and reasoning systems. This can provide the first step for more flexible systems that can avoid the brittleness that has sometimes been associated with traditional AI solutions where knowledge has been pre-specified.
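For reference, the procedure the robot ultimately needs to master is the classic recursive Tower of Hanoi solution. In the work described above the robot derives the rules from interaction rather than being given this code; the sketch below only shows what a complete solution looks like.

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Move n disks from peg src to peg dst using peg aux.
    Returns the move list as (from_peg, to_peg) pairs; an n-disk
    problem takes 2**n - 1 moves."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)   # clear the top n-1 disks onto aux
            + [(src, dst)]                # move the largest disk
            + hanoi(n - 1, aux, src, dst))  # restack the n-1 disks on top

moves = hanoi(3)  # 7 moves for 3 disks
```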
The MIT thesis of West [12] describes the development of a new class of ball wheel for a smooth rolling, omnidirectional wheelchair. West undertook a comprehensive review of existing designs, and four mechanisms considered in his thesis are reproduced in Fig. 4. In mechanism (a), the ball rests upon three spherical bearings, the minimum number of contact points necessary to contain it. Spherical bearings, however, while providing passive support for unrestricted motion, do not lend themselves to position controlled actuation, a conclusion also reached by ROMAN’s developers. Configuration (b) replaces the bearings with three rollers. These could be powered to actuate the sphere; however, any component of an arbitrary rotation, not coplanar with the axis of a given roller, would cause slip along that roller and a loss of control. Configuration (c) overcomes this problem by providing an extra degree of freedom for one of the three rollers, thus allowing the ball to be controlled about the fixed rollers’ 2 DOF. In West’s final design (d), a ring of rollers, which itself is free to rotate, contains the ball while a single roller affixed to the chassis
Psychological studies on human interaction have brought to light the notion of Perspective Taking. This notion, which refers to reasoning on other persons’ points of view, eases the communication between interacting individuals. For a robot interacting with people, we believe that Perspective Taking can play an important role, allowing the robot to understand, reason and act according to the human’s point of view, and thus enabling efficient communication and cooperation. In this paper, we propose a group of algorithms to evaluate and generate robot configurations that are not only collision free but also obey HRI constraints by reasoning explicitly on the human’s perspective.
C. Digital Human Models for Ergonomics
A collaborative robot can be used to assist the human worker and improve ergonomics at work. Ergonomics scores typically rely on kinematic and dynamic information about the human’s movement, which is often extracted from simulations of Digital Human Models (DHMs). There are two main types of DHMs: the first are musculo-skeletal models, which are rather complex, have many degrees of freedom, and allow the analysis of human movement by simulating the muscular efforts; the second are rigid body models, which are simplified models with fewer degrees of freedom, where the human is basically represented as a humanoid robot made of rigid body links. Such a DHM can be used to reproduce a variety of motions demonstrated by human operators and captured via motion tracking devices. While the first are rather complex and expensive in terms of computational resources (it can take several minutes to simulate a small movement), the second are simpler and faster to simulate. As such, they are better suited for real-time applications such as model-based prediction, control and ergonomics assessment. Several ergonomics scores exist (e.g., RULA, REBA), and they are primarily based on postural information.
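To illustrate how such postural scores work, the upper-arm component of RULA assigns a score from flexion-angle bands. The sketch below uses the published 20°/45°/90° bands but deliberately omits RULA’s adjustments (shoulder raise, abduction, arm support), so it is an illustration of the principle, not a complete RULA implementation.

```python
def toy_upper_arm_score(flexion_deg):
    """RULA-style scoring of upper-arm flexion: larger deviations from
    the neutral posture yield higher (worse) scores. Sketch only; a
    full RULA score combines several such tables plus modifiers."""
    a = abs(flexion_deg)
    if a <= 20:
        return 1   # near-neutral posture
    if a <= 45:
        return 2
    if a <= 90:
        return 3
    return 4       # arm raised above shoulder level
```

Joint angles would come from the DHM simulation or motion capture mentioned above.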
because the spatial situation may be more easily described from one perspective than another. Ambiguities arise when one speaker refers to an object within a reference system (or changes the reference system, i.e. switches perspective) without informing her partner about it [66, 67]. For example, the speaker could ask for the “keys on the left”. Since no reference system has been given, the listener would not know where exactly to look. However, asking for “the keys on your left” gives the listener enough information to understand what the speaker is referring to. On the contrary, when using an exact, unambiguous term of reference to describe a location (e.g. “go north”), no ambiguity arises. In Spark, agent-dependent spatial relations are computed from the frame of reference of each agent. Symbolic Locations. Humans commonly refer to the positions of objects with symbolic descriptors (like on, next to...) instead of precise, absolute positions (qualitative spatial reasoning). These types of descriptors have been extensively studied in the context of language grounding [68, 69, 70, 71, 72]. Spark distinguishes between agent-independent symbolic locations (allocentric spatial relations) and agent-dependent, relative locations (egocentric spatial relations). Spark computes three main agent-independent relations based on the bounding box and centre of mass of the objects (Figure 5a): isOn holds when an object O1 is on another object O2, and is computed by evaluating the centre of mass of O1 against the bounding box of O2. isIn evaluates whether an object O1 is inside another object O2 based on their bounding boxes BB_O1 and BB_O2. isNextTo indicates whether an object O1 is next to another object
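A minimal sketch of the first two relations, assuming axis-aligned bounding boxes of the form (xmin, ymin, zmin, xmax, ymax, zmax). The contact tolerance is an illustrative assumption, not Spark’s actual threshold.

```python
def is_on(bb1, bb2, tol=0.03):
    """Sketch of isOn: O1's centre of mass (approximated from its box)
    lies within O2's horizontal footprint, and O1's bottom face rests
    near O2's top face. bb = (xmin, ymin, zmin, xmax, ymax, zmax)."""
    cx = (bb1[0] + bb1[3]) / 2
    cy = (bb1[1] + bb1[4]) / 2
    return (bb2[0] <= cx <= bb2[3] and bb2[1] <= cy <= bb2[4]
            and abs(bb1[2] - bb2[5]) <= tol)

def is_in(bb1, bb2):
    """Sketch of isIn: O1's bounding box fully contained in O2's."""
    return (all(bb1[i] >= bb2[i] for i in range(3))
            and all(bb1[i] <= bb2[i] for i in range(3, 6)))

table = (0.0, 0.0, 0.0, 1.0, 1.0, 0.5)   # top face at z = 0.5
cup = (0.4, 0.4, 0.5, 0.5, 0.5, 0.6)     # resting on the table
```

isNextTo would similarly compare the horizontal separation of the two boxes against a distance threshold.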
Our experimental results underline the challenges of the setting we propose. We need to identify which of the objects available in the scene the user is referring to. We therefore propose a way to guide different strategies for target object segmentation by first recognizing the type of interaction. Standard object recognition classifiers trained on state-of-the-art object recognition databases exhibit low performance on our dataset. However, the recognition rate improves when these methods are trained on annotated data from our dataset. This confirms that our data lies in a significantly different segment of the domain, due to the particularly natural HRI setting. We evaluate our pipeline and set an initial baseline, presenting promising results on the use of multimodal data to enable learning from noisy object patches.
5.3.1 Multimedia Learning Scenario
A multimedia presentation (about 8 minutes long, with a total of 10 slides), built as a web application and explaining how to stay healthy by having a healthy diet, was shown to each participant using one of two embodiments (a tablet or a robot), as can be seen in Figure 5.1. At the beginning of the experiment the participant was informed about the final goal of the project, which is to use the system for teaching participants (potentially the elderly) how to stay healthy through good nutrition. Afterwards, the presentation on the robot or tablet started with a brief explanation of the topic and the instructions. The participant had the choice to repeat the speech of each slide, go back to the previous slide, or go to the next slide. These options are provided so that each participant can learn at his or her own pace. At the end of the presentation, the participant completed a test of 7 questions related to the information provided during the presentation, with open-ended time to complete it, and received his/her score and an encouragement for the next session. The participants also completed the Big 5 personality test (Goldberg, 1990) in order to determine their personality traits.
Similarly to the Extend version of the basic RRT algorithm, a configuration is randomly sampled. It yields both the nearest tree node to be extended and the extension direction. This stage also integrates collision detection in the presence of binary obstacles. Thus, if the new portion of the path leads to a collision, a null configuration is returned and the extension fails independently of the associated costs. This extension process ensures the bias toward unexplored free regions of the space. In the second stage, configurations that are irrelevant to the search for low-cost paths are filtered using a transition test similar to the one used in stochastic optimization methods, before inserting a new configuration in the tree.
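The transition test mentioned above can be sketched as a Metropolis-like acceptance rule, as in T-RRT: downhill cost moves are always accepted, and uphill moves are accepted with a probability that shrinks with the cost increase. The constant K and the fixed temperature are illustrative; T-RRT additionally adapts the temperature online, which is omitted here.

```python
import math
import random

def transition_test(cost_parent, cost_child, temperature, K=1.0):
    """Stochastic-optimization-style transition test: accept downhill
    moves always, uphill moves with Boltzmann-like probability
    exp(-dCost / (K * T)). Sketch with illustrative constants."""
    if cost_child <= cost_parent:
        return True
    p = math.exp(-(cost_child - cost_parent) / (K * temperature))
    return random.random() < p
```

With a very low temperature, large cost increases are effectively always rejected, which is what filters out irrelevant configurations.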
I. INTRODUCTION
In recent years, there has been an increasing discussion about the merits of totally autonomous robots versus the importance of user control and decision making. On the one hand, the advancement in artificial intelligence decision-making techniques for aerial robots, also known as drones, has significantly increased the number of applications for a team of autonomous agents (for instance, search and rescue missions, autonomous infrastructure inspection, or autonomous patrolling systems). On the other hand, the continuous evolution of Human-Robot Interaction (HRI) allows human operators to steadily improve their performance in controlling and deciding. Mixed-initiative interaction provides the interface between these two worlds, considering that the agents’ (human and robot) abilities are complementary and are likely to provide better performance when joined efficiently than when used separately.
2. Mental States of Interest for Human-Robot Interaction
2.1. Situation Awareness, Resource Engagement and Associated Mental States
Humans’ mental states are numerous, and it seems impossible—and possibly even irrelevant—to try to estimate every one of them. However, several of them play a major part in error occurrence and are therefore particularly relevant to characterize and estimate in order to improve human-system interaction in general, including human-robot interaction in the case of remote operation. In the Human Factors domain, a mental state that has gathered much attention since its introduction in the aeronautical context is Situation Awareness (SA). Endsley defined SA as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future”.
In this paper we address the issue of predicting the places where a human can perform a task. The task might have been predicted by the robot or already be known to it. For example, if the human wants to give some object to the robot, the robot will predict where the human can perform that task as well as where the robot can take the object; it will then move its hand proactively to support the task and to guide the human in achieving the joint hand-over task. Such behavior will also make the human ‘aware’ of the ‘awareness’ of the robot for the current task. As shown in fig. 1(a), the robot requested, “Please give me the toy dog.” and remained in the rest position. This might confuse the human about how and where to give it to the robot: “Should I move and reach near to the robot, or should I stand and put the object in the hand of the robot, or should I put it at a place somewhere on the table for the robot to take it?” But if the robot, along with the request to give, also moves its hand towards a probable place to take the object, as shown in fig. 1(b), it will greatly guide the human about how and
Humans can follow a plan this way, though a person may of course choose to act in violation of that plan, by mistake or intentionally for other reasons. Humans may act as plan-driven agents if they wish to.
A common problem with plans for the real world is deciding their validity. In theoretical computer science, plans are commonly validated during construction, so that their validity at creation time is proven. In otherwise static environments, the validity of a plan remains intact unless unexpected events happen, which is assumed to be rare. So in classical planning domains, the validity of a plan does not need to be re-evaluated often in the common flow of events. However, in domains with unpredictable dynamics, such as HRI, plenty of unexpected events happen all the time; many have no impact on the validity of a plan, but some may. As a particular problem, in HRI plans are often used to produce robot behavior that not only produces a correct outcome, but does so using acceptable manners. The inability to identify that a plan has become invalid makes robots appear less intelligent. The simplest example is taking away an object a robot is about to grasp. Many prototype robots in research will not adapt to the sudden absence of the object, and continue the grasp motion plan, grasping just thin air. So in general, checking the validity of a plan is a difficult chore in robot control that designers will often rather circumvent by making the robot environment as static as possible.
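The kind of runtime validity check argued for above can be sketched as re-evaluating the preconditions of pending actions against the current world state. The action structure and predicate names below are hypothetical, chosen only to mirror the grasping example.

```python
def plan_still_valid(plan, world_state):
    """Minimal plan-monitoring sketch: walk the pending actions and
    re-check each action's preconditions against the current world
    state (a set of true predicates). Returns (valid, failed_action)."""
    for action in plan:
        if not all(pred in world_state for pred in action["preconditions"]):
            return False, action["name"]
    return True, None

# Hypothetical plan: grasping requires the cup on the table and a free gripper.
plan = [{"name": "grasp_cup",
         "preconditions": {"cup_on_table", "gripper_free"}}]

# Someone takes the cup away: "cup_on_table" no longer holds.
ok, failed = plan_still_valid(plan, {"gripper_free"})
```

Running this check each perception cycle is what lets the robot abort the grasp instead of closing its gripper on thin air.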
frames, while LSTM is utilized to learn the evolution of information in time. Our spatial attention mechanism localizes and crops hand images of the person, which are subsequently passed as inputs to our CNN networks, unlike previous techniques [90, 89] where entire image frames are exploited as input. Contrary to approaches where a pre-trained state-of-the-art network is fine-tuned on entire image frames of a gestures dataset, we fine-tune Inception V3 on a background-substituted hand gestures dataset to be used as our CNN block, as already mentioned above. We extract image embeddings of 1024 elements from the last fully connected (FC) layer of our fine-tuned Inception V3, which are exploited as an input modality in our dynamic gestures detector. Moreover, previous strategies for dynamic gesture recognition and video analysis [17, 94, 83, 84] employed the 3D human skeleton to learn large-scale body motion and thus required specialized sensor modalities; we, on the contrary, only utilize the 2D upper-body skeleton as an additional modality to RGB hand images in our algorithm. Nevertheless, scale information about the subjects is lost in monocular images. Thus, we also propose novel learning-based depth estimators which determine the approximate depth of the person from the camera and the region of interest around his/her hands from upper-body 2D skeleton coordinates only. To reiterate, the inputs to our dynamic gestures detector are limited to color hand images and an augmented pose vector obtained from 8 upper-body 2D skeleton coordinates, unlike other existing approaches which include full-frame images in addition to hand images, depth frames and even optical flow frames altogether. Thus our proposed strategy is generic and straightforward to implement in mobile systems.
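The thesis trains learned depth estimators; as a hedged baseline for the same idea (recovering depth from 2D skeleton scale), the pinhole-camera relation Z = f·W/w already ties the pixel width of a known body segment to its distance. The focal length and average shoulder width below are illustrative assumptions, not values from the thesis.

```python
def depth_from_shoulder_width(px_width, focal_px=600.0, real_width_m=0.4):
    """Pinhole baseline for person depth from 2D skeleton scale:
    Z = f * W / w, where w is the shoulder width in pixels, W the
    assumed metric shoulder width and f the focal length in pixels.
    All constants are illustrative; a learned estimator can absorb
    person-to-person variation that this fixed model cannot."""
    return focal_px * real_width_m / px_width
```

Halving the apparent shoulder width doubles the estimated depth, which is the cue the learned estimators exploit in a more robust way.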
The adaptive system also performed better in terms of percentage of concurrent motion. The percentage of concurrent motion was calculated by analyzing the video of the task execution and dividing the number of seconds of concurrent motion by the portion of the task execution time during which concurrent motion was possible. That portion was defined to be from when the robot motion starts to when the human finishes placing the last screw. This is because it is possible for the human to get ahead of the robot and finish the task early while the robot still has a few screws to apply sealant to; during that trailing segment, concurrent motion is not possible, since the human is done with his tasks, and so it is ignored in the concurrent motion calculations. Based on these definitions, the percentage of concurrent motion for the non-adaptive condition was 68%, 24 percentage points lower than the 92% obtained in the adaptive condition. This result indicates that the adaptive system’s capability of avoiding locations where the human is expected to move next allows the human worker to move toward successive target locations more freely and with less hesitation. This was confirmed by reviewing the video recordings of the experiment and observing how the subjects in the two groups performed the task. The subjects in the non-adaptive condition initially attempt to work concurrently, but quickly switch to a strategy in which they wait and attempt to time their motions to evade the robot. This timing strategy is not present during task execution in the adaptive condition, with the subjects reaching toward successive target locations without waiting for the robot.
2.3.7 Trajectory Control
Reactive control for object manipulation is a research topic that is part of the fundamentals of robotic manipulation. First, trajectory-generation-based approaches were developed. In [Buttazzo 94], results from the visual system first pass through a low-pass filter. The object movement is modeled as a trajectory with constant acceleration. On this basis, the catching position and time are estimated. Then a quintic trajectory is calculated to catch the object, before being sent to a PID controller. The maximum values of acceleration and velocity are not checked when the trajectory is planned, so the robot gives up when the object moves too fast and the required velocity or acceleration exceeds the capacity of the servo controller. In [Gosselin 93], inverse kinematic functions are studied and catching a moving object is implemented as one application; a quintic trajectory is used for the robot manipulator to join the closest point on the predicted object movement trajectory. The systems used in those works are all quite simple and no human is present in the workspace. A more recent work can be found in [Kröger 12], in which a controller for visual servoing based on Online Trajectory Generation (OTG) is presented. The results are promising.
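A quintic trajectory of the kind used in [Buttazzo 94] and [Gosselin 93] interpolates position, velocity and acceleration boundary conditions at both ends of the motion. As the text notes, nothing in this parametrization bounds the peak velocity or acceleration along the way; the sketch below only solves the boundary-value problem.

```python
import numpy as np

def quintic_coeffs(q0, qf, v0=0.0, vf=0.0, a0=0.0, af=0.0, T=1.0):
    """Coefficients c of q(t) = sum_i c[i] * t**i matching position,
    velocity and acceleration at t = 0 and t = T. Note: peak velocity
    and acceleration along the path are NOT checked here, which is
    exactly the limitation pointed out for [Buttazzo 94]."""
    A = np.array([
        [1, 0,    0,       0,        0,         0],        # q(0)
        [0, 1,    0,       0,        0,         0],        # q'(0)
        [0, 0,    2,       0,        0,         0],        # q''(0)
        [1, T,    T**2,    T**3,     T**4,      T**5],     # q(T)
        [0, 1,    2*T,     3*T**2,   4*T**3,    5*T**4],   # q'(T)
        [0, 0,    2,       6*T,      12*T**2,   20*T**3],  # q''(T)
    ], dtype=float)
    b = np.array([q0, v0, a0, qf, vf, af], dtype=float)
    return np.linalg.solve(A, b)

# Rest-to-rest motion of one joint from 0 to 1 rad in 2 s.
c = quintic_coeffs(0.0, 1.0, T=2.0)
```

Evaluating the polynomial (e.g. with `np.polyval` on the reversed coefficients) recovers the boundary positions exactly.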
Figure 16 shows another example with the vibration observer-controller. The intention of the operator was to move the device at approximately 0.5 m/s (a moderate human arm speed) while being stiff. The interaction is very stable and there is no vibration. The same interaction without the vibration controller leads to significant vibrations.
5. Greetings with Nao
A first analysis of the behavior of the participants towards the Nao robot was conducted. Nao is a mini-humanoid robot developed by SoftBank Robotics (formerly Aldebaran Robotics). The purpose of the interaction was to present the robot to the children and adults with ASD for a short duration (up to 2 minutes). Indeed, some individuals with ASD are averse to unusual events and changes in their daily routine. The robot was introduced smoothly so as to avoid fear of the robot. In addition, previous work observed that children who saw the robot act in a social-communicative way were more likely to follow its gaze than those who did not. Hence, we believe that smoothly introducing the robot as a social partner, by showing the participants the Nao robot in the context of a short greeting task, may help them interact with the robot in further experiments. We also wanted to verify that the behavior of our participants was linked to their