Action Learning and Grounding in Simulated Human-Robot Interactions

(Extended Abstract)


Oliver Roesler and Ann Nowé

Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium
oliver@roesler.co.uk, ann.nowe@vub.ac.be

1 Introduction

Service robots that are employed in human-centered environments with a high degree of complexity, unpredictability, and dynamicity must be able to learn new tasks autonomously, i.e. when only the goal of the task is given. Furthermore, robots must be able to understand natural language instructions to accurately identify the requested tasks, which requires connections between symbols, i.e. words, and their meanings, i.e. percepts. Many studies in the literature investigate action learning or grounding, but few consider both simultaneously. Additionally, action learning studies have been limited to learning a single action while only varying the initial position of the gripper [3,4]. Furthermore, grounding studies were mostly conducted offline and primarily focused on grounding object characteristics or spatial concepts [2,1], while previous action grounding employed simple feature vectors, which cannot be directly translated into motor commands [5].

In this paper, we investigate the possibility of simultaneous action learning and grounding through the combination of reinforcement and cross-situational learning. More specifically, we simulate human-robot interactions during which a human tutor provides instructions and illustrations of the goal states of the corresponding actions. The robot then learns to reach the desired goals, taking into account different manipulation behaviors for different object shapes, and grounds the words and detected phrases of the instructions, including synonyms, through the obtained percepts.

2 System Overview

The employed grounding and action learning system consists of three parts: (1) a human-robot interaction simulation, which generates different situations consisting of the initial gripper and object positions, relative goal positions of the manipulation objects, object colors, object shapes, and natural language instructions; (2) a reinforcement learning algorithm, which employs Q-learning to learn optimal micro-action patterns for encountered situations, taking into account the initial gripper and object positions as well as the relative goal position of the manipulation object; (3) a cross-situational learning component, which identifies auxiliary words and phrases, and maps percepts to non-auxiliary words and phrases in an unsupervised manner by analysing co-occurrences.

* This is an extended abstract of Roesler and Nowé [6].

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


3 Results

When using a continuously decreasing exploration rate that is shared across situations, the reinforcement learner converged to the optimal policy within a single episode after about 60,000 situations. In contrast, when the exploration rate was reset for each situation, the reinforcement learner still required 28 episodes on average after about 9,000 situations. In the latter case, the agent did not execute the optimal policy immediately because the reset led to a high exploration rate at the beginning of each situation. Thus, a continuously decreasing exploration rate that is shared across situations works best for the investigated scenario. The employed CSL algorithm successfully grounded all 39 words used in this study through their corresponding percepts after about 800 situations. Afterwards, the number of correct mappings remains constant at 39, while the number of false mappings oscillates between 0 and 2, because the algorithm allows a word to be grounded through several percepts in order to be able to learn homonyms. The two additional incorrect mappings involve different word combinations, depending on the most recently encountered situations.
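The idea behind cross-situational grounding can be illustrated with a simple co-occurrence counter: across many situations, a content word keeps co-occurring with the same percept, while auxiliary words co-occur with everything and never reach a dominant mapping. This is a minimal sketch of the general principle, not the paper's exact algorithm, and the example words and percepts are invented.

```python
from collections import defaultdict

# Count word-percept co-occurrences across situations and ground each
# word through the percept(s) it co-occurs with most reliably.
cooc = defaultdict(int)
word_count = defaultdict(int)

def observe(words, percepts):
    """Record one situation: an instruction's words with the observed percepts."""
    for w in words:
        word_count[w] += 1
        for p in percepts:
            cooc[(w, p)] += 1

def grounding(word):
    """Return the percept(s) with the highest co-occurrence ratio for a word."""
    scores = {p: c / word_count[word] for (w, p), c in cooc.items() if w == word}
    best = max(scores.values())
    return {p for p, s in scores.items() if s == best}

# Three invented situations. "red" and "cube" each track one percept,
# while the auxiliary word "the" co-occurs with everything.
observe(["move", "the", "red", "cube"], ["MOVE", "RED", "CUBE"])
observe(["lift", "the", "red", "ball"], ["LIFT", "RED", "BALL"])
observe(["push", "the", "blue", "cube"], ["PUSH", "BLUE", "CUBE"])
# grounding("red") -> {"RED"}; grounding("cube") -> {"CUBE"}
```

In this sketch "red" reaches a co-occurrence ratio of 1.0 only with the RED percept, whereas no percept reaches 1.0 for "the", which is how an auxiliary word can be detected; the paper's algorithm additionally handles phrases, synonyms, and homonyms.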

4 Conclusion

The proposed framework enabled learning of actions through reinforcement learning, as well as identification of auxiliary words and phrases and grounding of words and phrases, including synonyms, through cross-situational learning during simulated human-robot interactions. In future work, the framework will be extended to handle real shape, color, and preposition percepts obtained with a stereo camera.

References

1. Aly, A., Taniguchi, A., Taniguchi, T.: A generative framework for multimodal learning of spatial concepts and object categories: An unsupervised part-of-speech tagging and 3D visual perception based approach. In: IEEE International Conference on Development and Learning and the International Conference on Epigenetic Robotics (ICDL-EpiRob). Lisbon, Portugal (September 2017)

2. Fontanari, J.F., Tikhanoff, V., Cangelosi, A., Ilin, R., Perlovsky, L.I.: Cross-situational learning of object-word mapping using neural modeling fields. Neural Networks 22(5-6), 579–585 (July-August 2009)

3. Gudimella, A., Story, R., Shaker, M., Kong, R., Brown, M., Shnayder, V., Campos, M.: Deep reinforcement learning for dexterous manipulation with concept networks. CoRR (2017), http://arxiv.org/abs/1709.06977

4. Popov, I., Heess, N., Lillicrap, T., Hafner, R., Barth-Maron, G., Vecerik, M., Lampe, T., Tassa, Y., Erez, T., Riedmiller, M.: Data-efficient deep reinforcement learning for dexterous manipulation. CoRR (2017), http://arxiv.org/abs/1704.03073

5. Roesler, O., Aly, A., Taniguchi, T., Hayashi, Y.: Evaluation of word representations in grounding natural language instructions through computational human-robot interaction. In: Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). Daegu, South Korea (March 2019)

6. Roesler, O., Nowé, A.: Action learning and grounding in simulated human robot interactions. The Knowledge Engineering Review (In Press)
