

HAL Id: inria-00175508

https://hal.inria.fr/inria-00175508

Submitted on 28 Sep 2007


Speeding up learning with dynamic environment shaping in evolutionary robotics

Nicolas Bredeche, Louis Hugues

To cite this version:

Nicolas Bredeche, Louis Hugues. Speeding up learning with dynamic environment shaping in evolutionary robotics. 5th International Workshop on Epigenetic Robotics, Jul 2005, Nara, Japan. p.128.

⟨inria-00175508⟩


Speeding up Learning with Dynamic Environment Shaping in Evolutionary Robotics

Nicolas Bredeche

Inference and Learning group - TAO/INRIA LRI - University Paris-Sud

bredeche@lri.fr

Louis Hugues

Laboratoire d’Informatique de Paris 6

louis.hugues@wanadoo.fr

Abstract

Evolutionary Robotics is a promising approach to automatically building efficient controllers using stochastic optimization techniques. However, work in this area is often confronted with complex environments in which even simple tasks cannot be achieved. In this paper, we propose an approach based on explicit problem decomposition and dynamic environment shaping to ease the learning task.

1. Step-by-step learning task

Evolutionary Robotics offers an efficient and easy-to-use framework for automatically building controllers for autonomous robots. However, a major drawback of this approach lies in the difficulty of defining the fitness function (i.e. the learning setup) so as to get satisfying results. As a consequence, many works have been limited to simple problems such as wall-avoidance or target-following (Nolfi and Floreano, 2000).

In this paper, we are interested in decomposing a straightforward learning task within a complex environment into learning sub-tasks that are easier to achieve from the robot's viewpoint. A typical task we address here is that of a robot that learns to explore a complex environment so as to find food items, along with learning other behaviors. The proposed approach is based on incremental learning and dynamic environment shaping, simplifying both the learning tasks and the environmental conditions in order to speed up learning. To do so, we first introduce the concept of parameter optimization for an Environment Generator (i.e. a function that instantiates an environment from a class of target environments). Then, we propose defining learning stages to build increasingly complex controllers through task decomposition and environment shaping, fitting the problem to the controllers at hand.

Our approach relies on a dynamic fitness that acts both on the reward given to the robot and on the very environment used to train it. The idea is that the environment has to provide paths towards the design of an efficient robot controller. The primary objectives are to design an environment generator for the task at hand and to identify optimization parameters for this generator that may help learning from the agent's point of view, even in the worst case (i.e. speeding up random search by ideally limiting the environment to the very specific sub-problems to solve). To do so, we define the following (a code sketch illustrating these definitions follows the list):

• The Environment Generator is a function that instantiates environments from a class of environments (e.g. maze environments, human-like environments, extreme environments, etc.). The parameters that control the environment generator are tuned during evolution so as to limit the environment/task complexity to what is best suited to the current generation with regard to the learning task;

• a set of sub-fitnesses that take part in the global dynamic fitness, together with weighted sub-fitness combination patterns that evolve through time according to statistical information on the population's performance. Accordingly, we define the following:

dynamic fitness: a balanced combination of fitnesses; basic tasks should be completed first (e.g. wall-avoidance). This dynamic fitness is tuned during the course of optimization by switching from one learning stage to another, as defined below;

learning stages: an online balance of fitness weights that changes according to a given order.

The goal is to speed up learning thanks to a collection of fitnesses for which the supervisor defines automatic activation patterns depending on the agent's current performance (e.g. avoid-wall is the first and only fitness as long as a relevant avoidance behavior has not been found; this favors small environments and short running-time parameters, for example).
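Since the paper gives no implementation, the following Python sketch is one hypothetical way to realize an environment generator, learning stages, and the weighted dynamic fitness. All names, parameter values, and switching thresholds are our assumptions; only the tunable parameters (food density, epoch duration) and the staging idea come from the paper.

```python
import random

class EnvironmentGenerator:
    """Instantiates environments from a parameterized class of maze
    environments. Parameter names here are hypothetical."""

    def __init__(self, food_density=1, maze_size=5, epoch_duration=200):
        self.food_density = food_density
        self.maze_size = maze_size
        self.epoch_duration = epoch_duration

    def instantiate(self):
        # One concrete environment drawn from the class: a maze with
        # randomly placed food items and a bounded epoch duration.
        return {
            "size": self.maze_size,
            "food": [(random.randrange(self.maze_size),
                      random.randrange(self.maze_size))
                     for _ in range(self.food_density)],
            "epoch": self.epoch_duration,
        }

# A learning stage pairs sub-fitness weights with generator parameters
# and a switching condition on population statistics (values invented).
STAGES = [
    {"weights": {"avoid": 1.0, "explore": 0.0, "eat": 0.0},   # wall-avoidance first
     "params": {"maze_size": 5, "food_density": 0, "epoch_duration": 200},
     "done": lambda stats: stats["avoid"] > 0.9},
    {"weights": {"avoid": 0.3, "explore": 0.7, "eat": 0.0},   # then exploration
     "params": {"maze_size": 10, "food_density": 21, "epoch_duration": 500},
     "done": lambda stats: stats["explore"] > 0.8},
    {"weights": {"avoid": 0.2, "explore": 0.3, "eat": 0.5},   # finally the full task
     "params": {"maze_size": 10, "food_density": 1, "epoch_duration": 800},
     "done": lambda stats: False},                            # terminal stage
]

def dynamic_fitness(sub_fitnesses, weights):
    """Weighted combination of the sub-fitnesses for the active stage."""
    return sum(weights[name] * value for name, value in sub_fitnesses.items())
```

Note how the stage parameters shape the environment itself, not just the reward: an empty small maze for wall-avoidance, a food-rich maze for exploration, and the single-food-item setup only once the prerequisite behaviors exist.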


Figure 1: The task: a robot has to find and eat the yellow item in a maze environment.

As a consequence, the agent population shall alternate between competition to optimize behaviors at a given stage of development (i.e. a specific set of parameters for the agent and the environment) and competition to explore new behaviors afterwards, for which the reward is smaller but nevertheless favors exploratory agents that succeed over the rest of the population (i.e. selection pressure pushes the population further, even for little gain, while preserving what has been learned).

It is possible to address both concurrent and sequential tasks by tailoring the environment and tuning the fitness so as to temporarily define a sub-task that focuses on the very sub-problem at hand.

2. Preliminary Experiments

To illustrate our approach, we have set up the task illustrated in fig. 1, that of a robot looking for food in a maze-like environment. The goal is to get the food; the food can only be seen when the robot is on it; the robot has to perform a four-step sequence to eat the food; the food location is randomly chosen for every run; the target eat sequence is randomly chosen before starting evolution.

The robot controller is divided into two sub-controllers. First, a classic three-layer feed-forward neural network with four input nodes encoding the four infra-red quadrants of the robot, four hidden nodes, and two output motor nodes, one for translational velocity and the other for rotational velocity (as on a Khepera 2 mobile robot). Second, the eat sequence is encoded into the robot's genome and is automatically performed whenever the robot is near some food. The robot plays the whole sequence to the end before it can (or cannot) eat the food at hand. Each step of a sequence is a specific hard-coded behavior chosen from a library of 16 possible behaviors (e.g. a 360-degree turn to the right, a 90-degree turn to the left, etc.). As a consequence, the food is always removed (either lost or eaten).
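A minimal sketch of this two-part controller follows. The paper specifies only the 4-4-2 architecture and the 16-behavior library; the genome layout, the tanh activations, and all names are assumptions for illustration.

```python
import numpy as np

N_BEHAVIORS = 16  # library of hard-coded behaviors, e.g. "360-degree turn right"

class Controller:
    """4-4-2 feed-forward network plus a genome-encoded eat sequence."""

    def __init__(self, genome):
        # Hypothetical genome layout: two weight matrices for the network,
        # then the four behavior indices forming the eat sequence.
        self.w1 = genome["w1"]             # (4, 4): infra-red quadrants -> hidden
        self.w2 = genome["w2"]             # (4, 2): hidden -> motor outputs
        self.eat_sequence = genome["seq"]  # four indices into the behavior library

    def step(self, infrared):
        """Map the four infra-red quadrant readings to motor commands."""
        hidden = np.tanh(infrared @ self.w1)
        translation, rotation = np.tanh(hidden @ self.w2)
        return translation, rotation

    def on_food(self, target_sequence):
        """Play the whole eat sequence; the robot eats only if it matches the
        randomly chosen target sequence (the food is removed either way)."""
        return list(self.eat_sequence) == list(target_sequence)

def random_genome(rng=None):
    """Draw an initial genome; evolution then optimizes these values."""
    rng = rng or np.random.default_rng()
    return {"w1": rng.normal(size=(4, 4)),
            "w2": rng.normal(size=(4, 2)),
            "seq": rng.integers(0, N_BEHAVIORS, size=4)}
```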

Figure 2: Exp. 1: eaten food items and food density.

There can be up to 21 food items in the environment; the optimal setup occurs when the environment contains only one food item that is eaten by the robot for any combination of robot starting and food positions. In this setting, letting agents self-tune the parameters of the environment and the task difficulty leads to compromises and local minima (i.e. learning stages must be fixed externally).

Preliminary experiments compared a straightforward fitness, a dynamic fitness where each individual may tune the environment, and a dynamic fitness which is tuned along with the environment by an automatic supervisor (i.e. fitness and environment parameter tuning is performed according to predefined conditions based on performance statistics). Optimized parameters include the neural network weights, the eat sequence parameters, and the environment parameters (food density, epoch duration, and eat sequence complexity). These preliminary experiments have shown that:

• fitness decomposition leads to faster learning thanks to a simpler problem upon which the optimization process can concentrate (in this case because we rely on population-based stochastic optimization techniques, i.e. a simpler problem avoids a sparse population);

• leaving the choice of tuning the environment parameters up to each individual leads to compromises and local minima in the course of optimization (see fig. 2). As a consequence, it is better to outsource the tuning of the task difficulty to an external process, either through a predefined dynamic fitness or a co-evolved tasks-vs.-agents setup (a sketch of such a supervisor follows).
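The sketch below, reusing the hypothetical STAGES and EnvironmentGenerator from Section 1, illustrates such an external supervisor loop. The evolutionary machinery itself is not detailed in the paper, so evaluate and select_and_reproduce are placeholders supplied by the caller.

```python
def supervised_evolution(population, stages, generator, evaluate,
                         select_and_reproduce, n_generations=500):
    """External supervisor: switches learning stages based on population
    statistics, tuning sub-fitness weights and environment parameters
    itself instead of letting each individual tune them."""
    stage_idx = 0
    for _ in range(n_generations):
        stage = stages[stage_idx]
        # Apply the stage's environment parameters to the generator,
        # then draw one concrete environment for this generation.
        for name, value in stage["params"].items():
            setattr(generator, name, value)
        env = generator.instantiate()
        # Evaluate the population under the stage's weighted dynamic
        # fitness; `stats` holds per-sub-fitness population statistics.
        stats = evaluate(population, env, stage["weights"])
        # Switch stage when the predefined statistical condition holds.
        if stage["done"](stats) and stage_idx + 1 < len(stages):
            stage_idx += 1
        population = select_and_reproduce(population, stats)
    return population
```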

References

Nolfi, S. and Floreano, D. (2000). Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press.
