

L-ALLIANCE: Improving Efficiency

This chapter describes an extended version of ALLIANCE, called L-ALLIANCE (for Learning ALLIANCE), that preserves the fault tolerant features of ALLIANCE while incorporating on-line, distributed control strategies that greatly improve the efficiency of the cooperative robot team performing a mission composed of independent tasks.

These strategies allow each individual robot to learn about the quality with which robot team members perform certain tasks, and then to use this learned knowledge to determine the appropriate action to activate at each point in time. This chapter first provides the motivation for why this learning is necessary, discusses the assumptions made in L-ALLIANCE, and then provides a formal description of the learning problem. I show in section 4.3 that this learning problem is NP-hard, concluding that requiring the robot team to derive the optimal selection of actions through learning is unrealistic.

In section 4.4, I discuss the scalability of L-ALLIANCE as the size of the robot team and the size of the mission grow, showing that through parallelism, the ALLIANCE and L-ALLIANCE techniques are independent of the size of the mission, and grow linearly with the number of robots on the team. I then discuss the mechanism used in L-ALLIANCE to allow robots to learn about the performance levels of teammates, and describe various distributed control strategies I investigated for using this learned knowledge to improve the efficiency of the team. In section 4.5, I present the empirical results of these investigations in simulation for a large space of possible cooperative robot teams that vary in the number of robots, the size of the mission to be performed, the degree of task coverage, the degree of heterogeneity across robots, and the degree to which the Progress When Active condition introduced in chapter 3 holds. I then compare the results of the best distributed control strategy to the optimal solution and show that this final control strategy performs quite close to the optimal allocation of tasks to robots for those examples in which the optimal solution can be derived. In section 4.6, I provide the details of how the learning approach is incorporated into the L-ALLIANCE motivational behaviors. Finally, I conclude in section 4.7 by returning to my original design requirements for the cooperative architecture, describing the contributions L-ALLIANCE makes towards meeting those requirements.

4.1 Motivation for Efficiency Improvements via Learning

As described in chapter 3, the ALLIANCE architecture allows robots to adapt to the ongoing activities and environmental feedback of their current mission. However, ALLIANCE does not address a number of efficiency issues that are important for cooperative teams. These issues include the following: How do we ensure that robots attempt those tasks for which they are best suited? Can we enable the robot team to increase its performance over time? Does failure at one task imply total robot failure? How does a robot select a method of performing a task if it has more than one way to accomplish that task? How do we minimize robot idle time?

The L-ALLIANCE enhancement to ALLIANCE addresses these issues of efficiency by incorporating a dynamic parameter update mechanism into the ALLIANCE architecture. This parameter update mechanism allows us to preserve the fault tolerant features of ALLIANCE while improving the efficiency of the robot team's performance.

A number of benefits result from providing robots with the ability to automatically adjust their own parameter settings to improve efficiency:

1. Relieve humans of the parameter adjusting task:

As described in chapter 3, ALLIANCE requires the human programmer to tune motivational behavior parameters to achieve desired levels of robot performance. Although finding good parameter settings is often not difficult in practice, the cooperative architecture would be much simpler to use if the human were relieved of the responsibility of having to tune numerous parameters.

2. Improve the efficiency of the mission performance:

Related to the previous item is the issue of the efficiency of the robot team's performance of its mission. As human designers, it is often difficult to evaluate a given robot team performance to determine how best to adjust parameters to improve efficiency. However, if the robots were controlled by an automated action selection strategy that has been shown to result in efficient group action selection in practice, then the human designer could have confidence in the robot team's ability to accomplish the mission autonomously, and thus would not feel the need to adjust the parameters by hand.

3. Facilitate custom-designed robot teams:

Providing the ability for robot teams to carry over their learned experiences from trial to trial would allow human designers to successfully construct unique teams of interacting robots from a pool of heterogeneous robot types for any given mission without the need for a great deal of preparatory work. Although ALLIANCE allows newly constructed teams to work together acceptably the first time they are grouped together, automated parameter adjusting mechanisms would allow the team to improve its performance over time by having each robot learn how the presence of other specific robots on the team should affect its own behavior. For example, two robots pursuing the hazardous waste cleanup mission that can both find the location of the spill, but with different task completion times, should learn which robot performs the task quicker and allow that robot to find the spill location on future missions, as long as it continues to demonstrate superior performance. Through this learning, the robots should thus allow the mere presence of other team members to affect their subsequent actions.

4. Allow robot teams to efficiently adapt their performance over time:

During a mission, a robot team's environment and the abilities of its members may change dynamically. However, in the basic ALLIANCE architecture, parameter settings do not change after the start of the mission. Thus, these robot teams would be quite vulnerable to calibration problems due to drifts in the environment and in robot capabilities. The ability to automatically update parameters during a mission is therefore of critical importance.

Providing a robot team with the ability to automatically update its own motivational behavior parameters requires solutions to two problems:

- How to give robots the ability to obtain knowledge about the quality of team member performances

- How to use team member performance knowledge to select a task to pursue

Solutions to the first problem require a robot to learn not only about the abilities of its teammates, but also about its own abilities. Although each robot "knows" the set of behaviors that it has been programmed to perform, it may perform poorly at certain tasks relative to other robots on the team. Robots must thus learn about these relative performance differences as a first step toward efficient mission execution. However, learning these relative performance quality differences is only a first step in improving efficiency. The next, more significant, question is how robots use the performance knowledge to efficiently select their own actions. This chapter describes the L-ALLIANCE approach to these problems.

4.2 Assumptions Made in L-ALLIANCE

Two key assumptions are made in the development of L-ALLIANCE, as follows:

- A robot's average performance in performing a specific task over a few recent trials is a reasonable indicator of that robot's expected performance in the future.

- If robot $r_i$ is monitoring environmental conditions $C$ to assess the performance of another robot $r_k$, and the conditions $C$ change, then the changes are attributable to robot $r_k$.

Without the first assumption, it would be quite difficult for robots to learn anything at all about their own expected performance, or the performance of their teammates, since past behavior would provide no clues to the expected behavior in the future.

The trick, of course, is determining which aspects of a robot's performance are good predictors of future performance. In L-ALLIANCE, I have used the simple measure of the time of task completion, which has proven to be a good indicator of future performance.
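As a minimal sketch of how this first assumption might be operationalized, the following class keeps only the completion times of the last few trials of one (robot, task) pair and uses their average as the predictor. The class name, method names, and window size are invented here for illustration; this is not the actual L-ALLIANCE parameter update mechanism, which is described in section 4.6.

```python
from collections import deque

class TaskPerformanceModel:
    """Predict a robot's future task performance from the average
    completion time of its most recent trials (a hypothetical sketch)."""

    def __init__(self, window=5):
        # Keep only the most recent `window` completion times, per the
        # assumption that recent trials predict future performance.
        self.times = deque(maxlen=window)

    def record_trial(self, completion_time):
        # Called once per completed trial of this task.
        self.times.append(completion_time)

    def expected_time(self):
        # Average over the retained trials; None if no data yet.
        if not self.times:
            return None
        return sum(self.times) / len(self.times)
```

Bounding the window with `deque(maxlen=...)` means old trials silently drop out, which also lets the estimate track the calibration drift discussed in item 4 above.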

My second assumption deals with the well-known credit assignment problem, which is concerned with determining which process should receive credit (or punishment) for the successful (or unsuccessful) outcome of an action. The assumption I make in L-ALLIANCE is that the only agents which affect the properties of the world that a robot $r_i$ is interested in are the robots that $r_i$ is monitoring. Thus, if a robot $r_k$ declares it is performing some task, and that task becomes complete, then the monitoring robot will assume that $r_k$ caused those effects. This assumption is certainly not always true, since external agents really can intrude on the robots' world. However, since this issue even causes problems for biological systems, which often have difficulty in correctly assigning credit, I do not concern myself greatly with this oversimplification.
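The credit-assignment assumption above can be sketched as a small monitor that remembers which robot last declared each task and credits that robot when the task completes. The class and method names are hypothetical, and real L-ALLIANCE robots exchange this information via broadcast communication rather than a shared object.

```python
class TaskMonitor:
    """Credit-assignment sketch: attribute a task's completion to the
    robot that declared it was performing that task (hypothetical names)."""

    def __init__(self):
        self.declared = {}  # task -> robot currently declaring it

    def hear_declaration(self, robot, task):
        # A teammate broadcasts that it is working on `task`.
        self.declared[task] = robot

    def observe_completion(self, task):
        # Under the L-ALLIANCE assumption, credit the declared performer.
        # An external agent could in principle have completed the task;
        # that oversimplification is accepted, as discussed in the text.
        return self.declared.get(task)
```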

4.3 The Efficiency Problem

I now formally define the efficiency problem with which I am concerned. As in chapter 3, let $R = \{r_1, r_2, \ldots, r_n\}$ represent the set of $n$ robots on the cooperative team, and the set $T = \{task_1, task_2, \ldots, task_m\}$ represent the $m$ independent tasks required in the current mission. Each robot in $R$ has a number of high-level task-achieving functions (or behavior sets) that it can perform, represented by the set $A_i = \{a_{i1}, a_{i2}, \ldots\}$. Since different robots may have different ways of performing the same task, I need a way of referring to the task a robot is working on when it activates a behavior set.

Thus, as in chapter 3, I define the set of $n$ functions $\{h_1(a_{1k}), h_2(a_{2k}), \ldots, h_n(a_{nk})\}$, where $h_i(a_{ik})$ returns the task that robot $r_i$ is working on when it activates behavior set $a_{ik}$.

Now, I define a metric evaluation function, $q(a_{ij})$, which returns the "quality" of the action $a_{ij}$ as measured by a given metric. Typically, we consider metrics such as the average time or average energy required to complete a task, although many other metrics could be used. Of course, robots unfamiliar with their own abilities or the abilities of their teammates do not have access to this $q(a_{ij})$ function. Thus, an additional aspect of the robot's learning problem is actually obtaining the performance quality information required to make an "intelligent" action selection choice.

Finally, I define the tasks a robot will elect to perform during a mission as the set $U_i = \{a_{ij} \mid \text{robot } r_i \text{ will perform task } h_i(a_{ij}) \text{ during the current mission}\}$.

In the most general form of this problem, the following condition holds:

Condition 4 (Different Robots are Different):

Different robots may have different collections of capabilities; thus, I do not assume that $\forall i. \forall j. (A_i = A_j)$. Further, if different robots can perform the same task, they may perform that task with different qualities; thus, I do not assume that if $h_i(a_{ix}) = h_j(a_{jy})$, then $q(a_{ix}) = q(a_{jy})$.

Then I can define the formal efficiency problem under Condition 4 as follows: