Collective Control

We now turn to a consideration of the way in which a collection of robots can be coordinated in order to accomplish a task. The emphasis in swarm intelligence is on decentralised control, or autonomy. In robotics, a collection of autonomous robots is an example of decentralised control, since there is no centralised controller responsible for their coordination. Insect societies are similarly only locally, or indirectly, controlled; there is no central body or agent that issues commands to organise the nest or colony.

One of the advantages of decentralised control in swarm robotics is an increase in fault tolerance: there is no centralised controller whose failure would result in a deterioration or breakdown of the system. Individual autonomous robots can also respond more quickly and flexibly to a changing environment, since they can respond directly to information from their own sensors and do not need to wait for centralised instructions.

It is possible for cooperative behaviour to emerge as the result of the combined effect of individual behaviours. Few would disagree that the cooperation found in insect societies is the result of emergent properties, rather than planning. Similarly, instances of apparently cooperative behaviour can be found in collections of autonomous robots; examples are described in the following section on applications. A classic example of decentralised control of a group of robots is that of the Frisbee-sorting robots of Holland and Melhuish (1999). Here the cooperation that occurs is emergent, since the individual robots are simple and autonomous, and incapable of direct communication with each other: they each follow a fixed set of reactive rules.

Questions about how best to achieve such emergent behaviours in robots are currently an active focus of research. One method is to handcraft the rules or control mechanisms. The adaptive, cooperative behaviour of the Frisbee-sorting robots is the combined result of individual robots following a fixed set of reactive rules, and shows that it is quite possible to generate adaptive behaviour without the use of learning algorithms. There are many cases of adaptive behaviours that are genetically determined: for example, “hard-wired” reflexes and instincts. Control mechanisms for robots could, for example, consist of relatively simple “rule-like” mechanisms, encoded as handcrafted weights for a neural network, such that when an obstacle is detected via a robot’s sensors, the robot turns away from it. Similarly, a subsumption-based system of behavioural modules could result in adaptive behaviour, despite having been handcrafted rather than learnt or evolved. The Case Study section provides an example.
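To make the idea concrete, the following is a minimal sketch (in Python) of such a handcrafted reactive controller: a single layer of fixed, hand-chosen weights maps proximity sensors to wheel speeds so that the robot turns away from detected obstacles. The sensor layout and weight values are illustrative assumptions, not taken from any of the systems cited above.

```python
# Minimal sketch of a handcrafted reactive controller: fixed, hand-chosen
# weights map proximity sensors directly to wheel speeds, so the robot turns
# away from obstacles without any learning. Layout and values are assumptions.

FORWARD_BIAS = 0.5        # constant drive so the robot moves when unobstructed

# WEIGHTS[i][j]: contribution of proximity sensor j (left, front, right)
# to wheel i (0 = left wheel, 1 = right wheel). An obstacle on the left
# speeds up the left wheel, so the robot veers right, away from it.
WEIGHTS = [
    [0.8, 0.4, -0.4],     # left wheel
    [-0.4, -0.4, 0.8],    # right wheel
]

def reactive_step(proximity):
    """Map raw proximity readings (0 = clear, 1 = touching) to wheel speeds."""
    left = FORWARD_BIAS + sum(w * p for w, p in zip(WEIGHTS[0], proximity))
    right = FORWARD_BIAS + sum(w * p for w, p in zip(WEIGHTS[1], proximity))
    return left, right

# Example: obstacle on the left -> left wheel faster, robot turns right.
print(reactive_step([0.9, 0.1, 0.0]))
```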

A popular and effective alternative to handcrafting such control mechanisms is to use genetic algorithms to evolve them. Nolfi and Floreano (2000), in their book on evolutionary robotics, describe numerous examples of the elegant solutions that can be obtained when evolutionary techniques allow the environment to determine the design. Neural networks are particularly useful control mechanisms for autonomous robots because of their flexible response to noisy environments, but it is not obvious how to provide detailed training feedback about what a robot should do at each time step. An evolutionary algorithm, and its associated fitness function, instead provides an overall evaluation of the performance of the network over the entire evaluation period. An evolutionary algorithm can also be used to evolve any parameter of the neural network: not just the weights, but also the learning rule, neuron transfer function, and network topology. Some interesting work has also been initiated in which evolutionary techniques have been applied to the modular design of a neural network controller (Nolfi, 1997), although this approach has not yet been fully developed or exploited.
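As a rough illustration of the overall loop, the sketch below evolves the weights of a small feed-forward controller with a simple genetic algorithm, where the fitness function is an overall evaluation of behaviour across the whole trial. The toy environment, fitness measure, and network size are stand-ins, not the setups described by Nolfi and Floreano (2000).

```python
import random

N_SENSORS, N_MOTORS = 3, 2
GENOME_LEN = (N_SENSORS + 1) * N_MOTORS      # weights plus one bias per motor

def control(genome, sensors):
    """Feed-forward pass: one layer of evolved weights from sensors to motors."""
    motors = []
    for m in range(N_MOTORS):
        w = genome[m * (N_SENSORS + 1):(m + 1) * (N_SENSORS + 1)]
        motors.append(sum(wi * si for wi, si in zip(w, sensors)) + w[-1])
    return motors

def fitness(genome, steps=100):
    """Overall evaluation over the whole trial: reward forward motion,
    penalise moving fast while an obstacle is detected."""
    score = 0.0
    for _ in range(steps):
        obstacle = 1.0 if random.random() < 0.1 else 0.0
        sensors = [obstacle, random.random(), random.random()]
        left, right = control(genome, sensors)
        speed = (left + right) / 2.0
        score += speed - 5.0 * obstacle * max(speed, 0.0)
    return score

def evolve(pop_size=20, generations=50, mutation_rate=0.1):
    pop = [[random.uniform(-1, 1) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]                 # truncation selection
        children = [
            [g + (random.gauss(0, 0.2) if random.random() < mutation_rate else 0.0)
             for g in random.choice(parents)]            # mutated copy of a parent
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

best_controller = evolve()
```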

Evolutionary robotics has been shown to be useful in the development of effective control mechanisms for individual robots. The feedback it provides also offers a mechanism for developing effective control strategies for a multi-robot team, although research in this area is very much in its infancy. To date, questions about the evolution of multi-robot, or multi-agent systems have primarily been investigated in simulation (for instance, Baldassarre, Nolfi, & Parisi, 2002; Martinoli, 1999), but we anticipate that this is an area that will receive an increasing amount of attention in the near future.

Currently, questions about how to use evolutionary techniques to evolve robotic control structures form a particularly active focus for research. Of course, a control system is never simply evolved in response to the environment: the researcher always makes some contribution. Choices have to be made about the environment, the task, the initial architecture, the evolutionary operators, and the fitness function. Nonetheless there is still much to be said for keeping such intervention to a minimum, based on the underlying idea of getting as close as possible to emulating natural evolution.

An interesting attempt to reduce the level of experimenter intervention is a technique termed “embodied evolution” (Watson, Ficici, & Pollack, 2002). The reported aim is to create an evolutionary technique that can be run automatically on a group of robots without the need for global communication, or a reliance on simulation. Robots are left in the task environment (the task in question being one of phototaxis, or attempting to reach a light source from different starting positions). The evolutionary mechanism used was crossover, with a simple neural network control architecture serving as the evolutionary substrate. Using a mechanism termed the probabilistic gene transfer algorithm (PGTA), each robot maintains a virtual energy level that reflects its performance, and probabilistically broadcasts genetic information locally, at a rate proportional to its energy level. Robots that are close enough to each other pick up this information and allow the broadcast genes to overwrite some of their own, accepting broadcast genes with a probability inversely related to their own energy levels. The result is that robots with a higher energy level (because they have performed the task more effectively) are more able to broadcast information, and less likely to allow their own genes to be overwritten. The method, when tested, compared favourably to a hand-designed solution to the same task.
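The following is a schematic rendering of that gene-exchange step as described; the energy update, scaling constants, and locality test are assumptions added for illustration, not details of Watson, Ficici, and Pollack's implementation.

```python
import random

# Schematic sketch of the PGTA idea: robots broadcast genes at a rate
# proportional to their virtual energy, and accept received genes with a
# probability that falls as their own energy rises.

class Robot:
    def __init__(self, genome_len=8):
        self.genome = [random.uniform(-1, 1) for _ in range(genome_len)]
        self.energy = 0.5                # virtual energy, reflects performance

    def update_energy(self, performance):
        # e.g. performance could be progress towards the light source (phototaxis)
        self.energy = min(1.0, max(0.0, self.energy + performance))

    def maybe_broadcast(self):
        """Broadcast one (locus, value) pair with probability equal to energy."""
        if random.random() < self.energy:
            locus = random.randrange(len(self.genome))
            return locus, self.genome[locus]
        return None

    def maybe_accept(self, message):
        """Accept a broadcast gene with probability inversely related to energy."""
        if message and random.random() < (1.0 - self.energy):
            locus, value = message
            self.genome[locus] = value   # broadcast gene overwrites own gene

def exchange_step(robots, in_range):
    """One round of local gene exchange; in_range(a, b) is the locality test."""
    for sender in robots:
        message = sender.maybe_broadcast()
        for receiver in robots:
            if receiver is not sender and in_range(sender, receiver):
                receiver.maybe_accept(message)
```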

There is also interest in the combination of evolutionary and learning techniques, where a neural network control structure is further refined by a lifetime learning process (Nolfi, 2003). Such an approach can be used to develop individuals with a predisposition to learn: evolving effective starting conditions, or initial weight matrices, or evolving the tendency to behave in such a way that the individual is exposed to appropriate learning experiences. Combining evolution and learning has been shown in some studies (for example, Nolfi & Parisi, 1997) to lead to promising results. A further promising area is the application of evolutionary techniques to the design decisions about the initial set of behavioural modules and the methods used to combine them.
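A minimal sketch of how evolution and lifetime learning can be combined is given below: the genome supplies the initial weights, which a simple learning rule then refines during the individual's lifetime before fitness is measured. The task, learning rule, and constants are illustrative assumptions only, not the setups of Nolfi (2003) or Nolfi and Parisi (1997).

```python
import random

def sample_experience(n_sensors=3):
    """Placeholder for one learning experience: sensor readings and a target."""
    sensors = [random.uniform(-1, 1) for _ in range(n_sensors)]
    target = 0.5 * sensors[0] - 0.2 * sensors[1]     # arbitrary stand-in task
    return sensors, target

def lifetime_fitness(initial_weights, learn_rate=0.05, lifetime=200):
    """Fitness reflects behaviour *after* lifetime learning has refined the
    evolved initial weights, so evolution selects for learnability."""
    weights = list(initial_weights)                  # start from the evolved genome
    score = 0.0
    for _ in range(lifetime):
        sensors, target = sample_experience(len(weights))
        output = sum(w * s for w, s in zip(weights, sensors))
        error = target - output
        # lifetime learning: simple delta-rule update of the evolved weights
        weights = [w + learn_rate * error * s for w, s in zip(weights, sensors)]
        score -= abs(error)
    return score
```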

An interesting alternative to evolving neural network weights for control systems is to develop a training set by remotely controlling a robot around an environment, collecting examples of inputs and the corresponding appropriate motor responses (Sharkey, 1998). Other than this, the main alternative to handcrafting neural networks, or evolving their weights, is to use reinforcement learning algorithms. Such algorithms have the advantage of not requiring a training set; what is needed instead is a scalar evaluation of behaviour to guide learning. The evaluation could be provided by a trainer, or by the agent itself as a result of its interaction with the environment. The goal is to learn a policy, or mapping from states to actions, that maximises positive reinforcement. Various algorithms have been explored, from Q-learning (Watkins, 1989) to the learning classifier system advocated by Dorigo and Colombetti (1998), which incorporates a reinforcement algorithm closely related to Q-learning to adjust the strength of its rules and a genetic algorithm to search the space of possible rules.
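For reference, a tabular Q-learning sketch is given below, showing how a policy mapping states to actions can be learnt from a scalar reinforcement signal alone. The state and action sets and the reward source are placeholders for whatever a given robot task provides.

```python
import random
from collections import defaultdict

# Tabular Q-learning (Watkins, 1989): learn a policy from scalar reinforcement
# rather than a supervised training set. States and actions are placeholders.

ACTIONS = ["forward", "turn_left", "turn_right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)            # Q[(state, action)] -> estimated return

def choose_action(state):
    """Epsilon-greedy policy: the learnt mapping from states to actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning update from a scalar evaluation of behaviour."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One interaction step (state encoding and reward come from the robot's task):
s = "clear_ahead"
a = choose_action(s)
q_update(s, a, reward=1.0, next_state="clear_ahead")
```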

Reinforcement learning has also been applied to the issue of collective behaviour.

Mataric (1997) presents a method through which four mobile robots learn, through reinforcement, social rules about yielding and sharing information in a foraging task. The problem is one of finding a way of reinforcing individual robots for behaviour that benefits the group as a whole, since greedy individualist strategies result in poor performance in group situations with resource competition. The solution she investigates relies on social reinforcement for appropriate use of four behaviours: yielding, proceeding, communicating, and listening. Her results indicate improved foraging behaviour in a group of four mobile robots subject to social reinforcement, as compared to a group using only greedy individual strategies. In the social reinforcement condition, robots were rewarded (a) for making progress towards the sub-goals of finding food or returning home, (b) for repeating another robot’s behaviour, and (c) for observing reinforcement delivered to another robot; in other words, vicarious reinforcement, whether positive or negative, is shared amongst all robots in the locality.
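The sketch below renders that reward scheme schematically: an individual robot's reinforcement combines progress towards its own sub-goals, a bonus for repeating an observed behaviour, and a share of the reinforcement delivered to nearby robots. The weightings are illustrative assumptions, not Mataric's (1997) actual values.

```python
# Schematic social reinforcement: combine (a) sub-goal progress, (b) an
# imitation bonus, and (c) vicariously shared reinforcement from the locality.

def social_reward(progress, repeated_observed, nearby_rewards,
                  w_progress=1.0, w_imitation=0.5, w_vicarious=0.3):
    reward = w_progress * progress                     # (a) progress to sub-goals
    if repeated_observed:
        reward += w_imitation                          # (b) repeating another robot
    reward += w_vicarious * sum(nearby_rewards)        # (c) vicarious, +ve or -ve
    return reward

# Example: a robot that progressed towards food, imitated a neighbour, and
# observed one neighbour being rewarded and another being penalised.
r = social_reward(progress=1.0, repeated_observed=True, nearby_rewards=[0.8, -0.2])
```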

In conclusion, the application of swarm intelligence principles to the control of robot collections is best exemplified by studies in which no use is made of global control. Decentralised control is needed to achieve the full advantages of scalability and redundancy, and a considerable body of research has investigated different methods of achieving desired collective behaviours without resorting to global control.

There are also some other examples, such as the ALLIANCE architecture (Parker, 1998), that share several features with swarm robotics (for example, the use of individually autonomous robots), but in which some form of global control, or global communication (see the next sub-section), is employed.

Communication

Communication is of course closely linked to the control of robot collectives. Again our concern here is with swarm intelligence, and hence with minimal and local communication. This is an area that is beginning to receive more attention. For some in the area, such as Luc Steels (1996), the concern is to develop explanations of the evolution of complex communicative abilities. For others, the concern is to make use of limited communicative abilities to extend the capabilities of the group. Some form of communication is likely to be required to accomplish task allocation, and the coordination needed to jointly undertake and complete a task.

Clearly, biological systems such as those formed by social insects depend on some forms of communication. These include alarm, recruitment, and recognition (Wilson, 1971), but also indirect communication by means of the environment. Cao, Fukunaga, and Kahng (1997) identify three major forms of inter-robot communication: interaction via (a) the environment, (b) sensing, and (c) communications. We shall make use of their distinctions, since they are useful in terms of robots, but note that the distinction between sensing and communication is not a particularly useful one for social insects, since the communication they are capable of depends on their ability to sense chemical pheromones.