
Development of ANN with Adaptive Connections by CE

Julián Dorado, University of A Coruña, Spain
Nieves Pedreira, University of A Coruña, Spain
Mónica Miguélez, University of A Coruña, Spain

Abstract

This chapter presents the use of Artificial Neural Networks (ANN) and Evolutionary Computation (EC) techniques to solve real-world problems, including those with a temporal component. The development of ANNs still suffers from problems that have been present since the beginning of the field and that can be alleviated by applying EC to ANN development. In this chapter, we propose a multilevel system, with each level based on EC, to adjust the architecture and to train ANNs. Finally, the proposed system offers the possibility of adding new characteristics to the processing elements (PE) of the ANN without modifying the development process. This characteristic makes possible a faster convergence between natural and artificial neural networks.

Introduction

Nature has proven to be a highly efficient mechanism for problem-solving, so much so that countless forms of life, which are, in the end, countless solutions to the problem of survival, make their way in the most complex and diverse conditions.

Big steps in several scientific fields have been achieved by imitating certain mechanisms of nature. One example is Artificial Neural Networks (ANNs) (Freeman & Skapura, 1993), which are based on brain activity and are used for pattern recognition and classification tasks. Another example of this symbiosis between science and nature is Evolutionary Computation (EC) (Holland, 1975; Bäck, 1996), whose techniques are based on evolution by natural selection: a population of potential solutions to a given problem is evolved by applying mutation and crossover operators. EC techniques are mainly used for adjustment and optimisation tasks.

ANNs are currently the most suitable artificial intelligence technique for pattern recognition. Although this technique is not entirely free of problems, for several decades it has provided reliable systems that have been successfully transferred to the industrial environment.

The internal structure of an ANN consists of a series of processing elements (PE) which are interconnected, just as biological neurons are. The problem-solving ability of an ANN lies not only in the type and number of PEs but also in the shape of their interconnections. There are several studies on the development of PE architectures, as well as on the optimisation of the ANN learning algorithms that adjust the values of the connections. These works tackle the two main current limitations of ANNs: on the one hand, there is no mathematical basis for calculating the optimal architecture of an ANN; on the other hand, the existing ANN learning algorithms have, in some cases, convergence problems, so training times can be quite high.

Among the most important ANNs are recurrent ANNs (RANNs) (Haykin, 1999), which tackle temporal problems, quite common in the real world and different from the classical non-temporal classification problems handled by static ANNs.

However, the difficulties of their implementation led to the use of tricks such as time delays (TDNN) or the unfolding of recurrent networks into feed-forward ones (BPTT) for solving dynamic problems. Having reached maturity, recurrent ANNs that use RTRL work better with dynamic phenomena than the classical ones, although they still have some problems of design and training convergence.

With regard to the first of these problems, the architectural design of the network (in both ANN models, feed-forward and recurrent), the vast number of design possibilities allows experimentation but also raises the question of which combination of design and training parameters is the best. Unfortunately, there is no mathematical basis to back the selection of a specific architecture, and only a few works (Lapedes & Farber, 1988; Cybenko, 1989) have shown lower and upper limits on the number of PEs for some models and restricted types of problems. Apart from these works, there are only empirical studies (Yee, 1992) on this subject. Due to this situation, it cannot be stated with certainty that the selected architecture is the most suitable one without performing exhaustive architectural tests. Besides, nowadays there is a clear dichotomy between recurrent and nonrecurrent ANNs regarding not only the problems for which they are used but also the way in which network training is performed in each case.

In order to palliate this problem and to make the design process automatic, several architecture-adjustment methods have been developed for certain ANN models. Some of these methods are based on connection pruning (Gomm, Weerashinghe, & Williams, 1998; Setiono, 1997), and others increment or decrement the number of PEs, especially in recurrent models (Martinetz, Berkovich, & Schulten, 1993; Fritzke, 1995). All these methods fall into local minima, because the networks create new PEs as they receive different inputs, and they are quite dependent on the initial state of the network and on the order in which the training patterns are presented.

Regarding the training of ANNs, the classical algorithms, which are based on gradient descent, are quite sensitive to local minima of the search space. Moreover, in order to achieve network convergence, the designer has to configure another set of parameters involved in training, such as the learning rate and the individual parameters of each algorithm. Another intrinsic problem of all these learning algorithms is that they do not adapt easily to modifications in the behaviour of either the PEs or the connections; this prevents the development and implementation of new characteristics and improvements in ANN models. One possible improvement is the incorporation of biological characteristics, similar to those of natural neural cells, for a better understanding and functionality of artificial neurons.

A possible solution to these problems is the use of new optimisation techniques. Genetic Algorithms (GA) (Fogel, Fogel, & Porto, 1990; Yao, 1992) are easy-to-use EC optimisation techniques that achieve good results. They have been applied, together with other EC techniques, to architectural adjustment for years (Robbins, Hughes, Plumbley, Fallside, & Prager, 1993), so they represent an open field for research.

The works on training ANNs by means of GA go back a long way (Whitley, Starkweather, & Bogart, 1990). This training method has some advantages over traditional gradient-based ones: it is less sensitive to local minima because it samples the search space of the network connection weights more thoroughly. The two training approaches can be mixed in a Lamarckian GA, in such a way that the genetic operators are first applied to the population to obtain the offspring of every generation, and then several training cycles of a gradient descent algorithm are performed before evaluating each new individual (or set of weights). This approach takes advantage of the exploration capability of the GA while using gradient descent to refine the solutions, thereby mitigating the problem of local minima.
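As an illustration, the following is a minimal sketch of such a Lamarckian scheme, assuming a toy linear task and illustrative parameter values (population size, mutation scale, number of training cycles); it is not the authors' implementation. Each offspring is produced by crossover and mutation, refined with a few gradient descent cycles, and the refined weights are written back into the genome before evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task (illustrative assumption): a single linear PE, y = X @ w.
X = rng.normal(size=(32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

def mse(w):
    # Evaluation function: mean squared error of one individual (weight vector).
    return float(np.mean((X @ w - y) ** 2))

def gradient_steps(w, cycles=5, lr=0.05):
    # A few gradient descent training cycles applied to one individual.
    for _ in range(cycles):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

pop = rng.normal(size=(20, 4))  # population: each row is a set of weights
for generation in range(30):
    fitness = np.array([mse(w) for w in pop])
    parents = pop[np.argsort(fitness)][:10]          # select the fittest half
    offspring = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        child = np.where(rng.random(4) < 0.5, a, b)     # uniform crossover
        child = child + rng.normal(scale=0.1, size=4)   # Gaussian mutation
        # Lamarckian step: the locally refined weights replace the genome.
        offspring.append(gradient_steps(child))
    pop = np.array(offspring)

best = min(pop, key=mse)
print("best individual error:", mse(best))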

Finally, training by means of GA allows the behaviour of the PEs and connections to be modified at will by altering only the evaluation function. In this case, the behaviour of the network elements does not affect the learning algorithm. This independence allows the incorporation of new characteristics, which was quite difficult when using gradient-based algorithms.
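This independence claim can be made concrete with a small sketch (the function names here are hypothetical): the GA's fitness only calls an evaluation function, so swapping the PE behaviour requires no change to the training loop at all.

```python
import numpy as np

def evaluate(weights, forward, X, y):
    # The GA's fitness function: it only needs a forward callable and never
    # inspects how the PEs compute their outputs.
    return float(np.mean((forward(X, weights) - y) ** 2))

def linear_pe(X, w):
    return X @ w

def saturating_pe(X, w):
    # A swapped-in PE behaviour; evaluate() and the GA stay untouched.
    return np.tanh(X @ w)

rng = np.random.default_rng(3)
X, w = rng.normal(size=(8, 4)), rng.normal(size=4)
y = X @ w
print(evaluate(w, linear_pe, X, y), evaluate(w, saturating_pe, X, y))
```

Any forward function with the same signature can be plugged in, including the temporally modified PEs proposed later in this chapter.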

Proposal

This chapter proposes a system based on a multilevel design with progressive adjustment that is able to work as natural neural networks do.

The proposed scheme is based on the use of several levels of EC techniques (Holland, 1975; Bäck, 1996; Bethke, 1981; Bramlette, 1991). Specifically, GAs are used at two levels in order to achieve, in a rapid and efficient way, the right combination of parameters: the architecture at the upper level and the training at the lower level.
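The following is a hedged sketch of how such a two-level scheme can be organised, assuming a toy task and illustrative parameters; for brevity the upper level is shown as a plain sweep over one architectural parameter (the hidden-layer size), whereas the chapter proposes a GA at that level as well.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))        # toy dataset (illustrative assumption)
y = np.sin(X.sum(axis=1))

def forward(X, w1, w2):
    return np.tanh(X @ w1) @ w2

def inner_ga(hidden, generations=20, pop_size=16):
    # Lower level: a GA evolves a flat weight vector for one fixed architecture.
    n = 3 * hidden + hidden
    pop = rng.normal(size=(pop_size, n))
    def err(v):
        w1 = v[:3 * hidden].reshape(3, hidden)
        w2 = v[3 * hidden:]
        return float(np.mean((forward(X, w1, w2) - y) ** 2))
    for _ in range(generations):
        pop = pop[np.argsort([err(v) for v in pop])]       # rank by fitness
        children = pop[:pop_size // 2].repeat(2, axis=0)   # clone fittest half
        children += rng.normal(scale=0.05, size=children.shape)  # mutate
        pop = children
    return err(min(pop, key=err))

# Upper level: each candidate encodes one architectural parameter.
scores = {h: inner_ga(h) for h in (2, 4, 8, 16)}
print(min(scores, key=scores.get), "hidden PEs gave the lowest error")
```

The key design point is the nesting: every fitness evaluation at the upper (architecture) level runs a full search at the lower (training) level, so the two parameter sets are adjusted jointly rather than by hand.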

Another aspect considered is the incorporation of new characteristics into the ANNs. In order to validate that the model supports this possibility, a modification of PE behaviour is proposed. Considering a PE only as a node that provides an output depending on its input values is a vast simplification of the behaviour of biological neurons. In this work, with the aim of increasing the potential of ANNs, and on the basis of biological evidence, it is proposed to add a temporal component to the activations as a modification of PE behaviour. This modification tries to mimic the excitation of the natural neuron, which is induced by the action potential and the subsequent refractory periods. It also entails a higher complexity in ANN development and training, due to the addition of new parameters that influence the performance of the system.

Figure 1. General diagram of system performance
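A minimal sketch of what such a temporal PE might look like, assuming an illustrative threshold, decay factor, and refractory window (none of these constants come from the chapter): after firing, the PE's activation persists and decays over several time steps, loosely mimicking an action potential followed by a refractory period.

```python
import numpy as np

def temporal_pe(inputs, weights, refractory_window=3, decay=0.5, threshold=1.0):
    # inputs: a sequence of input vectors, one per time step.
    trace = 0.0          # current activation, carried across time steps
    refractory = 0       # steps remaining during which the PE cannot fire
    outputs = []
    for x in inputs:
        if refractory > 0:
            refractory -= 1          # still refractory: no new firing
            trace *= decay           # residual activation decays
        elif np.dot(weights, x) > threshold:
            trace = 1.0              # threshold crossed: the PE "fires"
            refractory = refractory_window
        else:
            trace *= decay
        outputs.append(trace)
    return outputs
```

These extra parameters (decay, threshold, refractory window) are exactly the kind of addition that the GA-based training above can absorb by changing only the evaluation function.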

A prediction task over classical time series from the statistical field was carried out in order to validate the performance of the proposed modifications. The results were compared with those obtained when the prediction was performed after characterising the series with ARIMA (autoregressive integrated moving average) models (Box, 1976; Wei, 1990).
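For reference, this is the kind of ARIMA baseline such a comparison involves, sketched with the statsmodels library on a synthetic stand-in series; the chapter's actual series and model orders are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200))      # synthetic stand-in series

# Fit on the first 150 points, forecast the remaining 50 (order is assumed).
fit = ARIMA(series[:150], order=(1, 1, 0)).fit()
forecast = fit.forecast(steps=50)
rmse = float(np.sqrt(np.mean((forecast - series[150:]) ** 2)))
print("ARIMA baseline RMSE:", rmse)
```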

ANNs Multilevel Development with