
Discovering the Generalization Capacity

Another aspect to be taken into account when designing the discovery mechanism is that it may be extended to capture an important feature of ANNs, one of the main reasons for their success: the generalisation capacity. The purpose is to extract the knowledge that the ANN obtains from the training patterns and from test cases that were not provided during the learning process.

For this purpose, we must design an algorithm that creates new test cases to be applied to the ANN, allowing its behaviour to be analysed when faced with new situations and its functional limitations to be discovered. This aspect is of particular relevance when applying ANNs to fields with a high practical risk, such as monitoring nuclear power plants or diagnosing certain medical conditions, and it has not been clearly addressed in the scientific literature on ANNs and rule discovery until now.

The ideal situation for extracting knowledge about the generalisation ability of the ANN would be to analyse the results produced by the ANN when faced with all possible combinations of input values. However, this becomes unviable as the search space grows, so a mechanism is needed that selects the input-output combinations (the group of examples) from which the knowledge will be extracted. One alternative is to select a subgroup of these combinations at random, with a uniform distribution. This yields a relatively low number of patterns, so the rule-extraction process does not require excessive computing power. However, it has the inconvenience that it cannot be guaranteed that all of the combinations needed to extract the full generalisation capacity of the ANN have been selected.
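A minimal sketch of this uniform random selection, assuming real-valued inputs with known bounds; the bounds, sample sizes, and the use of NumPy are illustrative assumptions rather than details from the text:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_input_combinations(low, high, n_samples, n_inputs):
    """Draw a uniformly distributed subgroup of input combinations."""
    return rng.uniform(low, high, size=(n_samples, n_inputs))

# Example: 200 random combinations of 3 inputs, each drawn from [0, 1].
candidates = sample_input_combinations(0.0, 1.0, n_samples=200, n_inputs=3)
```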

As regards discovering the generalisation capacity, an algorithm known as the "algorithm for the creation of new patterns" has been designed. New test cases may be created using this algorithm, making it possible to discover rules and expressions for those parts of the ANN's internal functioning that have not been explored by the learning patterns.

We are not sure whether the behaviour of the ANN matches the behaviour of the external reality; we only hope so. The use of new test cases has the sole purpose of better mimicking the model of the ANN; it cannot necessarily be extended to the point that the rule set will better mimic the problem being considered. For this purpose, we start with the learning patterns and apply to them the rule-discovery algorithm previously used through GP.

The solution used is based on the premise that all ANNs obtain the capacity for generalisation from the implicit knowledge of the training files. This means that the example-creating algorithm will have to make use of these files. Additionally, it will be provided with a new set of input combinations, selected at random and based on the inputs of the previous files.

Having determined the existence of this group of new inputs, its size must be established. If it is too small, its influence will barely be noticeable in the final result; if it is too large, the solutions will tend toward the information provided by these new inputs, minimising the contribution of the original training patterns. Once a suitable balance is found, the algorithm for creating new patterns is applied, generating a series of new input patterns that are presented to the ANN to obtain the corresponding outputs. These new patterns, "new_inputs - outputs_ANN", are added to the training patterns, and the rule-discovery process is resumed. It should be noted that, in order to continue with the discovery process, the first step is to reassess each and every individual in the population in order to calculate its new adjustment. Once this has been done, the discovery process may continue.
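This augmentation step might be sketched as follows; `ann`, `evaluate`, and the population are hypothetical stand-ins for the trained network, the fitness function, and the GP individuals:

```python
def augment_and_reassess(patterns, new_inputs, ann, population, evaluate):
    # Label each new input with the ANN's own output ("new_inputs - outputs_ANN").
    new_patterns = [(x, ann(x)) for x in new_inputs]
    patterns = patterns + new_patterns
    # The pattern set has changed, so every individual must be re-evaluated
    # to obtain its new adjustment before discovery continues.
    adjustments = [evaluate(individual, patterns) for individual in population]
    return patterns, adjustments
```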

As the rule-extraction process advances, it is possible to eliminate, from the group of new inputs, the examples whose knowledge the rules obtained so far have already captured. When there is a new test case for which the ANN's output coincides with that produced by the best combination of rules obtained, that test case is eliminated, as the rules are considered to have acquired the knowledge it represents. New examples may be added at the same time, making it possible to explore new regions of the search space. The examples provided by the training files stay constant throughout the process, meaning they are always presented to the rule-extraction algorithm.
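A hedged sketch of this elimination criterion, assuming scalar outputs and hypothetical `ann` and `rules` callables; the tolerance plays the role of the survival threshold described later:

```python
def prune_captured_cases(new_cases, ann, rules, tol=1e-6):
    """Keep only the new test cases the rules have not yet captured;
    training-file examples are never pruned and stay constant."""
    survivors = []
    for x in new_cases:
        # A case is "captured" when the best rule combination
        # reproduces the ANN's output for it; captured cases are dropped.
        if abs(rules(x) - ann(x)) > tol:
            survivors.append(x)
    return survivors
```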

Therefore, the process is carried out in an iterative manner: a number of generations of the rule-discovery algorithm are run; once they are finished, new test cases are generated, and those whose result coincides with that of the ANN are eliminated.

Once all the relevant test cases have been eliminated, new cases are created in order to fill the gaps left by those that were deleted. Each new case is compared with the existing ones to determine whether it is already present among the test cases; if so, it is eliminated and a new one is created in the same way. The process then starts again, running the discovery algorithm for N generations and once more repeating the elimination and creation of test cases. The process is complete when the user determines that the rules obtained are those desired.
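Putting the pieces together, the outer iteration might look like the following sketch; every helper passed in (`run_generations`, `create_case`, `prune`, `user_satisfied`) is a hypothetical placeholder for a component described in the text:

```python
import numpy as np

def same_case(a, b):
    """Two cases are duplicates when all their values coincide."""
    return np.allclose(a, b)

def discovery_loop(train_patterns, rules_state, ann,
                   run_generations, create_case, prune,
                   user_satisfied, n_pool, N):
    pool = [create_case() for _ in range(n_pool)]
    while not user_satisfied(rules_state):
        # Run the rule-discovery algorithm for N generations on the
        # training patterns plus the pool of new test cases.
        rules_state = run_generations(rules_state, train_patterns, pool, N)
        # Eliminate the new cases whose result coincides with the ANN's.
        pool = prune(pool, ann, rules_state)
        # Fill the gaps, rejecting candidates already present in the pool.
        while len(pool) < n_pool:
            candidate = create_case()
            if not any(same_case(candidate, c) for c in pool):
                pool.append(candidate)
    return rules_state
```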

The parameters used by the algorithm for the creation of new patterns are described in detail in Rabuñal (2004), together with a more formal description of the algorithm.

In order to generate a new input pattern from those already existing, a number of different alternatives may be applied: creating all of the new values randomly, or starting from some of the original patterns and modifying them in some way to generate a new entry. Using the first method, it is possible to generate patterns that vary greatly from each other and that do not correspond to the areas of the search space in which the ANN works, which may reduce the validity of the rules obtained. However, this may be interesting when one wishes to check how the ANN responds in extreme situations.

In the case of the second alternative, one of the training patterns is chosen at random and, by slightly modifying its values, a new example case is constructed. In this way, inputs are generated that are not too far, in the search space, from the training cases.

Therefore, the tendency will be to produce new input sequences that give rise to all of the possible output cases for which the ANN has been trained.
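Both alternatives can be sketched as follows, assuming patterns are real-valued NumPy vectors; the perturbation variant anticipates the "probability of change" parameter described later, and the bounds and probability are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def fully_random_pattern(low, high, n_inputs):
    """Alternative 1: every value created at random (useful for probing
    extreme situations, but may stray from the ANN's working region)."""
    return rng.uniform(low, high, size=n_inputs)

def perturbed_pattern(train_x, p_change=0.2, low=0.0, high=1.0):
    """Alternative 2: pick a training pattern at random and slightly
    modify some of its values (stays close to the training cases)."""
    base = train_x[rng.integers(len(train_x))].copy()
    mask = rng.random(base.shape) < p_change
    base[mask] = rng.uniform(low, high, size=int(mask.sum()))
    return base
```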

Although the algorithm for generating examples is totally independent of the architecture of the ANN, the fact of working with a feedforward or a recurrent network makes the algorithm totally different.

When working with recurrent networks, it is not enough to compare the output offered by the RANN with that offered by the rules for a single input pattern to determine whether the rules have captured the correct knowledge. This is because the functioning of an RANN is based not on individual input patterns but on sequences of inputs. The behaviour of the RANN can therefore only be said to have been captured when the rules offer the same outputs as the RANN itself for all of the values of a sequence of inputs.

The solution adopted in the particular case of the RANN uses the training files as its initial base in order to generate a new starting sequence which, when applied as input to the ANN, produces the subsequent input-output combinations. For example, if an RANN is used for short-term prediction of a time series, it works as a feedback system: the value at moment t is given and the network predicts the value at moment t+1; this predicted value is then provided as input to predict the value at moment t+2, and so on.
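This feedback regime might be sketched as follows, with `rann` standing in for a hypothetical one-step predictor with internal state:

```python
def feedback_forecast(rann, x_t, horizon):
    """Feed each prediction back as the next input: t -> t+1 -> t+2 -> ..."""
    outputs = []
    value = x_t
    for _ in range(horizon):
        value = rann(value)    # predict the value at the next moment...
        outputs.append(value)  # ...and feed it back as the next input
    return outputs
```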

A sequence of inputs is eliminated when the rules simulate the behaviour of all of the values of the series with an error below a pre-established threshold, that is, when the sequence of output values that the ANN produces from the newly created value coincides with the sequence obtained by applying the extracted rules to that value. When this occurs, a new input value must be created, from which a new sequence of inputs is generated so that the rule-extraction process can start again.
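The sequence-level elimination test then reduces to a comparison such as this sketch, again under the assumption of scalar outputs:

```python
def sequence_captured(rann_outputs, rule_outputs, threshold):
    """True when the rules simulate every value of the series with an
    error below the pre-established threshold."""
    return all(abs(a - r) < threshold
               for a, r in zip(rann_outputs, rule_outputs))
```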

When working with an RANN, the system has a series of additional parameters. In total, it has the following parameters (gathered into a single configuration object in the sketch after this list):

Percentage of new patterns. How many new patterns will be used in addition to the training patterns. When not working with time series, these new patterns are checked to ensure they differ from the training patterns. When working with a time series, the new random patterns are added at the end of the series used for training.

Probability of change. The process for creating new test cases focuses on generating new sequences from the training data. A sequence of inputs is chosen at random from the training file, and for each input either the same value is maintained or a new one is generated at random, according to this probability. The technique is similar to the mutation process in EC.

Minimum error for survival. Once the process has started to extract rules, the rules generated progressively reflect the patterns with which they are being trained. Once the extracted rules sufficiently reflect the behaviour of the ANN for a given pattern, that pattern is no longer necessary, so it is discarded and a new one is created. This parameter indicates when a pattern should be discarded: when the error value obtained by the rules on that specific pattern falls below a certain threshold.

Number of feedback outputs. When working with recurrent networks, the aim is also to study the behaviour of the network by taking the outputs it returns and using them as inputs. This parameter indicates the number of patterns (out of the group whose size is set by the "percentage of new patterns" parameter) that will not be generated at random but will instead result from simulating the network, taking each input to be the output at the previous moment.
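For reference, the four parameters can be gathered into a single configuration object; the field names below are illustrative choices, while the semantics follow the descriptions above:

```python
from dataclasses import dataclass

@dataclass
class PatternCreationParams:
    new_pattern_pct: float   # percentage of new patterns vs. the training ones
    change_prob: float       # probability of replacing each input value
    survival_error: float    # error below which a pattern is discarded
    feedback_outputs: int    # how many new patterns come from fed-back outputs
```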

Figure 3 is a graphic representation of the idea. The way of eliminating the sequence of examples also varies. In this case, only one random value is created, and it is this value that generates all of the following cases. The elimination criterion is met when the R new sequences created have a comparison error of zero with those produced by the RANN, that is, when the rules have assimilated the knowledge contained in all of the subseries of new cases. As a result, all of them are eliminated. The next step is to create a new random value, use it to construct a new subseries, and start the extraction process again until the adjustment is achieved, and so on.

As may be seen in Figure 3, an overlap occurs in the sequence of inputs created between the inputs (X) and the outputs, as a result of the first randomly generated value. In this way, it is possible to analyse the behaviour of the RANN when faced with a multitude of possible future sequences, and to abstract the generalisation knowledge resident in the network.

Figure 3. Initial process for creating examples in the case of RANN and temporal ANN

Here, it is important to note that, in order to obtain an output, the RANN takes into account not only the input value at the current moment, but also the outputs from previous moments. In other words, it makes use of the internal memory provided by the feedback, and therefore care must be taken to ensure that the information stored in the ANN does not affect the outputs associated with the new patterns created by feedback. To do so, before generating a new example sequence, a series of randomly generated values is entered into the ANN so that its internal memory is not determined by previous examples.
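This "wash-out" of the internal memory might be sketched as follows, with `rann` again a hypothetical stateful callable and the number of random values an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng()

def reset_internal_memory(rann, n_washout=50, low=0.0, high=1.0):
    """Push random values through the RANN so that its internal state
    carries no information from previously presented examples."""
    for _ in range(n_washout):
        rann(rng.uniform(low, high))  # the outputs are simply discarded
```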