

6.6 Feature Selection by Genetic Algorithms

SVMs have been demonstrated to circumvent the curse of dimensionality for highly multidimensional data sets [Joachims, 1998]. However, as we have previously seen in Chap. 5, in practical complex tasks it may prove important to concentrate learning only on the most influential features among all given indicators. Taking the same example of a medical problem, the task may exhibit many medical factors [Stoean et al, 2011b], and even if ESVMs are able to give competitive results, they could surely benefit from some form of prior feature selection. Moreover, since coefficients for each attribute are encoded into an individual, having many variables makes the genome too lengthy, which in turn slows convergence. And, since EAs power the proposed technique, their flexibility can be further exploited to implement a feature selection approach embedded within the ESVM, leading to an improvement in its performance.

The methodology can thus be further endowed with a mechanism for dynamic feature selection provided by a GA (outlined in Algorithm 6.3). The GA-ESVM [Stoean et al, 2011b] consists of a population of individuals structured on two levels. The higher level of an individual selects the optimal indicators, with the presence/absence of each feature encoded by a binary digit, while the lower level generates weights for the chosen attributes. Each level of an individual is modified by its own genetic operators. The lower one is perturbed as before, while the higher, binary GA part of a chromosome undergoes one-point recombination and strong mutation (see Chap. 3). Survivor selection is also performed differently at the high level, since offspring in the binary-encoded component replace their parent counterparts only if better. The fitness assignment triggers evaluation at both levels, considering weights only for the selected features.
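To make the two-level representation concrete, a minimal Python sketch follows. It is not the authors' implementation; the class and function names, the mutation probability and the perturbation step are illustrative assumptions. It pairs a binary feature mask with real-valued hyperplane coefficients and defines the operators named above.

import random

# Illustrative two-level GA-ESVM individual: a binary mask (high level)
# selecting attributes, plus real-valued hyperplane coefficients w and b
# (low level). Parameter values are assumptions, not from the source.
class Individual:
    def __init__(self, n_features):
        # High level: 1 = attribute selected, 0 = attribute ignored
        self.mask = [random.randint(0, 1) for _ in range(n_features)]
        # Low level: one weight per attribute and the bias b
        self.w = [random.uniform(-1, 1) for _ in range(n_features)]
        self.b = random.uniform(-1, 1)

def one_point_crossover(mask1, mask2):
    """One-point recombination on the binary (high-level) component."""
    cut = random.randrange(1, len(mask1))
    return mask1[:cut] + mask2[cut:], mask2[:cut] + mask1[cut:]

def strong_mutation(mask, p=0.3):
    """Strong mutation: each bit flips with a comparatively high rate p."""
    return [1 - bit if random.random() < p else bit for bit in mask]

def perturb(ind, sigma=0.1):
    """Low-level mutation: Gaussian perturbation of w and b, as before."""
    ind.w = [wi + random.gauss(0, sigma) for wi in ind.w]
    ind.b += random.gauss(0, sigma)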

Algorithm 6.3 Selection of features and evolution of their weights by GA-ESVM.

Require: The optimization problem in (2.21)
Ensure: The fittest separating hyperplane with high training accuracy and generalization ability

begin
  t ← 0;
  Initialize population P(t) of potential hyperplanes of coefficients w and b, where each individual is made of two levels: a binary one, where the powerful attributes are selected, and a real-valued one, where weights for these chosen features are generated;
  Evaluate P(t) in terms of training accuracy and maximal separation between decision classes, taking into account both the selected indicators and the weights in each individual;
  while termination condition is not satisfied do
    Parent selection;
    Recombination on the high-level population;
    Recombination on the low-level population;
    Mutation on the high-level population;
    Mutation on the low-level population;
    Evaluation;
    Survivor selection for the next generation;
    t ← t + 1;
  end while
  return fittest hyperplane coefficients along with its selected features
end
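Below is a hedged Python rendering of Algorithm 6.3, continuing the sketch above (it reuses Individual, one_point_crossover, strong_mutation and perturb). The fitness combines training accuracy with a separation term, as the text describes; the exact objective is the one in (2.21), which this toy scalarization only approximates, and the trade-off factor and selection scheme are assumptions.

def fitness(ind, X, y):
    """Training accuracy plus a separation term over selected features."""
    correct = 0
    for xi, yi in zip(X, y):
        # Only attributes selected on the binary level enter the decision
        score = sum(m * w * x for m, w, x in zip(ind.mask, ind.w, xi)) - ind.b
        correct += int((score >= 0) == (yi == 1))
    norm = sum(m * w * w for m, w in zip(ind.mask, ind.w))
    return correct / len(X) - 0.01 * norm  # trade-off factor is assumed

def ga_esvm(X, y, pop_size=50, generations=100):
    pop = [Individual(len(X[0])) for _ in range(pop_size)]
    for t in range(generations):
        next_pop = []
        for _ in range(pop_size // 2):
            p1, p2 = random.sample(pop, 2)                  # parent selection
            c1, c2 = Individual(len(X[0])), Individual(len(X[0]))
            c1.mask, c2.mask = one_point_crossover(p1.mask, p2.mask)
            c1.w, c1.b, c2.w, c2.b = list(p1.w), p1.b, list(p2.w), p2.b
            for child, parent in ((c1, p1), (c2, p2)):
                child.mask = strong_mutation(child.mask)    # high-level mutation
                perturb(child)                              # low-level mutation
                # High-level survivor rule: the offspring replaces its
                # parent counterpart only if it is fitter
                better = fitness(child, X, y) > fitness(parent, X, y)
                next_pop.append(child if better else parent)
        pop = next_pop
    return max(pop, key=lambda ind: fitness(ind, X, y))     # fittest individual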

The enhanced mechanism was applied to the same problem of correctly discriminating between degrees of liver fibrosis in chronic hepatitis C [Stoean et al, 2011b].

The results not only exhibited a statistically better learning accuracy, but also indicated both the decisive factors within the assessment and their weights in the process [Stoean et al, 2011b]. Recall that the original data set contains 722 samples of 24 medical indicators and 5 possible degrees of fibrosis as output. For the differentiation of each class, the GA-ESVM dynamically selected those indicators that distinguish it from the other degrees. In this respect, the SVM one-against-all scheme is preferred, as it permits a more accurate discrimination between the particularities of each individual class and those of the rest. Starting from randomly chosen attributes, the algorithm cycle pursues an active evolution and an increase in the importance of certain features in the absence of others. Eventually, the coefficients for the remaining indicators represent weights for each given attribute. If a weight is close to 0, the corresponding feature has a smaller importance; if it conversely has a higher absolute value, whether positive or negative, it more powerfully influences the final outcome (as in Fig. 6.3) [Stoean et al, 2011b].
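As an illustration of reading the evolved coefficients, the following lines (again building on the sketches above, and assuming the fibrosis data for one one-against-all subtask is already loaded as X_train, y_train with labels in {-1, 1}) rank the indicators by the magnitude of their effective weights; masked-out attributes get weight 0, as in Fig. 6.3.

best = ga_esvm(X_train, y_train)
effective = [m * w for m, w in zip(best.mask, best.w)]  # 0 if deselected
for i in sorted(range(len(effective)), key=lambda i: -abs(effective[i])):
    print(f"indicator {i + 1:2d}: weight {effective[i]:+.3f}")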

[Fig. 6.3 here: grouped bar chart of the coefficient values w1–w24 and b (y-axis: Importance, from −2 to 2), with one bar per class F0–F4 in each group]

Fig. 6.3 The fittest coefficients determined by the GA-ESVM for every medical indicator (1-24) and for each of the 5 classes. Each group of 5 bars is related to one attribute. Longer bars (both positive and negative) correspond to more important attributes. Attributes omitted by the GA are represented with a weight of 0, i.e., w2 for F0 and F4 [Stoean et al, 2011b]

6.7 Concluding Remarks

Several positive conclusions can be drawn from the feedback of the considered applications, whose results confirm that the goals of the ESVM architecture, planned as a viable alternative to SVMs for certain real-world situations, have been met:

• The ESVMs inherit the learning roots of SVMs and therefore remain efficient and robust.

• In contrast to standard SVM solving, the optimization component now provided by means of EAs is transparent, adaptable and much easier to understand or use.

• ESVMs do not impose any constraints or requirements regarding kernel choice. If necessary, for a more complex problem, the kernel may even be simultaneously evolved, e.g. by a naïve approach like a HC that encodes a combination of different simple functions, or by genetic programming [Howley and Madden, 2004].

• The ESVM is able to provide a formula of the relationship between indicators and matching outcomes. This is highly significant for obtaining a rapid approximation of the response once certain values are appointed for the involved parameters, which may take place separately (from the training module) and instantly as a new example is given as input (see the sketch after this list). This saves (computational) time when a critical case appears (e.g. in medicine).

• In addition, this formula is helpful for understanding interactions between variables and classes that may be difficult to grasp even for experienced practitioners.

• The weights also offer some insight into the importance of every attribute in the process.

• Moreover, the GA-ESVM is also able to dynamically determine the indicators that influence the result through an intrinsic feature selection mechanism.

• Most notably, the performance superiority of the alternative ESVM is statistically significant in comparison to the traditional SVM in some of the applications [Stoean et al, 2009a], [Stoean et al, 2009b], [Stoean et al, 2011b].
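For instance, under the assumptions of the earlier sketches, the evolved formula reduces the classification of a new case to a single weighted sum, evaluated independently of the training module (the function name is illustrative):

def predict(ind, x):
    """Instant response for a new sample from the evolved mask, w and b."""
    score = sum(m * w * xi for m, w, xi in zip(ind.mask, ind.w, x)) - ind.b
    return 1 if score >= 0 else -1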

There are, however, also some possible limitations that may arise in the experimental stage:

• EAs make use of many parameters, which are nevertheless relatively easy to set in practical applications.

• Because the evaluation of every individual references all samples, ESVMs have a slower training than SVMs, whose corresponding procedure makes only one call to the data set. However, in practice (often, but not always), it is the test reaction that matters more. Nevertheless, by observing the relationship between each result and the corresponding size of the training data, it is clear that the SVM performs better than the ESVM for larger problems; this is probably because, in these cases, much more evolutionary effort would be necessary. Chunking partly resolves this issue, but other possibilities could be further investigated.

Chapter 7

Evolutionary Algorithms Explaining Support Vector Learning

All truths are easy to understand once they are discovered;

the point is to discover them.

Galileo Galilei

7.1 Goals of This Chapter

Even if SVMs are one of the most reliable classifiers for real-world tasks when it comes to accurate prediction, their weak point still lies in the opacity behind the resulting discrimination [Huysmans et al, 2006]. As we have mentioned before, there are many available implementations that offer the possibility to also extract the coefficients of the decision hyperplane (SVMlight, LIBSVM). In Chap. 6 we have also presented an easy and flexible alternative means to achieve that. Nevertheless, such output merely provides a weighted formula for the importance of each and every attribute. We have shown that feature selection can extract only those parameters that are actually determinant of the class, and solve the issue of redundancy. However, the lack of any particular guidelines on the logic behind the decision-making process still remains. Such transparency is obviously desired in theory for a rigorous conceptual behavior, but it is also crucial for domains like medicine, where a good prediction accuracy alone is no longer sufficient for true decision support of the medical act. While accuracy certainly remains a prerequisite [Belciug and El-Darzi, 2010], [Belciug and Gorunescu, 2013], [Gorunescu and Belciug, 2014], supplementary information on how a verdict has been reached, based on the given medical indicators, is necessary if the computational model is to be fully trusted as a second opinion.

On the other hand, classifiers that are able to derive prototypes of learning are transparent, but cannot outperform kernel-based methodologies like SVMs. The idea of combining two such opposites then sprang up in the machine learning community:

kernel techniques could bring the prediction force by simulating learning, while transparent classifiers could interpret their results in a comprehensible fashion.

There are many attempts in this sense, and several concern precisely the two subjects of this book: SVMs and EAs. In this context, this last chapter puts forward another novel approach built with the same target. We begin by addressing the existing literature entries (Sect. 7.2), then present the new combination (Sect. 7.3) and its enhancements (Sects. 7.5, 7.6 and 7.7), all from the experimental perspective (Sect. 7.4).


7.2 Support Vector Learning and Information Extraction