
2 Material and Methods

This section presents the details of the support vector machine, the multiobjective evolutionary optimization, and the SVM trainings.

2.1 Support Vector Machines

Support Vector Machines (SVM) have emerged as an alternative data-driven tool in many fields traditionally dominated by Artificial Neural Networks (ANN). SVM was developed by Vapnik in the early 1990s, mainly for applications in classification; Vapnik later extended the method to regression. An SVM can "learn" the underlying relationship from a finite set of training data and develops a generalization capability that makes it a robust estimation method.


The mathematical basis for SVM is derived from statistical learning theory. SVM training amounts to a quadratic programming problem that can be solved efficiently and for which a global extremum is guaranteed [2]. It is a sparse (robust) algorithm compared with ANN; what makes it sparse is the use of Structural Risk Minimization (SRM) instead of the Empirical Risk Minimization (ERM) used in ANN. In SRM, instead of minimizing the total error, one minimizes an upper bound on the error. The mathematical details on SVM can be found in [3–5].

The function f(x) that relates inputs to output has the following form:

$$f(x) = \sum_{i=1}^{N} w_i\,\phi(x_i) + b \qquad (1)$$

where N (usually $N \ll L$) is the number of support vectors, which are selected from the entire dataset of L samples. The support vectors are the subset of the training data points that support the "decision surface" or "hyperplane" that best fits the data.

The function f(x) is approximated under the SRM induction principle by minimizing the following objective function (here written in the standard ε-insensitive form):

$$\min \; C \sum_{i=1}^{L} \big(\xi_i + \xi_i^{*}\big) + \frac{1}{2}\,\lVert w \rVert^{2} \qquad (2)$$

where the errors are measured with the ε-insensitive loss function. The samples outside the ε-tube are penalized by the penalty terms ξ. The above formulation makes the solution sparse in the sense that errors smaller than ε are ignored; the sparse solution is the one that gives the same error with the minimum number of coefficients describing f(x). The second term in the objective function is the regularization term, added to avoid the consequences of the ill-posedness of the inverse problem [6].

Equation (2) is solved in its dual form by employing Lagrange multipliers [3]:

$$f(x) = \sum_{i=1}^{N} \big(\alpha_i - \alpha_i^{*}\big)\, K(x, x_i) + b \qquad (3)$$

The $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers and have to be greater than zero for the support vectors $i = 1, \ldots, N$, and $K(x_i, x)$ is a kernel function.
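To make Eq. (3) concrete, the sketch below (an illustration, not the authors' implementation) evaluates f(x) from a set of support vectors, the dual coefficients $(\alpha_i - \alpha_i^{*})$, a bias b, and a Gaussian (RBF) kernel with width γ:

```python
import numpy as np

def rbf_kernel(x, xi, gamma):
    """Gaussian (RBF) kernel K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate Eq. (3): f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b.
    dual_coefs[i] holds (alpha_i - alpha_i*) for support vector i."""
    return sum(c * rbf_kernel(x, xi, gamma)
               for c, xi in zip(dual_coefs, support_vectors)) + b
```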

There are three main parameters to be determined as part of SVM training: the trade-off 'C', the degree of error tolerance 'ε', and a parameter related to the kernel, the kernel width 'γ'. In practice, these may be determined either by a trial-and-error procedure or by an automatic optimization method [7]. The optimization scheme can be single-objective or multiobjective depending upon the nature of the problem.
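As a simple illustration of the automatic alternative to trial-and-error tuning, the sketch below runs a grid search over (C, ε, γ) for an RBF-kernel SVR; the parameter ranges are arbitrary placeholders, not values from this study, and X_train, y_train stand for a generic training set.

```python
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# hypothetical search ranges for C, epsilon, and the kernel width gamma
param_grid = {
    "C": [1, 10, 100],
    "epsilon": [0.01, 0.1, 0.5],
    "gamma": [0.001, 0.01, 0.1],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
# search.fit(X_train, y_train)      # X_train, y_train: training inputs/targets
# best_params = search.best_params_
```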

The initial formulation for training SVM employed a single-objective approach to optimization. It was noticed that single-objective training can yield parameter sets that are optimal for that particular objective value but fail to provide reasonable estimates for the other objectives. It is therefore preferable to employ a multiobjective search procedure. In the current implementation, the Multiobjective Particle Swarm Optimization (MOPSO) algorithm is used to determine the SVM parameters.

2.2 Multiobjective Evolutionary Optimization

The goal in an optimization algorithm is to find the minimum (or maximum) of a real-valued function $f : S \to \Re$, i.e., to find the optimum parameter set $x^{*} \in S$ such that:

$$f(x^{*}) \le f(x), \quad \forall x \in S \qquad (4)$$

where $S \subseteq \Re^{D}$ is the feasible range for $x$ (the parameter set, having D dimensions) and $\Re^{D}$ represents the D-dimensional space of real numbers.

The current multiobjective methodology employs a swarm-intelligence-based evolutionary computing strategy called Multiobjective Particle Swarm Optimization (MOPSO) [8]. The PSO method was developed for single-objective optimization by R. C. Eberhart and J. Kennedy [9] and was later extended to multiobjective problems by various researchers, including the method in [8]. The method originates from the swarm paradigm, called Particle Swarm Optimization (PSO), and is expected to provide the so-called global or near-global optimum.

PSO is characterized by an adaptive algorithm based on a social-psychological metaphor [9] involving individuals who interact with one another in a social world. This sociocognitive view can be effectively applied to computationally intelligent systems [10]. The governing factor in PSO is that the individuals, or "particles," keep track of the best positions in the search space that they have obtained so far, as well as the best positions obtained by their neighboring particles. The best position of an individual particle is called the "local best," and the best of the positions obtained by all the particles is called the "global best." Hence the global best is what all the particles tend to follow. The algorithmic details of PSO can be found in [8, 9, 11, 12]. The approach in [8] presents a multiobjective framework for SVM optimization using MOPSO.
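A minimal single-objective PSO update is sketched below to make the "local best / global best" mechanics concrete; it is the standard textbook form with arbitrarily chosen inertia and acceleration constants, not the exact variant used in [8].

```python
import numpy as np

def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: each particle is pulled toward its own best
    position so far (personal_best) and the best position of the swarm
    (global_best), with random weights on each pull."""
    r1 = np.random.rand(*positions.shape)
    r2 = np.random.rand(*positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (personal_best - positions)
                  + c2 * r2 * (global_best - positions))
    return positions + velocities, velocities
```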

A multiobjective approach differs from a single-objective method in that the objective function to be minimized (or maximized) is now a vector containing more than one objective function. The task of the optimization method is therefore to map out a trade-off surface (otherwise known as the Pareto front), unlike finding a single scalar-valued optimum as in single-objective problems. The multiobjective approach to the PSO algorithm is implemented by using the concept of Pareto ranks and defining the Pareto front in the objective function space.

Mathematically, a Pareto-optimal front is defined as follows: a decision vector $\vec{x}_1 \in S$ is called Pareto optimal if there does not exist another $\vec{x}_2 \in S$ that dominates it. Let $P \subseteq \Re^{m}$ be a set of vectors. The Pareto-optimal front $P^{*} \subseteq P$ contains all vectors $\vec{x}_1 \in P$ that are not dominated by any vector $\vec{x}_2 \in P$:


$$P^{*} = \{\, \vec{x}_1 \in P \mid \nexists\; \vec{x}_2 \in P : \vec{x}_2 \succ \vec{x}_1 \,\} \qquad (5)$$

The idea of Pareto ranking is to rank the population in the objective space and separate the points with rank 1, placed in a set P*, from the remaining points. This establishes a Pareto front defined by the set P*. All the points in the set P* are the "behavioral" points (or non-dominated solutions), and the remaining points in set P become the "non-behavioral" points (inferior or dominated solutions).

The reason behind using the Pareto optimality concept is that there are solutions for which the performance of one objective function cannot be improved without sacrificing the performance of at least one other.
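The rank-1 (non-dominated) set P* of Eq. (5) can be extracted as in the generic sketch below, written for a minimization problem (such as minimizing RMSE and BinRMSE); it is not the ranking code used in [8].

```python
import numpy as np

def pareto_front(objectives):
    """Return the indices of the non-dominated (rank-1) points.
    objectives: array of shape (n_points, n_objectives), all minimized."""
    n = objectives.shape[0]
    non_dominated = []
    for i in range(n):
        dominated = False
        for j in range(n):
            # j dominates i if it is no worse in every objective
            # and strictly better in at least one
            if j != i and np.all(objectives[j] <= objectives[i]) \
                      and np.any(objectives[j] < objectives[i]):
                dominated = True
                break
        if not dominated:
            non_dominated.append(i)
    return non_dominated
```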

In the MOPSO algorithm, as devised in [8], the particles follow the nearest neighboring member of the Pareto front, based on proximity in the objective function (solution) space. At the same time, the particles in the front follow the best individual in the front, which is the median of the Pareto front. The term "follow" refers to the assignments made for each particle in the population to decide its direction and offset (velocity) in the subsequent iteration; these assignments are made based on proximity in the objective function (solution) space. The best individual is defined in a relative sense and may change from iteration to iteration depending upon the value of the objective function.

An example test problem is shown in Fig. 1. The test function presents a maximization problem (as in [13]):

$$\text{Maximize } F = \big(f_1(x, y),\, f_2(x, y)\big) \qquad (6)$$

Fig. 1 Test function results: the MOPSO front and the true front plotted in the f1(x, y)–f2(x, y) objective space


where

$$f_1(x, y) = -x^{2} + y, \qquad f_2(x, y) = \frac{x}{2} + y + 1$$

subject to

$$0 \le \frac{x}{6} + y \le 6.5, \qquad 0 \le \frac{x}{2} + y \le 7.5, \qquad 0 \le 5x + y \le 30, \qquad x, y \ge 0$$

in the range $0 \le x, y \le 7$. The true front and the front obtained by the MOPSO algorithm are shown in Fig. 1 after 5,000 function evaluations. It can be noticed that MOPSO was able to reproduce the true front for this test case.
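For reference, the test problem of Eq. (6) can be evaluated as below; the objectives and constraints follow the reconstruction given above and should be checked against [13] before reuse.

```python
def objectives(x, y):
    """Two objectives of Eq. (6); both are maximized."""
    f1 = -x ** 2 + y
    f2 = x / 2 + y + 1
    return f1, f2

def feasible(x, y):
    """Constraint set as stated above, with 0 <= x, y <= 7."""
    return (0 <= x / 6 + y <= 6.5 and
            0 <= x / 2 + y <= 7.5 and
            0 <= 5 * x + y <= 30 and
            0 <= x <= 7 and 0 <= y <= 7)
```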

2.3 SVM-MOPSO Trainings

The unique formulation of MOPSO helps it avoid getting stuck in local optima when searching the multi-dimensional parameter domain. In the current research, MOPSO is used to determine the three parameters of SVM, namely the trade-off or cost parameter 'C', the epsilon 'ε', and the kernel width 'γ'. The MOPSO method uses a population of parameter sets that compete against each other through a number of iterations in order to improve the values of specified multiobjective criteria (multiobjective functions), e.g., root mean square error (RMSE), bias, histogram error (BinRMSE), correlation, etc. The optimum parameter search is conducted in an intelligent manner by narrowing down the regions of interest, and it avoids getting stuck in local optima.
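The objective criteria named above can be computed as in the sketch below. In particular, BinRMSE is taken here to be the RMSE between the histogram bin counts of the observed and predicted series; the exact binning and definition used by the authors are assumptions.

```python
import numpy as np

def rmse(obs, pred):
    return np.sqrt(np.mean((obs - pred) ** 2))

def bias(obs, pred):
    return np.mean(pred - obs)

def bin_rmse(obs, pred, bins=25):
    """RMSE between histogram bin counts (assumed definition of BinRMSE)."""
    edges = np.histogram_bin_edges(np.concatenate([obs, pred]), bins=bins)
    h_obs, _ = np.histogram(obs, bins=edges)
    h_pred, _ = np.histogram(pred, bins=edges)
    return np.sqrt(np.mean((h_obs - h_pred) ** 2))

def correlation(obs, pred):
    return np.corrcoef(obs, pred)[0, 1]
```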

In earlier efforts, a single-objective optimization methodology was employed to optimize the three SVM parameters. The approach was tested on a number of sites and the results were encouraging. However, it was noticed that using a single-objective optimization method can result in sub-optimal predictions when looking at multiple objectives. The single-objective formulation (using PSO) employed the coefficient of determination (COD) as the only objective function, but the resulting distributions were highly distorted when compared to the observed distributions. The coefficient of determination (COD) is inversely related to RMSE and can range between −1 and 1, the value of 1 being a perfect fit. This is shown in Fig. 2, where a trade-off curve between BinRMSE and COD is presented. It can be noticed that the COD value increases with the increase in the BinRMSE value. The corresponding histograms are shown in Fig. 3 for each of the extreme ends (maximum COD and minimum BinRMSE) and for the "compromise" solution from the curve. The histograms in Fig. 3 make it clear that the best COD (or RMSE) solution is the one with the highest BinRMSE and indeed misses the extreme ends of the distribution. Thus, no matter how tempting it is to achieve the best COD value, it does not cover the extreme ends of the distribution. On the other hand, the best BinRMSE comes at the cost of the lowest COD (or highest RMSE) and is not desired either. It is therefore necessary to have a multiobjective scheme that simultaneously minimizes these objectives and provides a trade-off surface, from which a compromise solution can be chosen between the two objectives. Figure 3 also shows the histogram for the "compromise" solution, which provides a decent histogram when compared to the observed data.
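How the "compromise" solution is picked from the trade-off curve is not spelled out here; one common choice, sketched below purely as an assumption, is the non-dominated point closest to the ideal (utopia) point after normalizing both objectives.

```python
import numpy as np

def compromise_solution(objectives):
    """Pick the Pareto point closest to the normalized ideal point.
    objectives: shape (n_points, 2); both columns minimized
    (e.g. RMSE and BinRMSE)."""
    f = np.asarray(objectives, dtype=float)
    f_min, f_max = f.min(axis=0), f.max(axis=0)
    f_norm = (f - f_min) / np.where(f_max > f_min, f_max - f_min, 1.0)
    distances = np.linalg.norm(f_norm, axis=1)  # ideal point is the origin
    return int(np.argmin(distances))
```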

Fig. 3 Histogram comparison between Observed, BinRMS best, COD best, and the best compromise solution (counts vs. power in kW)

Fig. 2 Trade-off curve between BinRMSE (x-axis) and COD (y-axis), marking the best BinRMS, best COD, and compromise solutions


3 Application

The current procedures primarily employ SVM to build regression models for assessing and forecasting wind resources. The primary inputs to the SVM come from the National Centers for Environmental Prediction (NCEP) reanalysis gridded data [14], centered on the wind farm location. The target is the measured aggregate power of the wind farm. The current training uses a k-fold cross-validation scheme referred to as the "Round-Robin strategy". The idea within the "Round-Robin" is to divide the available training data into two sets, use one for training, and hold out the other for testing the model. In this particular "Round-Robin strategy", the data are divided into months: training is done on all the months except one, and testing is done on the held-out month. The previous operational methods employ manual calibration of the SVM parameters in assessment projects and a simple grid-based parameter search in forecasting applications.
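A minimal sketch of this leave-one-month-out "Round-Robin" scheme is given below, assuming the data sit in a pandas DataFrame with a datetime index; the column names and the use of default SVR parameters are placeholders, not the study's configuration.

```python
import pandas as pd
from sklearn.svm import SVR

def round_robin(df, feature_cols, target_col="power"):
    """Train on all months but one, test on the held-out month, for every month."""
    scores = {}
    months = df.index.month
    for m in sorted(set(months)):
        train, test = df[months != m], df[months == m]
        model = SVR(kernel="rbf").fit(train[feature_cols], train[target_col])
        pred = model.predict(test[feature_cols])
        scores[m] = ((pred - test[target_col]) ** 2).mean() ** 0.5  # hold-out RMSE
        # in the study, C, epsilon, and gamma come from MOPSO rather than
        # SVR defaults; defaults are used here only to keep the sketch short
    return scores
```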

The goal in using MOPSO is to explore the regions of interest with respect to the specific multiobjective criteria in an efficient way. Another attractive feature of MOPSO is that it results in a so-called Pareto parameter space, which accounts for parameter uncertainty between the two objective functions. The result is thus an ensemble of parameter sets clustered around the so-called global optimum with respect to the multiobjective space. This ensemble of parameter sets also gives trade-offs on the different objective criteria. The MOPSO-SVM method is tested on data from an operational assessment site in North America. The results are compared with observed data using a number of evaluation criteria on the validation sets.

As stated above, the data from four NCEP grid points, each consisting of five variables (20 variables in total), are used. It has been noticed that various normalization and pre-processing techniques may help to improve SVM's prediction capabilities. In the current study, the input data have also been pre-processed using Principal Component Analysis (PCA); in that case, the PCs explaining 95% of the variance are included as inputs to the SVM. The comparison is made with an SVM that does not use PCA as a pre-processing step.
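The PCA pre-processing described above can be expressed as in the generic sketch below (not the authors' code): scikit-learn's fractional n_components keeps exactly the components needed to explain 95% of the variance.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# standardize the 20 NCEP input variables, then keep the PCs
# explaining 95% of the variance before feeding them to the SVM
pca_preprocess = make_pipeline(StandardScaler(), PCA(n_components=0.95))
# X_pca = pca_preprocess.fit_transform(X)   # X: (n_samples, 20) input matrix
```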

MOPSO requires a population of parameter sets to be evolved through a number of iterations, competing against each other to obtain optimum (minimum, in this case) values of BinRMSE and RMSE. In the current formulation, a 50-member population is evolved for 100 iterations within MOPSO for the wind power predictions at the wind farm.
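Putting the pieces together, the outer loop looks roughly like the sketch below: 50 parameter sets (C, ε, γ) evolved for 100 MOPSO iterations, each evaluated by round-robin SVM training against the two objectives (RMSE, BinRMSE). The parameter bounds and the helper functions referenced in the comments are hypothetical, building on the sketches earlier in this section rather than the authors' implementation.

```python
import numpy as np

N_PARTICLES, N_ITERATIONS = 50, 100   # population size and iteration count from the text

# each particle is a parameter set (C, epsilon, gamma); bounds are placeholders
bounds = np.array([[1.0, 1000.0], [0.001, 1.0], [1e-4, 1.0]])
positions = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(N_PARTICLES, 3))
velocities = np.zeros_like(positions)

# for it in range(N_ITERATIONS):
#     objs = np.array([evaluate_rmse_binrmse(C, eps, gamma)   # round-robin SVM scores
#                      for C, eps, gamma in positions])
#     front = pareto_front(objs)                    # rank-1 set, as sketched earlier
#     guides = assign_nearest_front_member(positions, objs, front)  # hypothetical helper
#     positions, velocities = mopso_step(positions, velocities, guides, bounds)
```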