
Clustering for Fuzzy Model Identification – Regression

3.3 TS Fuzzy Models for Nonlinear Regression

3.3.4 Selection of the Antecedent and Consequent Variables

Using too many antecedent and consequent variables degrades both the prediction performance and the interpretability of the fuzzy model, due to redundancy, non-informative features, and noise. To avoid these problems, two methods are presented in this section.

Selection of the Consequent Variables by Orthogonal Least Squares Method

As the fuzzy model is linear in the parameters $\theta_i$, (3.14) is solved by the least squares method (see (3.42)), which can also be formulated as:

$$\theta_i = \mathbf{B}^{+}\left(\mathbf{y}\sqrt{\beta_i}\right) \qquad (3.67)$$

where $\mathbf{B}^{+}$ denotes the Moore–Penrose pseudo-inverse of $\mathbf{B} = \Phi_e \sqrt{\beta_i}$.
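For illustration, a minimal NumPy sketch of this weighted least-squares step; the names Phi_e, y, and beta_i mirror the symbols in (3.67), and the row-wise weighting is an assumption about how the membership degrees enter:

```python
import numpy as np

def weighted_least_squares(Phi_e, y, beta_i):
    """Solve (3.67): theta_i = B^+ (y sqrt(beta_i)),
    where B is Phi_e weighted row-wise by sqrt(beta_i)."""
    w = np.sqrt(beta_i)                    # per-sample membership weights of rule i
    B = Phi_e * w[:, None]                 # weight each row of the regressor matrix
    theta_i = np.linalg.pinv(B) @ (y * w)  # Moore-Penrose pseudo-inverse solution
    return theta_i
```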

The OLS method transforms the columns of $\mathbf{B}$ into a set of orthogonal basis vectors in order to inspect the individual contribution of each rule. For this purpose, the Gram–Schmidt orthogonalization $\mathbf{B} = \mathbf{W}\mathbf{A}$ is used, where $\mathbf{W}$ is an orthogonal matrix, $\mathbf{W}^T\mathbf{W} = \mathbf{I}$, and $\mathbf{A}$ is an upper triangular matrix with unity diagonal elements. If $\mathbf{w}_i$ denotes the $i$th column of $\mathbf{W}$ and $g_i$ is the corresponding element of the OLS solution vector $\mathbf{g} = \mathbf{A}\theta_i$, the output variance $(\mathbf{y}\sqrt{\beta_i})^T(\mathbf{y}\sqrt{\beta_i})/N$ can be explained by the regressors $\sum_{i=1}^{n_r} g_i^2 \mathbf{w}_i^T \mathbf{w}_i / N$. Thus, the error reduction ratio $\varrho_i$ due to an individual rule $i$ can be expressed as

$$\varrho_i = \frac{g_i^2\, \mathbf{w}_i^T \mathbf{w}_i}{(\mathbf{y}\sqrt{\beta_i})^T(\mathbf{y}\sqrt{\beta_i})}. \qquad (3.68)$$

This ratio offers a simple means for ordering the consequent variables and can easily be used to select a subset of the inputs in a forward-regression manner.
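As a sketch, the error reduction ratios of (3.68) can be computed with a plain Gram–Schmidt pass over the columns of the weighted regressor matrix; a full forward-regression selection would re-orthogonalize the remaining candidates at each step (the column order here is fixed for brevity):

```python
import numpy as np

def error_reduction_ratios(B, d):
    """Gram-Schmidt B = W A; returns rho_i of (3.68) per column of B,
    with d = y * sqrt(beta_i) the weighted output."""
    N, nr = B.shape
    W = np.zeros((N, nr))
    rho = np.zeros(nr)
    dTd = d @ d                                # (y sqrt(beta_i))^T (y sqrt(beta_i))
    for i in range(nr):
        w = B[:, i].copy()
        for j in range(i):                     # remove projections on earlier basis vectors
            w -= (W[:, j] @ B[:, i]) / (W[:, j] @ W[:, j]) * W[:, j]
        W[:, i] = w
        g = (w @ d) / (w @ w)                  # OLS coefficient g_i
        rho[i] = g**2 * (w @ w) / dTd          # error reduction ratio (3.68)
    return rho                                 # rank regressors by descending rho
```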

Selection of the Scheduling Variables based on Interclass Separability

Feature selection is usually necessary. For this purpose, we modify the Fisher interclass separability method, which is based on statistical properties of the data and has been applied to feature selection of labeled data [231]. The interclass separability criterion is based on the between-class covariance matrix $\mathbf{F}_B$ and the within-class covariance matrix $\mathbf{F}_W$, which sum up to the total covariance $\mathbf{F}_T$ of the training data, where:

$$\mathbf{F}_W = \sum_{i=1}^{c} p(\eta_i)\,\mathbf{F}_i, \qquad \mathbf{F}_B = \sum_{i=1}^{c} p(\eta_i)\,(\mathbf{v}_i - \mathbf{v}_0)^T(\mathbf{v}_i - \mathbf{v}_0), \qquad \mathbf{v}_0 = \sum_{i=1}^{c} p(\eta_i)\,\mathbf{v}_i. \qquad (3.69)$$

The feature interclass separability selection criterion is a trade-off between $\mathbf{F}_W$ and $\mathbf{F}_B$:

$$J = \frac{\det \mathbf{F}_B}{\det \mathbf{F}_W}. \qquad (3.70)$$

The importance of a feature is measured by leaving out the feature and calculating $J$ for the reduced covariance matrices. Feature selection proceeds iteratively by removing the least needed feature.
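A minimal sketch of the criterion and the leave-one-out ranking, assuming X holds the N samples row-wise and U the c-by-N fuzzy partition matrix (names are illustrative; $p(\eta_i)$ is taken as the normalized sum of memberships):

```python
import numpy as np

def interclass_separability(X, U):
    """J = det(F_B) / det(F_W) of (3.69)-(3.70)."""
    c, N = U.shape
    n = X.shape[1]
    p = U.sum(axis=1) / N                     # cluster priors p(eta_i)
    V = (U @ X) / U.sum(axis=1)[:, None]      # cluster centers v_i
    v0 = p @ V                                # weighted mean of the centers
    FW = np.zeros((n, n))
    FB = np.zeros((n, n))
    for i in range(c):
        D = X - V[i]
        Fi = (U[i][:, None] * D).T @ D / U[i].sum()   # cluster covariance F_i
        FW += p[i] * Fi                               # within-class part
        FB += p[i] * np.outer(V[i] - v0, V[i] - v0)   # between-class part
    return np.linalg.det(FB) / np.linalg.det(FW)

def least_needed_feature(X, U):
    """Recompute J with each feature left out; the feature whose removal
    leaves J largest contributes least and is dropped first."""
    scores = [interclass_separability(np.delete(X, j, axis=1), U)
              for j in range(X.shape[1])]
    return int(np.argmax(scores))
```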

Example 3.4. Automobile MPG prediction based on Orthogonal Least Squares and Interclass Separability methods

The presented fuzzy modelling approach is applied to a common benchmark problem, the Automobile MPG (miles per gallon) prediction case study [130] (see Example 3.3 for more information).

A TS fuzzy model that utilizes all five information profile attributes of the automobiles has been identified by the presented clustering algorithm. The inputs are: $u_1$: Displacement, $u_2$: Horsepower, $u_3$: Weight, $u_4$: Acceleration, and $u_5$: Model year. Originally, six features are available. The first, the number of cylinders, was neglected because the clustering algorithm cannot handle discrete features with only four values [1, 2, 3, 4].

Fuzzy models with two, three, and four rules were identified with the presented method. The two-rule model gave almost the same performance on training and test data as the three- and four-rule models and is therefore used in the sequel.

When $\mathbf{x}_k = \mathbf{u}_k$, the presented clustering method gave RMSE values of 2.72 and 2.85 for the training and test data, respectively.

The FMID Toolbox gives very similar results: RMSE values of 2.67 and 2.95 for the training and test data. Bad results were obtained by the ANFIS algorithm when a neuro-fuzzy model of the same complexity was identified. The application of the ANFIS algorithm resulted in an overtrained model with an RMSE of 1.97 on the training data but 91.35 on the test data. These results indicate that the presented clustering method has good generalization properties. For further comparison we also give the results of the linear regression reported in [130]. The linear model was based on seven parameters and six input variables (the previously presented five variables and the number of cylinders). The training and test RMS errors of this model were 3.45 and 3.44, respectively.

The above presented model reduction techniques, OLS and FIS, are now applied to remove redundancy and simplify the model. First, based on the obtained fuzzy clusters, the OLS method is used to order the input variables. The fuzzy model with two rules is considered. It turned out that for both clusters (local models), "model year" and "weight" are the most relevant input variables. Without re-estimation (re-clustering) of the model, after the additional removal of the other variables from the consequent part of the model, the modelling performance drops only to 3.75 and 3.55 RMSE for the training and test data, respectively. It is worth noting that the same features turned out to be the most important in [130], where a cross-validation based method was used for input selection of neuro-fuzzy models.

Based on the above considerations, the clustering has been applied again to identify a model based on the two selected attributes “weight” and “model year”.

This results in RMSE values of 2.97 and 2.95 for the training and test data, respectively. In addition, the Fisher interclass separability method is applied, and the second attribute, "model year", could be removed from the antecedent of both rules.

The final result is a model with RMSE values of 2.97 and 2.98 for the training and test data, respectively. The other methods are now applied with the same attributes, and their performances are summarized in Table 3.2.

Because the obtained clusters are axis-parallel, they can be analytically projected and decomposed into easily interpretable membership functions defined on the individual features. The obtained fuzzy sets are shown in Figure 3.15.

Table 3.2: Performance of the identified TS models with two rules and only two input variables. HYBRID: clustering + OLS + FIS, GG: Gath–Geva clustering, ANFIS: neuro-fuzzy model, FMID: Fuzzy Model Identification Toolbox.

Method    train RMSE    test RMSE
HYBRID       2.97          2.95
GG           2.97          2.97
ANFIS        2.67          2.95
FMID         2.94          3.01

[Figure: two panels of projected membership functions, with membership degree (0 to 1) on the vertical axes; fuzzy sets $A_{1,1}$ and $A_{2,1}$ are defined on "weight" (approx. 1500 to 5000) and $A_{1,2}$ and $A_{2,2}$ on "model year" (approx. 70 to 82).]

Figure 3.15: Membership functions obtained by the projection of the clusters.

The clusters are much more separated along the input "weight". Hence, the application of the Fisher interclass separability method shows that it is enough to use only this input in the antecedent part of the model:

$$\begin{aligned}
\mathcal{R}_1:\ &\text{If } Weight \text{ is } A_{1,1} \text{ then } MPG = [-0.0107\ \ 1.0036]^T\,[Weight,\ Year]^T - 23.1652,\ [0.57] \\
\mathcal{R}_2:\ &\text{If } Weight \text{ is } A_{2,1} \text{ then } MPG = [-0.0038\ \ 0.4361]^T\,[Weight,\ Year]^T - 1.4383,\ [0.43]
\end{aligned} \qquad (3.71)$$
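For concreteness, a minimal sketch of how the final two-rule model (3.71) produces a prediction. The membership degrees are left as inputs because the parametric form of the projected fuzzy sets is not reproduced in this excerpt; the function name and signature are illustrative:

```python
import numpy as np

def mpg_two_rule_ts(weight, year, mu1, mu2):
    """Evaluate (3.71); mu1, mu2 are the degrees of fulfillment of
    A_{1,1} and A_{2,1} at the given weight (see Figure 3.15)."""
    x = np.array([weight, year])
    y1 = np.array([-0.0107, 1.0036]) @ x - 23.1652  # rule 1 local model
    y2 = np.array([-0.0038, 0.4361]) @ x - 1.4383   # rule 2 local model
    return (mu1 * y1 + mu2 * y2) / (mu1 + mu2)      # normalized TS inference
```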

Concluding, the obtained model has good approximation capabilities, similar to some other advanced methods, but is also very compact. Furthermore, the method seems to have good generalization properties, because the results on the training data are similar to those on the test data. If the prediction surface of the obtained model is compared to that of the ANFIS model (see [130]), one can see that the ANFIS approach spuriously estimates higher MPG for heavy cars because of a lack of data, due to the tendency of manufacturers to begin building small compact cars during the mid-1970s, while the presented approach does not suffer from this problem.

3.3.5 Conclusions

We discussed the structure identification of Takagi–Sugeno fuzzy models and focused on methods to obtain compact but still accurate models. A new clustering method was presented to identify the local models and the antecedent part of the TS fuzzy model. In addition, two methods for the selection of the antecedent and consequent attributes have been presented. These methods allow the derivation of very compact models through the subsequent ordering and removal of redundant and non-informative antecedent and consequent attributes. The approach is demonstrated by means of the identification of the MPG benchmark problem. The results show that the presented approach is able to identify compact and accurate models in a straightforward way. This makes the method attractive in comparison with other iterative schemes, like the one proposed in [230], which involves extensive intermediate optimization.
