

6.4.3 Kernel(s) selection

Figures 6.2 and 6.3 plot the mean values of the AUC, the recall, the precision and the number of selected kernels on the training datasets, averaged over the 5–fold cross–validation procedure along the regularization path (left part of the figures), and from an evaluation of the 100 models on the testing sets (right part of the figures). In all cases, MKL exhibits overfitting for small values of the regularization parameter: the algorithm classifies all the examples into one particular class, as highlighted by the constant value of the AUC. For these small values of the regularization parameter, the number of selected kernels is low: the mean is equal to 1 kernel for dataset A and to 2 for datasets B, C and D. The number of selected kernels increases at high values of the regularization parameter, but there is a gap between the mean and the median values. This gap is largest at the points maximizing the mean AUC for all the datasets, which suggests the presence of a kernel having a higher coefficient than the others, i.e. a dominant kernel.
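This mean/median gap can be made concrete with a few lines of code. The sketch below is illustrative only (the per-fold weights are made up and the variable names are ours, not those of our implementation): a dominant kernel reveals itself when one column carries most of the average weight while the number of selected kernels fluctuates across folds.

```python
# Illustrative sketch: diagnosing a dominant kernel from per-fold MKL
# weights. The weights below are made up; rows are the 5 folds, columns
# the 13 kernels (each row sums to 1 under the l1 constraint of MKL).
import numpy as np

weights = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.05, 0.95, 0, 0, 0, 0, 0, 0, 0],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0, 0, 0, 0, 0, 0, 0],
    [0.02, 0.03, 0.05, 0.10, 0.10, 0.70, 0, 0, 0, 0, 0, 0, 0],
    [0.00, 0.00, 0.00, 0.00, 0.20, 0.80, 0, 0, 0, 0, 0, 0, 0],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0, 0, 0, 0, 0, 0, 0],
])

# Number of kernels with non-zero weight in each fold.
n_selected = (weights > 1e-8).sum(axis=1)
print("mean:", n_selected.mean(), "median:", np.median(n_selected))

# One column carrying most of the average mass indicates a dominant kernel.
mean_w = weights.mean(axis=0)
print("dominant kernel:", mean_w.argmax(), "with mean weight", mean_w.max())
```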

We obtained a relatively similar behavior with the benchmark datasets² (see Figures C.1 and C.2 in Appendix C).

² Available from http://www.fml.tuebingen.mpg.de/Members/raetsch/benchmark (last visit 03/09/2010)

The respective mean values of the AUC, recall and precision leap upward even while the number of selected kernels remains stable at its minimum value, but this does not happen at the same point for all the datasets. Let us call this point the “AUC inflection point”, because the geometry of the AUC curve shows a concavity change from this point; it should not, however, be confused with the mathematical definition of an inflection point. From this AUC inflection point, the AUC increases and then decreases after reaching a maximum value. The maximum of the AUC is reached at a point where the number of selected kernels is greater than 1, except for dataset B. The value of the regularization parameter at the point maximizing the AUC is not the same for the 4 datasets: it is larger when the training sets contain more variables and examples.
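A minimal numerical sketch of how such an “AUC inflection point” can be located, assuming the mean AUC has been computed on a grid of 100 regularization values: the concavity change corresponds to a sign change of the discrete second difference. The AUC curve below is synthetic, not our experimental data.

```python
# Locate the first concavity change of a mean-AUC curve sampled along a
# regularization path. The curve here is a synthetic S-shaped example.
import numpy as np

C_grid = np.logspace(-2, 3, 100)                        # 100 values of C
auc = 0.5 + 0.3 / (1.0 + np.exp(-np.log10(C_grid)))     # toy AUC values

second_diff = np.diff(auc, n=2)                 # discrete curvature proxy
flips = np.where(np.diff(np.sign(second_diff)) != 0)[0]
if flips.size > 0:
    idx = flips[0] + 1                          # offset of the double diff
    print("AUC inflection point near C =", C_grid[idx])
```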

At the point maximizing the AUC, the coefficients of the kernels are provided in Table 6.2. The first six kernels are selected for datasets A, C and D, and only the sixth for dataset B. The sixth kernel (σ = 5) is the most important kernel for two datasets (A and B), while the fifth (σ = 2) is the most important for datasets C and D.
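The selected model corresponds to a convex combination K = Σₘ dₘKₘ of the 13 RBF kernels. As a sketch, the combined kernel can be rebuilt from the dataset A weights of Table 6.2; we assume here the parameterization k(x, y) = exp(−‖x − y‖²/(2σ²)), which may differ from the convention used in our experiments, and the helper names are illustrative.

```python
# Rebuild the combined MKL kernel K = sum_m d_m K_m from the Table 6.2
# weights for dataset A (assumed RBF convention: exp(-||x-y||^2 / (2*s^2))).
import numpy as np

sigmas = np.array([0.03125, 0.125, 0.5, 1, 2, 5, 10, 15, 20, 25, 50, 75, 100])
d = np.array([0.0319, 0.1189, 0.1107, 0.1145, 0.1327, 0.4913,
              0, 0, 0, 0, 0, 0, 0])              # mean weights, dataset A

def rbf(X, Y, sigma):
    """Gaussian kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def combined_kernel(X, Y):
    """Convex combination of the RBF kernels with non-zero weight."""
    return sum(dm * rbf(X, Y, s) for dm, s in zip(d, sigmas) if dm > 0)

X = np.random.default_rng(0).normal(size=(8, 4))  # toy inputs
K = combined_kernel(X, X)
print(K.shape, np.allclose(K, K.T))               # symmetric Gram matrix
```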

Table 6.4 provides the evaluation obtained with the selected models. The number of selected kernels, the value of the regularization parameter and the performance metrics on the training and testing sets are reported in this table. We analyze the table in 3 ways: 1) an internal way, where the results from the training phase to the testing phase are analyzed for all datasets; 2) a horizontal way, where the results obtained on testing sets from models built with balanced and unbalanced training sets are compared; and 3) a vertical way, where the results obtained on testing sets with all variables are compared with those obtained with filtered variables.

At the point maximizing the AUC on the training sets, some metrics worsen and others improve when the model is evaluated on the testing sets, whereas all performance metrics worsen on the testing sets with the benchmark datasets. According to Table 6.4, for example, the mean accuracy and specificity are systematically improved on the testing sets and the mean recall is systematically worsened on all datasets. The models built with balanced training sets are optimistic with respect to the precision: the mean precision always worsens on the testing sets, while it is systematically improved for models built with unbalanced training sets. These variations lead to a systematic worsening of the mean f–measure and an improvement of the mean AUC for datasets A and C. The opposite behavior is seen with datasets B and D: a systematic improvement of the mean f–measure and a worsening of the mean AUC. These variations can be explained by the low number of examples in the training sets for all datasets. The skewness of the class distribution reduces the number of examples available for training in datasets A and C.
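For reference, the metrics reported in Tables 6.4 and 6.5 can be recomputed from the confusion matrix as in the sketch below; the labels, predictions and decision scores are made up for illustration.

```python
# Metrics used in Tables 6.4 and 6.5, computed from a confusion matrix.
# The labels, predictions and scores below are illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.4, 0.8, 0.2, 0.1, 0.6, 0.3, 0.2, 0.7, 0.4])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

accuracy    = (tp + tn) / y_true.size
recall      = tp / (tp + fn)               # sensitivity
specificity = tn / (tn + fp)               # true-negative rate
precision   = tp / (tp + fp)
f_measure   = 2 * precision * recall / (precision + recall)
auc         = roc_auc_score(y_true, y_score)
print(accuracy, recall, specificity, precision, f_measure, auc)
```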

Memory limitations constrain us to use only 35% of the examples to build the models for datasets B and D.

The evaluation of the models on the testing sets exhibits better performance for the models built with the balanced training sets. For instance, the mean recall, f–measure and AUC obtained with dataset A are better than those obtained with dataset B. The latter provides better accuracy due to higher specificity, which is the highest specificity obtained among the 4 datasets. Dataset C provides higher recall and AUC compared to dataset D, which exhibits higher accuracy due to a relatively higher specificity. The f–measure is always better with the datasets having the original imbalance rate, but the AUC is always better with the balanced datasets.

The use of filtered variables in datasets C and D is effective according to Table 6.4, which shows the effectiveness of the feature selection we applied to the nosocomial infection dataset. For instance, dataset C provides a higher recall, f–measure and AUC compared to dataset A.

The performance obtained with dataset D outperforms all performance metrics obtained with dataset B. However, the performance obtained with dataset C is better than the results obtained with dataset D, as confirmed by a McNemar test with the Bonferroni adjustment. A similar test also indicates that the results obtained with dataset D are better than those from dataset A, which is in turn better than the model built on dataset B. However, none of the results we obtain in this section outperforms the results we obtained with a single RBF kernel whose parameters were optimized using the radius–margin bound on the leave–one–out error.
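A sketch of the pairwise comparison mentioned above: McNemar's test (with continuity correction) on the disagreement counts of two classifiers evaluated on the same testing set, followed by a Bonferroni adjustment for the number of pairwise tests. The counts below are illustrative, not our experimental values.

```python
# McNemar test with continuity correction and Bonferroni adjustment.
# b: examples classified correctly by model 1 but not by model 2;
# c: the converse. The values used here are illustrative.
from scipy.stats import chi2

def mcnemar_p(b, c):
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return chi2.sf(stat, df=1)

n_tests = 6                                # pairwise tests over 4 datasets
p = mcnemar_p(b=40, c=18)
p_adj = min(1.0, p * n_tests)              # Bonferroni correction
print("p =", p, "adjusted p =", p_adj)
```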

With respect to the evaluation of the 100 models of the regularization path (right side of Figures 6.2 and 6.3), the mean AUC is not maximized at the same value of the regularization parameter as during the training phase. The model evaluation exhibits overfitting for very small values of the regularization parameter, and the best performances are achieved at a lower value of the regularization parameter than during the training phase. The “AUC inflection point” is pushed to the left, i.e. to lower values of the regularization parameter. Furthermore, the numbers of selected kernels in these models are not the same as in the best model from the training set. Table 6.3 highlights the domination of one RBF kernel (σ = 5) for all the datasets.

The performance values at these points are summarized in Table 6.5. The performances shown in this table outperform all the results obtained with the same datasets at the point maximizing the AUC of the training phase. According to the AUC, the model built with filtered variables and having the original data skewness (dataset D) provides the best results. The results obtained with this dataset also outperform the results obtained with a single RBF kernel whose parameters were optimized using the radius–margin bound on the leave–one–out error.

Table 6.2: Mean weight of each kernel at the point maximizing the AUC after the 5–fold cross–validation

σ          0.03125   0.125    0.5      1        2        5        10   15   20   25   50   75   100
dataset A  0.0319    0.1189   0.1107   0.1145   0.1327   0.4913   0    0    0    0    0    0    0
dataset B  0         0        0        0        0        1        0    0    0    0    0    0    0
dataset C  0.0134    0.0135   0.0184   0.0016   0.8590   0.0941   0    0    0    0    0    0    0
dataset D  0.0188    0.0347   0.0253   0.0210   0.6411   0.2591   0    0    0    0    0    0    0

Table 6.3: Mean weight of each kernel at the point maximizing the AUC on the test sets

σ          0.03125   0.125    0.5      1        2        5        10      15   20   25   50   75   100
dataset A  0         0        0        0        0        1.0000   0       0    0    0    0    0    0
dataset B  0         0        0        0        0        0.9200   0.0800  0    0    0    0    0    0
dataset C  0         0        0        0        0.0800   0.9200   0       0    0    0    0    0    0
dataset D  0         0        0        0        0.0800   0.9200   0       0    0    0    0    0    0

Table 6.4: Performance on the training data and on the 10 testing sets at the point maximizing the AUC of the training sets during the cross–validation. Thirteen (13) RBF kernels were applied to the dataset.

                   dataset A                      dataset B
                   training set    testing set    training set    testing set
Selected kernels   6/13                           1/13
Cost               155.5676±77.0866               550.9425±…
Accuracy           72.02±8.26      83.75±2.18     77.69±3.58      85.24±1.41
Recall             74.97±7.75      63.70±6.62     76.53±10.42     58.67±5.91
Specificity        69.07±12.51     86.44±2.94     77.86±4.13      88.77±1.55
Precision          71.50±9.48      38.98±4.84     32.88±4.61      41.11±3.66
F-measure          73.19±8.53      47.99±3.56     46.00±6.39      48.22±3.75
AUC                72.02±8.26      75.07±2.61     77.19±5.17      73.72±2.77

                   dataset C                      dataset D
                   training set    testing set    training set    testing set
Selected kernels   6/13                           6/13
Cost               97.7010±31.6428                246.6131±…
Accuracy           78.09±7.27      81.85±2.33     76.80±5.55      85.56±1.36
Recall             88.11±7.38      74.39±6.40     77.64±8.73      62.40±6.57
Specificity        68.07±11.61     82.88±2.80     76.74±7.17      88.64±1.69
Precision          73.96±7.59      36.94±5.19     32.89±6.06      42.36±4.13
F-measure          80.42±7.48      49.10±4.99     46.21±7.16      50.28±4.12
AUC                78.09±7.27      78.63±3.05     77.19±2.87      75.52±3.01


Table 6.5: Performance on the training data and on the 10 testing sets at the point maximizing the AUC on the testing sets. Thirteen (13) RBF kernels were applied to the dataset.

                   dataset A                      dataset B
                   training set    testing set    training set    testing set
Selected kernels   6/13                           1/13
Cost               155.5676±77.0866               550.9425±…
Accuracy           57.52±0.00      83.83±2.18     44.92±0.00      83.29±1.67
Recall             53.96±0.00      66.05±6.51     59.80±0.00      75.27±3.65
Specificity        61.08±0.00      86.21±2.81     42.08±0.00      84.37±2.13
Precision          16.02±0.00      39.38±4.97     12.25±0.00      39.24±3.64
F-measure          24.70±0.00      49.01±4.01     20.34±0.00      51.44±2.93
AUC                57.52±0.00      76.13±2.79     50.94±0.00      79.82±1.52

                   dataset C                      dataset D
                   training set    testing set    training set    testing set
Selected kernels   6/13                           6/13
Cost               97.7010±31.6428                246.6131±…
Accuracy           73.69±0.00      88.23±1.69     69.82±0.00      87.56±1.10
Recall             68.23±0.00      75.41±7.34     24.00±0.00      78.67±3.29
Specificity        79.14±0.00      89.93±2.34     76.00±0.00      88.77±1.46
Precision          78.11±0.00      50.45±5.36     11.73±0.00      49.03±4.24
F-measure          72.84±12.48     60.08±4.05     15.76±0.00      60.41±3.70
AUC                73.69±0.00      82.67±3.08     50.00±0.00      83.72±1.41


Figure 6.2: Performance measures (AUC, recall, precision) and the ratio of selected kernels (SKER) during the 5–fold cross–validation procedure on 5 random training sets (left) and during the evaluation on 10 pairs of training/testing sets (right) for datasets A (top) and B (bottom), according to the 100 values of C. The 13 RBF kernels are applied to the whole dataset.


Figure 6.3: Performance measures (AUC, recall, precision) and the ratio of selected kernels (SKER) during the 5–fold cross–validation procedure on 5 random training sets (left) and during the evaluation on 10 pairs of training/testing sets (right) for datasets C (top) and D (bottom), according to the 100 values of C. The 13 RBF kernels are applied to the whole dataset.