6.4 Kernels, model and variables selection with MKL

6.4.4 Model interpretability

In the second experiment, the 13 RBF kernels were applied to each variable. The objective of this experiment is twofold: 1) analyze the behavior of each variable during the cross-validation procedure, and 2) rank the variables using the model providing the highest AUC.
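As a hedged illustration of the experimental setup (not the thesis code), the bank of per-variable RBF kernels can be sketched as follows; the bandwidth grid matches the σ values reported in Tables 6.7 and 6.8, and the common Gaussian parameterization exp(-(x - x')²/(2σ²)) is assumed:

```python
import numpy as np

# Sigma grid from Tables 6.7 and 6.8 (13 bandwidths per variable).
SIGMAS = [0.03125, 0.125, 0.5, 1, 2, 5, 10, 15, 20, 25, 50, 75, 100]

def rbf_kernel_1d(x, sigma):
    """Gram matrix of an RBF kernel computed on a single variable x (1-D array)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_bank(X):
    """One Gram matrix per (variable, sigma) pair: d variables x 13 sigmas kernels."""
    n, d = X.shape
    return [rbf_kernel_1d(X[:, j], s) for j in range(d) for s in SIGMAS]
```

With 59 variables this bank yields 59 × 13 = 767 candidate kernels, matching the 198/767 and 2/767 counts of Table 6.6; MKL then learns a sparse non-negative combination of these Gram matrices.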

Figures 6.4 and 6.5 show the behavior of the different metrics along the regularization path, from the cross-validation procedure (left side of the figures) and from the evaluation on the 100 testing samples (right side of the figures). For small values of the regularization parameter C, the AUC and the number of selected kernels are stable for all datasets. These stable AUC values indicate that the whole dataset is classified into one particular class: all examples are assigned to the negative class with the balanced training sets, and strictly to the positive class with the training sets keeping the original data skewness. As in the previous kernel selection experiment, an AUC inflexion point is observed; around this point, the AUC and the precision leap up to a relatively stable value. A relatively similar behavior is obtained with the benchmark dataset (see the left side of Figures C.3 and C.4 in the appendix). The number of selected kernels increases for high values of C, but the distribution of the kernel coefficients presents a negative asymmetry, according to the gap between their mean and median values. In other words, the evolution of the ratio of selected kernels indicates the presence of dominant kernels. During the training phase, the behavior of the AUC, recall and precision with dataset A (resp. B) is relatively the same as with dataset C (resp. D). However, the regularization parameters maximizing the AUC are not the same for datasets B and D.
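The two quantities read off the regularization path above, the AUC inflexion point and the C maximizing the AUC, can be located with a simple sketch. This is an illustrative heuristic (the inflexion point is taken as the largest jump between consecutive AUC values), not the procedure used in the thesis:

```python
import numpy as np

def auc_path_summary(C_grid, auc):
    """Locate the AUC inflexion point (largest leap between consecutive
    C values) and the C maximizing the AUC along a regularization path."""
    auc = np.asarray(auc, dtype=float)
    jump = np.diff(auc)                    # AUC increase between consecutive C values
    inflexion = int(np.argmax(jump)) + 1   # index right after the largest leap
    best = int(np.argmax(auc))             # index of the C maximizing the AUC
    return C_grid[inflexion], C_grid[best]
```

On a path that is flat at chance level before leaping to a plateau, as in Figures 6.4 and 6.5, the two returned values bracket the useful range of C.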

Figures 6.4 and 6.5 also depict the performance evaluation along the regularization path (right side of the figures). In all cases, the algorithm exhibits overfitting for very small values of C.

After the AUC inflexion point, the maximum AUC is reached and remains constant, independently of the number of selected kernels and of the value of the regularization parameter. The recall and precision also keep constant values after this point.

Table 6.6 summarizes the results obtained at the point maximizing the AUC, both from the model selection process and from the evaluation on the testing sets. The numbers of kernels per variable with non-null coefficients are respectively 198/767, 2/767, 67/247 and 52/247 for datasets A, B, C and D. From the training phase to the evaluation on the testing sets, all performance metrics increase for datasets A and C except the precision, and consequently the F-measure. For datasets B and D, all performance metrics increase except the specificity with dataset B. The evaluation on the test set provides exactly the same results for all datasets. Compared with the results of the previous kernel(s) selection experiments, only the model at the point maximizing the testing AUC with dataset D outperforms them. These results also outperform those obtained with a single RBF kernel whose parameters were optimized using the radius-margin bound on the leave-one-out error.
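One plausible reading of the metrics reported in Table 6.6 is that they derive from a binary confusion matrix (positive class = nosocomial infection); a minimal sketch, with the AUC of a hard classifier reducing to balanced accuracy, which is consistent with the training AUC equaling the training accuracy on the balanced sets in Table 6.6:

```python
def binary_metrics(tp, fp, tn, fn):
    """Performance metrics of Table 6.6 from confusion-matrix counts
    (tp/fp/tn/fn: true/false positives and negatives)."""
    recall = tp / (tp + fn)                  # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    auc = (recall + specificity) / 2         # AUC of a hard (thresholded) classifier
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity,
            "precision": precision, "f-measure": f_measure, "auc": auc}
```

The collapse of the precision from roughly 84% on the balanced training sets to roughly 42% on the testing sets follows directly from this definition: the skewed test sets contain far more negatives, inflating the false-positive count at unchanged recall and specificity.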

To address the interpretability of the models, the values of the kernel coefficients are analyzed using cross-tabulations of RBF kernels × nosocomial infection variables. Due to space limitations, only the summaries of these tables at two particular points are presented in this document. Tables 6.7 and 6.8 present the total sum of the coefficients of each RBF kernel over all variables, respectively at the regularization parameter maximizing the AUC from the training phase and at the regularization parameter providing the highest number of selected kernels (C = 10^3 in our case). Tables A.2 and A.3 in appendix A present the total sum of the coefficients of each variable over all kernels at these same two points. For an intuitive view of the evolution of the coefficients, Figure 6.6 depicts the total sum of kernel coefficients per variable along the regularization path of the model selection procedure. This figure shows the dominance of one or two variables, especially for small values of the regularization parameter.
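The cross-tabulation described above amounts to reshaping the learned MKL coefficient vector into a (variables × kernels) matrix and summing along each axis; a sketch under that assumption (names are illustrative):

```python
import numpy as np

def cross_tab_sums(beta, n_vars, n_sigmas=13):
    """Given the MKL coefficient vector beta ordered variable-by-variable,
    return the cumulated sum per variable (rows, as in Tables A.2/A.3)
    and per RBF kernel (columns, as in Tables 6.7/6.8)."""
    table = np.asarray(beta, dtype=float).reshape(n_vars, n_sigmas)
    per_variable = table.sum(axis=1)
    per_kernel = table.sum(axis=0)
    return per_variable, per_kernel
```

Since the MKL coefficients sum to one, both marginal sums share the same total, which is why the rows of Tables 6.7 and 6.8 each sum to approximately 1.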

At the point maximizing the mean AUC of the training sets, all kernels are active with dataset A and the 10th kernel (σ = 25) has the highest coefficient value. Only 7 variables have null coefficients, and variable 45 has the highest coefficient. For dataset B, the 10th RBF kernel (σ = 25) is the only active kernel and 2 variables have non-null coefficients (variables 45 and 46). For datasets C and D, the linear combination of kernels is sparse and the 7th RBF kernel (σ = 10) has the highest coefficient values. All variables have non-null coefficients, but their cumulated sums differ between the two datasets: while variable 13 has the highest cumulated sum of coefficients in both datasets, variable 38 is ranked second with dataset C whereas variable 45 is second with dataset D. The most important kernel is thus the same for datasets A and B (resp. C and D), but the most important variables are not. At the point providing the maximal number of selected kernels, all the RBF kernels are active, with one dominant kernel; the cumulated sums of the variable coefficients, and thus their ranking, change, except for the variable with the highest cumulated sum.
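The variable ranking discussed above (variable 45 first for datasets A and B, variable 13 first for C and D) is simply a descending sort of the cumulated coefficient sums; a hypothetical helper:

```python
import numpy as np

def rank_variables(per_variable_sums):
    """Indices of the variables sorted by decreasing cumulated
    coefficient sum (position 0 = most important variable)."""
    sums = np.asarray(per_variable_sums, dtype=float)
    return np.argsort(-sums)  # negate for a descending sort
```

Comparing such rankings at the AUC-maximizing point and at the point of maximal kernel selection is what reveals that only the top-ranked variable keeps its position.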

Table 6.6: Performance on the training data at the point maximizing the AUC during the cross-validation. Thirteen (13) RBF kernels were applied to each variable.

                   dataset A                     dataset B
                   training set   testing set    training set   testing set
Selected kernels   198/767                       2/767
Cost               109.7499       3.334          8.711+
Accuracy           81.63±5.12     84.15±0.92     84.04±1.55     84.15±0.92
Recall             79.06±7.93     81.77±2.95     78.13±5.55     81.77±2.95
Specificity        84.21±9.89     84.46±1.12     84.84±1.25     84.46±1.12
Precision          84.35±8.25     41.84±3.20     41.76±4.35     41.84±3.20
F-measure          81.62±8.08     55.35±3.07     54.43±4.88     55.35±3.07
AUC                81.63±5.12     83.11±1.38     81.49±3.17     83.11±1.38

                   dataset C                     dataset D
                   training set   testing set    training set   testing set
Selected kernels   67/247                        52/247
Cost               109.7499       42.3156        270.2559+
Accuracy           81.95±5.08     84.15±0.92     83.84±1.76     84.15±0.92
Recall             79.69±7.96     81.77±2.95     79.53±4.30     81.77±2.95
Specificity        84.21±9.89     84.46±1.12     84.43±1.89     84.46±1.12
Precision          84.46±8.21     41.84±3.20     41.76±4.08     41.84±3.20
F-measure          82.01±8.09     55.35±3.07     54.77±4.19     55.35±3.07
AUC                81.95±5.08     83.11±1.38     81.98±2.40     83.11±1.38


Figure 6.4: Performance measures (AUC, recall, precision) and ratio of selected kernels (SKER) during the 5-fold cross-validation procedure on 5 random training sets (left) and during the evaluation on 10 pairs of training/testing sets (right), for datasets A (top) and B (bottom), according to the 100 values of C. The 13 RBF kernels are applied to each variable.


Figure 6.5: Performance measures (AUC, recall, precision) and ratio of selected kernels (SKER) during the 5-fold cross-validation procedure on 5 random training sets (left) and during the evaluation on 10 pairs of training/testing sets (right), for datasets C (top) and D (bottom), according to the 100 values of C. The 13 RBF kernels are applied to each variable.

Table 6.7: Total sum of the coefficients of each RBF kernel at the point maximizing the AUC.

σ          0.03125  0.125   0.5     1       2       5       10      15      20      25      50      75      100
dataset A  0.0463   0.0013  0.0001  0.0112  0.0319  0.0001  0.0219  0.0044  0.005   0.8349  0.0054  0.0117  0.0257
dataset B  0        0       0       0       0       0       0       0       0       1       0       0       0
dataset C  0.0647   0       0.0024  0       0.0014  0.0714  0.8232  0.0021  0.0062  0       0       0.0142  0.0142
dataset D  0.0434   0       0.0146  0.0054  0       0.1117  0.7929  0       0.0049  0.0002  0       0.0096  0.0172

Table 6.8: Total sum of the coefficients of each RBF kernel at the highest SKER.

σ          0.03125  0.125   0.5     1       2       5       10      15      20      25      50      75      100
dataset A  0.0911   0.0562  0.031   0.0595  0.0936  0.0249  0.1156  0.0298  0.0321  0.279   0.0481  0.0569  0.0822
dataset B  0.1088   0.0343  0.0254  0.0227  0.0104  0.0068  0.0099  0.0229  0.0235  0.5958  0.02    0.0454  0.0739
dataset C  0.18     0.0193  0.0464  0.0192  0.0158  0.0719  0.4954  0.0209  0.0417  0.0023  0.0203  0.0361  0.0306
dataset D  0.1062   0.0025  0.0275  0.003   0.002   0.1522  0.6427  0.0021  0.0026  0.0024  0.003   0.0291  0.0248


Figure 6.6: Evolution of the total sum of coefficients per variable along the regularization path during the 5-fold cross-validation. The 13 RBF kernels are applied to each variable. The x-axis represents the variable number and the y-axis the index of the regularization parameter.