
4.4 Experiments

4.4.5 Classification performance

Tables 4.3, 4.4, 4.5, 4.6, 4.7 and 4.8 summarize the classification results obtained with the three models. The AUC values obtained with the various techniques are summarized in Figure 4.1. The best Ageom value of 0.752 was obtained with the ASVM+RES approach, followed by ASVM with Ageom = 0.749; the worst performance was obtained with BAL, with Ageom = 0.746.
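The aggregate score can be sketched as follows; a minimal illustration, assuming Ageom denotes the geometric mean of per-class performance scores (the exact per-class quantity is defined elsewhere in the thesis, and the scores below are hypothetical):

```python
from math import prod

def geometric_mean(values):
    """Geometric mean of per-class scores (all values assumed > 0)."""
    return prod(values) ** (1.0 / len(values))

# Hypothetical per-class scores for one strategy.
scores = [0.80, 0.75, 0.70]
print(round(geometric_mean(scores), 3))  # → 0.749
```

Unlike the arithmetic mean, the geometric mean is dragged down sharply by a single poorly classified class, which makes it a natural summary for imbalanced multi-class problems.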

Discussion

The convergence of the four initialization points to the same region indicates that using the RMB for SVM hyperparameter estimation is consistent. The computation time depends primarily on the size of the training set. During the model selection process, i.e. LOPO cross–validation, the SVMs were run only 2 times per fold and the algorithm converged after 7 to 18 iterations.
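The LOPO splitting scheme can be sketched as follows; a minimal illustration, assuming a patient identifier is available for every sample (the names `lopo_folds` and `patient_ids` are hypothetical, not from the thesis code):

```python
def lopo_folds(patient_ids):
    """Leave-one-patient-out: each fold holds out all samples of one patient,
    so no patient contributes to both the training and the test set."""
    patients = sorted(set(patient_ids))
    for held_out in patients:
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield train, test

# Toy example: 5 samples from 3 patients → 3 folds.
ids = ["p1", "p1", "p2", "p3", "p3"]
folds = list(lopo_folds(ids))
print(len(folds))  # → 3
```

Grouping by patient rather than by sample avoids the optimistic bias that arises when blocks from the same patient appear on both sides of a split.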


Table 4.5: Performance on fibrosis vs. all classes.

Fibrosis      BAL    ASVM   ASVM + RES
Error         0.21   0.19   0.20
Sensitivity   0.80   0.77   0.79
Specificity   0.79   0.82   0.80
Precision     0.50   0.53   0.51
F–measure     0.61   0.63   0.62
AUC           0.79   0.79   0.80

Table 4.6: Performance on ground glass vs. all classes.

Ground glass  BAL    ASVM   ASVM + RES
Error         0.46   0.46   0.45
Sensitivity   0.86   0.87   0.88
Specificity   0.48   0.48   0.50
Precision     0.23   0.23   0.24
F–measure     0.36   0.37   0.38
AUC           0.67   0.68   0.69

Table 4.7: Performance on healthy vs. all classes.

Healthy       BAL    ASVM   ASVM + RES
Error         0.24   0.23   0.23
Sensitivity   0.95   0.95   0.95
Specificity   0.71   0.72   0.72
Precision     0.45   0.46   0.46
F–measure     0.61   0.62   0.62
AUC           0.83   0.83   0.84

Table 4.8: Performance on micronodules vs. all classes.

Micronodules  BAL    ASVM   ASVM + RES
Error         0.31   0.32   0.30
Sensitivity   0.55   0.53   0.45
Specificity   0.79   0.79   0.87
Precision     0.64   0.63   0.69
F–measure     0.59   0.57   0.54
AUC           0.67   0.66   0.66

Table 4.9: Ranking of the three strategies according to the AUC values.

               BAL   ASVM   ASVM + RES
fibrosis        1     2      3
ground glass    1     2      3
healthy         1     2      3
micronodules    3     2      1
Total ranking   6     8     10

With respect to the multi–class performance, the geometric mean ranked the third strategy (ASVM+RES) as the best choice for lung tissue classification, especially for the fibrosis, ground glass and healthy tissues (see Figure 4.1). The McNemar statistical test on each pair of these strategies indicates that there are no significant differences between the results obtained with the three strategies for the classification of consolidation and emphysema. The McNemar test also indicates no significant difference between ASVM and ASVM+RES for the classification of healthy tissue. Table 4.9 summarizes the ranking of the three strategies according to the AUC values for four tissue types.
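The McNemar comparison of two classifiers can be sketched as follows; a minimal version using the chi-square form with continuity correction on the two discordant counts (the toy counts are hypothetical, not the thesis results):

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square statistic with continuity correction.
    b: examples classifier A got right and classifier B got wrong;
    c: the converse. Concordant counts do not enter the statistic."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Toy discordant counts; 3.84 is the chi-square critical value
# at the 0.05 level with one degree of freedom.
stat = mcnemar_statistic(b=25, c=10)
print(stat > 3.84)  # → True
```

A statistic below 3.84 corresponds to the "no significant difference" conclusion reported above.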

The AUC was chosen to rank the strategies because it takes into account both the sensitivity and the specificity of the classifier, i.e. the ratios of true positive and true negative cases.

Depending on the final use of the classification models, the ranking in Table 4.9 may not hold. For instance, if the F–measure were used to rank the three strategies as in Table 4.9, the ASVM and ASVM+RES strategies would have the same rank. A ranking according to the F–measure would agree with a ranking based on the AUC except for fibrosis, where ASVM has the highest F–measure (i.e. would be ranked best) even though it has the lowest sensitivity.

The BAL strategy outperforms the other strategies for the classification of micronodules. The asymmetric margin of the SVM altered the sensitivity of the model. The addition of synthetic majority examples combined with an asymmetric–margin SVM further deteriorated the sensitivity of the model but improved the precision and the specificity of the classifier. This is due to the existence of outliers among the micronodules examples, which were misclassified by ASVM and ASVM+RES, thus decreasing the sensitivity.
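The asymmetric-margin idea can be sketched as a class-weighted hinge loss; a toy subgradient-descent version, not the thesis implementation (all parameter values and the 1-D toy data are illustrative):

```python
def train_asymmetric_svm(xs, ys, c_pos=5.0, c_neg=1.0, lr=0.01, epochs=200, lam=0.01):
    """Linear SVM trained by subgradient descent on a class-weighted
    hinge loss: errors on the minority class (+1) cost c_pos, errors
    on the majority class (-1) cost c_neg."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            cost = c_pos if y > 0 else c_neg
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularizer subgradient always; hinge term only if margin < 1.
            for i in range(dim):
                grad = lam * w[i] - (cost * y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
            if margin < 1:
                b += lr * cost * y
    return w, b

# Tiny 1-D toy set: positives around +1, negatives around -1.
xs = [[1.0], [1.2], [-1.0], [-1.1], [-0.9]]
ys = [1, 1, -1, -1, -1]
w, b = train_asymmetric_svm(xs, ys)
pred = 1 if w[0] * 1.0 + b > 0 else -1
print(pred)  # → 1
```

Setting c_pos > c_neg shifts the decision boundary toward the majority class, trading specificity for sensitivity, which is exactly the trade-off discussed above.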

The results obtained in the classification of healthy tissue are of particular interest in clinical practice, as the three models all reach a sensitivity of 95%. Indeed, an ASVM model for the classification of healthy tissue can be used to detect abnormal tissue types in HRCT.

Table 4.3 highlights the problem of using accuracy as a performance metric in the classification of imbalanced datasets. According to this table, the accuracy is around 98% because the algorithm correctly classified 99% of the negative examples, while the sensitivity and precision on positive examples are low.
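This gap between accuracy and the positive-class metrics can be reproduced with a small numeric sketch (the confusion counts below are hypothetical, not the values of Table 4.3):

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Hypothetical imbalanced counts: 20 positives among 1000 examples.
m = metrics(tp=8, fp=10, tn=970, fn=12)
print(round(m["accuracy"], 3), round(m["sensitivity"], 3))  # → 0.978 0.4
```

Accuracy is near 98% while sensitivity is only 40%, because the huge negative class dominates the numerator.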

The precision obtained for the classification of consolidation, emphysema and ground glass is very low and further investigation is needed to improve the results. This is probably due to the characteristics of these tissues: they are not very specific according to a discrimination measure of each of these tissues against the rest. The latter was computed with the Rayleigh quotient (ratio of the between–class variance to the within–class variance) on the dataset projected onto the principal component axes.
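The Rayleigh quotient used as a discrimination measure can be sketched on a 1-D projection; a minimal version (the toy samples are illustrative):

```python
def rayleigh_quotient(pos, neg):
    """Between-class variance over within-class variance for a
    one-dimensional projection of a two-class dataset."""
    def mean(v):
        return sum(v) / len(v)
    def var(v, m):
        return sum((x - m) ** 2 for x in v) / len(v)
    m_pos, m_neg = mean(pos), mean(neg)
    m_all = mean(pos + neg)
    n = len(pos) + len(neg)
    between = (len(pos) * (m_pos - m_all) ** 2 + len(neg) * (m_neg - m_all) ** 2) / n
    within = (len(pos) * var(pos, m_pos) + len(neg) * var(neg, m_neg)) / n
    return between / within

# Well-separated classes score high, overlapping classes score low.
sep = rayleigh_quotient([5.0, 5.1, 4.9], [0.0, 0.1, -0.1])
mix = rayleigh_quotient([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
print(sep > mix)  # → True
```

A low quotient on every principal axis is consistent with the poor precision reported for these tissue classes.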

Figure 4.2 highlights the high disparity of the consolidation examples in the principal component space and the existence of three clusters for this tissue class.

Several investigations will be carried out in the future to improve the classification of the three lung tissue types cited in the previous paragraph. Increasing the number of examples of these tissue types may improve the classification results. The consolidation tissue, for example, comes from younger patients compared to the other patients in the database. Another avenue for future investigation is to vary the oversampling ratio. The computational speed of the gradient descent method (compared to a fine–grained grid search) and the stability of the model will allow us to carry out more experiments with respect to the oversampling ratio of the minority classes. The addition of clinical features may also increase the classification performance. The use of a kernel–based algorithm to transform the input space into a linearly separable dataset did not provide good results; it is possible that the RBF kernel is not well suited for the classification of these tissues. Other approaches allowing the selection of an appropriate kernel also figure in the list of future work [115].


[Bar chart: AUC per tissue class (healthy, emphysema, ground glass, fibrosis, micronodules, consolidation) for BAL, ASVM and ASVM + RES; "No diff." bars mark pairs with no significant difference.]

Figure 4.1: AUC values obtained with the various techniques. The horizontal bars at the top of the histogram indicate that the two results they connect have no significant difference according to the McNemar statistical test.

Figure 4.2: Projection of the consolidation vs. all dataset on the first three principal component axes. The consolidation examples are labelled as +1.

4.5 Summary

The speed of the SVM implementing the radius–margin bound for model selection is very attractive for a classification task. It allows many experiments to be carried out and is of particular interest in daily use. However, the concept drift due to effective preventive measures for NI may hinder the deployment of such an algorithm. Indeed, the infection control practitioners would have to provide training examples to tune the algorithm. Further investigation is needed to determine how long a model remains valid before the infection control practitioners must collect new training examples. We used majority–class under–sampling to handle the dataset imbalance. The effectiveness of this method shows that the infection control practitioners do not have to provide many training examples to re–train a new model.

The integration of prior knowledge about case prevalence in the TALISMAN database, followed by the use of the SVM with asymmetrical costs, significantly improved the classification of ILD block instances. This approach to class imbalance management is innovative because the over–sampling does not operate only on the minority class, as it does with SMOTE. The adaptation of the algorithm to handle imbalanced datasets was the key to this approach. There are many avenues for improving the classification of ILD tissues. One can cite, for example, the integration of clinical parameters with the imaging features. However, such a dataset may need an algorithm that can trade off between the two types of features, such as the multiple kernel learning that we develop in Chapter 6.

Chapter 5

Variable relevance validation

5.1 Introduction

Choosing the most relevant variables while achieving the same or better performance is the purpose of a feature or variable selection process [56, 143]. Feature selection reduces the time and memory requirements needed to handle large datasets [105]. Having fewer variables involved in a classification task also increases the interpretability of classification results [130]. The SVM algorithm uses the dot–product of each pair of training examples to build a model in a higher dimensional space.

For this reason, the performance of the SVM is not strongly affected by the number of variables defining the dataset. Kohavi and John also point out that the question of which variables to use in a particular classification problem is not of much interest from a theoretical standpoint [86].

However, from a practical viewpoint, performing variable selection with SVMs is of interest with respect to computational efficiency and interpretability.

Finding the most relevant subset of variables is a combinatorial problem. To avoid the expensive computation such a problem requires, heuristics were proposed; one can cite, for example, the heuristic consisting of gradually adding or removing variables in a forward or backward scheme [61].
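The forward scheme can be sketched as a greedy loop; a minimal illustration with a hypothetical additive scoring function (a real application would plug in a cross-validated classifier score instead):

```python
def forward_selection(features, score, k):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion maximizes score(subset) until k features are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scoring function: each feature has a fixed, additive usefulness.
usefulness = {"a": 0.1, "b": 0.5, "c": 0.3}
chosen = forward_selection(usefulness, lambda s: sum(usefulness[f] for f in s), k=2)
print(chosen)  # → ['b', 'c']
```

The backward variant is symmetric: start from the full set and repeatedly drop the feature whose removal hurts the score least.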

Such a method was introduced earlier in this thesis using SVM–RFE. The algorithm gives a ranking of the variables according to their predictive power when combined with other variables, but does not indicate which subset gives the best performance. We have chosen the χ2 statistical test to select a subset of the most relevant variables and evaluated its performance with the FLD algorithm.
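For a binary variable against a binary class label, the χ2 statistic can be sketched from the 2×2 contingency table (the counts below are illustrative):

```python
def chi2_statistic(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]] of feature value vs. class label counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# A feature strongly associated with the class scores high;
# 3.84 is the 0.05 critical value with one degree of freedom.
strong = chi2_statistic([[30, 5], [5, 30]])
weak = chi2_statistic([[18, 17], [17, 18]])
print(strong > 3.84 > weak)  # → True
```

Ranking variables by this statistic and keeping the top scorers is the filter-style selection evaluated with the FLD algorithm above.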

In this chapter, we focus on the analysis of the variables of the NI dataset. Two reasons motivated this investigation. On the one hand, the historical evaluation of the NI dataset exhibits a deterioration of the classifier sensitivity. On the other hand, the computational cost saved by using the bounds on the generalization error for model selection allows an evaluation of multiple variable subsets.

The remaining parts of the chapter are organized as follows. The formal definition of the variable selection problem is presented in Section 5.2. A review of different methods for variable selection with the SVM algorithm is introduced in Section 5.3. We focus on the embedded methods, as the validation we carry out here mimics their functionality to some extent. The evaluation of the relevance of the NI variables is carried out in Section 5.4.