Select K Feature Selection - Model Comparison using 5-fold Cross Validation

5.1 Model Comparison using 5-fold Cross Validation

5.1.9 Select K Feature Selection

Figure 5.10 shows the comparison of the models for the Select K-based feature selection method.

Random Forest, XGBoost and SVM (Linear and Radial) models have performed well with an average accuracy of 98% except for the MLP. MLP does not perform well across various folds and not performing well as compared to other models.

Figure 5.6 Comparison of models for Select K-based feature selection method

As it can be seen from Figures 5.1 – 5.10, 6 out of 10 models are performing well across the various feature selection technique. K-Nearest-Neighbour (KNN), Decision Tree and MLP do not perform well as compared to other models. Decision Tree has shown average performance throughout the process as well as KNN. Naïve Bayes, Xgboost, Random Forest, ANN, SVM (Linear and Radial) are found to be the good models across the various feature selection methods. After analysing all the results, we can conclude that Random Forest gives the best performance with each and every feature selection method. For better analysis, we further looked in more detail and the models and feature selection methods are sorted based on the AUC, accuracy, specificity and sensitivity across the 5-folds. Table 5.4 shows the Model, Feature Selection Method with Accuracy, AUC, specificity and sensitivity to show the comparison of the different models and feature selection methods.

Table 5.74. Comparison of Models and Feature Selection Techniques

From Table 5.4, it can be observed that Random Forest is one of the best models that performed outstanding with almost each feature selection method with an accuracy of 100% and AUC score 1.0. On the other hand, ANN only performed very well with Chi-Square, otherwise average with the other methods. Radial and Linear SVM models are leading the second position in providing better AUC and accuracy with all the different feature selection methods. Naïve Bayes and XGBoost are also good models specially with the ReliefF and RFE. Multilayer Perceptron seems to be an average model with all the methods as its accuracy and AUC score is around 0.60 with SelectK and Chi-Square especially. Furthermore, RNN provides better score with all the feature selection methods but performed poor with the PCA with an average accuracy of 50% accuracy and AUC score 0.50.

5.2 Model comparison based on training-to-test ratios

In Section 5.1 5-fold cross validation was used to test and compare all the models to check for overfitting. In this section we present the results by taking different training-to-test ratios.

5.2.1. CFS Feature Selection

Table 5.5 shows the results of the models for the CFS feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.5.

1. The highest AUC and Accuracy values are obtained by training the model with 80% or 90% data in most of the cases.

2. The highest Accuracy for CFS feature selection method is obtained with Linear SVM, Random Forest, Naïve Bayes, and XGBoost for 100% of the features selected from all data. An accuracy of

100% is achieved with these classifiers. Random Forest performs the best as it avoids overfitting and gives 100% accuracy for all training ratios. We also observe that with 50% features Linear SVM gives an accuracy of 99.99%.

3. The above algorithms also give AUC values of 1.00. Additionally, Radial SVM and RNN also perform very well.

Table 5.5 Model Comparison based on CFS Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.8678 0.5 0.5 0.7115 0 0 0 0 1 1 1 1

Decision Tree 100% 0.95192 0.95833 0.97596 0.97115 0.9519 0.9583 0.976 0.9712 1 1 1 1 0.90385 0.91667 0.95192 0.94231 KNN 100% 0.88292 0.91026 0.85577 0.97115 0.9751 0.9916 0.982 0.9952 0.82353 0.84615 0.75 1 0.94231 0.97436 0.96154 0.94231

Linear SVM 100% 0.99279 0.99679 1 1 0.9986 1 1 1 1 1 1 1 0.98558 0.99359 1 1

MultiLayer Perceptron 100% 0.39904 0.89103 0.5 0.5 0.4792 0.8439 0.5 0.5577 0 1 0 0 0.79808 0.78205 1 1

Naive Bayes 100% 0.95192 0.99679 0.99038 1 0.9639 0.9963 0.9934 1 1 1 1 1 0.90385 0.99359 0.98077 1

Radial SVM 100% 0.98077 0.99359 0.99038 0.99038 0.9989 1 1 1 1 1 1 1 0.96154 0.98718 0.98077 0.98077

Random Forest 100% 0.9976 1 1 1 1 1 1 1 1 1 1 1 0.99519 1 1 1

Recurrent Neural Network 100% 0.95376 0.99679 0.99519 0.99038 0.996 1 0.9952 1 0.94118 1 1 1 0.96635 0.99359 0.99038 0.98077

XGBoost 100% 0.95136 0.88141 0.99519 1 0.9888 0.9911 1 1 0.94118 0.76923 1 1 0.96154 0.99359 0.99038 1

Artificial Neural Network 90% 0.499696 0.499997 0.499727 0.499607 0.867228 0.499136 0.499861 0.711073 0 0 0 0 0.99905 0.999627 0.999592 0.99931 Decision Tree 90% 0.951071 0.957714 0.975277 0.970618 0.951262 0.957306 0.975909 0.970394 0.999155 0.9991 0.9998 0.999613 0.903773 0.915712 0.951371 0.94193 KNN 90% 0.88196 0.910163 0.854924 0.970313 0.97454 0.99074 0.981241 0.995092 0.823523 0.845596 0.749856 0.999308 0.941604 0.974298 0.961479 0.942004 Linear SVM 90% 0.992179 0.99674 0.999724 0.999293 0.997644 0.999634 0.999076 0.999614 0.999782 0.999884 0.999721 0.999809 0.985263 0.993438 0.999389 0.999299 MultiLayer Perceptron 90% 0.398325 0.890321 0.499934 0.49956 0.478609 0.843767 0.499018 0.557202 0 0.999905 0 0 0.798043 0.781168 0.999825 0.999566 Naive Bayes 90% 0.951444 0.996449 0.990112 0.999705 0.963252 0.995676 0.993398 0.99945 0.999271 0.999615 0.999225 0.999506 0.90296 0.992751 0.979854 0.999994 Radial SVM 90% 0.980485 0.993281 0.989432 0.989538 0.998117 0.999292 0.999615 0.999196 0.999807 0.999992 0.999865 0.999946 0.96069 0.98659 0.980453 0.979893 Random Forest 90% 0.997504 0.999792 0.999721 0.999862 0.999186 0.999019 0.999176 0.999861 0.99966 0.999611 0.999072 0.999653 0.994261 0.999523 0.999101 0.999944 Recurrent Neural Network 90% 0.953197 0.996415 0.994984 0.989684 0.995458 0.999105 0.994359 0.999099 0.940241 0.999743 0.999413 0.999044 0.966197 0.992958 0.989547 0.980106 XGBoost 90% 0.951005 0.880445 0.994803 0.999332 0.987854 0.990933 0.99962 0.999129 0.940445 0.768456 0.999748 0.999756 0.961507 0.993141 0.990123 0.999054 Artificial Neural Network 75% 0.499672 0.499234 0.499465 0.49992 0.86688 0.499635 0.499038 0.711455 0 0 0 0 0.999901 0.999504 0.999824 0.999311 Decision Tree 75% 0.95192 0.957945 0.975856 0.970311 0.951442 0.958068 0.975539 0.970644 0.999311 0.999076 0.999971 0.999068 0.903644 0.916274 0.951316 0.941325 KNN 75% 0.882368 0.910209 0.855295 0.970654 0.974197 0.990736 0.981778 0.9947 0.822765 0.84519 0.749392 0.999572 0.941467 0.973667 0.961382 0.941508 Linear SVM 75% 0.992764 0.996092 0.999739 0.999137 0.997882 0.999437 0.999847 0.999295 0.999558 0.999757 0.999983 0.999221 0.985274 0.992882 0.999217 0.999179 MultiLayer Perceptron 75% 0.399006 0.890886 0.499158 0.499188 0.478941 0.843166 0.499688 0.557616 0 0.999716 0 0 0.797865 0.78151 0.999652 0.999915 Naive Bayes 75% 0.95097 0.996565 0.990337 0.999761 0.963623 0.996258 0.99258 0.999674 0.999845 0.999726 0.999969 0.999788 0.903533 0.992821 0.980689 0.999522 Radial SVM 75% 0.980345 0.992682 0.989436 0.990199 0.998862 0.999575 0.999867 0.999635 0.999713 0.999591 0.999187 0.999089 0.960841 0.986929 0.980735 0.980681 Random Forest 75% 0.996678 0.999895 0.999578 0.999311 0.999509 0.999653 0.999014 0.99965 0.999568 0.999748 0.999573 0.999904 0.994331 0.99957 0.999895 0.999111 Recurrent Neural Network 75% 0.95309 0.996603 0.994528 0.989927 0.995979 0.999589 0.994779 0.999038 0.940587 0.999892 0.999211 0.999114 0.965524 0.993574 0.989422 0.980466 XGBoost 75% 0.950366 0.880771 0.99482 0.999108 0.987991 0.991055 0.999212 0.9995 0.940717 0.768237 0.999648 0.999651 0.96079 0.992623 0.989436 0.999159 Artificial Neural Network 60% 0.499728 0.49981 0.499339 0.499041 0.867457 0.499183 0.499244 0.710734 0 0 0 0 0.999843 0.999938 0.999745 0.999423 Decision Tree 60% 0.951615 0.958277 0.975788 0.971056 0.95096 0.958092 0.975313 0.970256 0.999553 0.999695 0.999061 0.999962 0.903178 0.916077 0.951701 0.942231 KNN 60% 0.882405 0.90944 0.855688 0.971033 0.974953 0.991113 0.981402 0.99438 0.823158 0.845327 0.749266 0.999553 0.942072 0.973476 0.961487 0.942277 Linear SVM 60% 0.992142 0.996241 0.999789 0.999288 0.997918 0.99972 0.999509 0.99996 0.99987 0.999042 0.999094 0.99975 0.985577 0.993319 0.999227 0.99901 MultiLayer Perceptron 60% 0.398516 0.890196 0.499188 0.499624 0.478938 0.843483 0.499503 0.557022 0 0.999525 0 0 0.797977 0.781973 0.999193 0.999001 Naive Bayes 60% 0.951575 0.996375 0.990055 0.999535 0.96371 0.995438 0.993117 0.999088 0.999784 0.999959 0.999444 0.999962 0.903197 0.99299 0.980739 0.999406 Radial SVM 60% 0.98043 0.992791 0.990028 0.990374 0.998014 0.999975 0.999468 0.999288 0.999032 0.999954 0.999579 0.999308 0.960804 0.986815 0.980574 0.97988 Random Forest 60% 0.99688 0.999927 0.999989 0.999794 0.999574 0.999255 0.999197 0.999556 0.999997 0.999525 0.999031 0.999443 0.995066 0.999116 0.999781 0.999761 Recurrent Neural Network 60% 0.952976 0.995881 0.994728 0.989421 0.995209 0.999512 0.994509 0.999736 0.940404 0.999228 0.999256 0.999704 0.966022 0.993536 0.989938 0.97986 XGBoost 60% 0.950417 0.881373 0.994418 0.999528 0.988784 0.990816 0.999976 0.999416 0.940944 0.768353 0.999114 0.999924 0.960762 0.992967 0.989853 0.999835 Artificial Neural Network 50% 0.499034 0.499317 0.499979 0.499788 0.866867 0.499736 0.499675 0.711249 0 0 0 0 0.999349 0.999549 0.999466 0.99937 Decision Tree 50% 0.950924 0.957681 0.975582 0.970455 0.951305 0.957708 0.975346 0.970739 0.999318 0.999994 0.999572 0.999448 0.903745 0.916384 0.951242 0.941889 KNN 50% 0.882033 0.90997 0.855669 0.971128 0.974842 0.991108 0.981337 0.994651 0.822716 0.845178 0.7499 0.999556 0.941774 0.973831 0.961255 0.941671 Linear SVM 50% 0.991803 0.996098 0.999999 0.99978 0.998329 0.999389 0.999703 0.999706 0.999772 0.999116 0.999722 0.999564 0.984781 0.993452 0.999967 0.999834 MultiLayer Perceptron 50% 0.398476 0.890581 0.499742 0.499621 0.479073 0.843733 0.499061 0.556873 0 0.99977 0 0 0.797533 0.781149 0.999572 0.99975 Naive Bayes 50% 0.951668 0.996171 0.989436 0.999248 0.962957 0.996262 0.992656 0.999383 0.999753 0.999496 0.999549 0.999786 0.903714 0.993547 0.980315 0.999619 Radial SVM 50% 0.980061 0.993191 0.989835 0.989947 0.998154 0.999246 0.999192 0.999133 0.999115 0.999538 0.999429 0.999748 0.961364 0.986243 0.980532 0.980545 Random Forest 50% 0.996973 0.999437 0.999612 0.999658 0.999785 0.999131 0.999608 0.999935 0.999878 0.999437 0.999166 0.999205 0.994769 0.999403 0.999357 0.999163 Recurrent Neural Network 50% 0.953158 0.996492 0.994379 0.990063 0.995161 0.999991 0.995102 0.999299 0.940777 0.999553 0.999056 0.999067 0.96605 0.993518 0.989417 0.980716 XGBoost 50% 0.950785 0.881305 0.994228 0.999784 0.9882 0.990875 0.999933 0.999271 0.94051 0.768963 0.999772 0.999111 0.961419 0.993131 0.989461 0.99944

5.2.2. Chi-square Feature Selection

Table 5.6 shows the results of the models for the Chi-square feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.6.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% or 90% training data in most of the cases.

2. The highest Accuracy for Chi-Squared feature selection method is obtained with Linear SVM, RNN, XGBoost and Random Forest for 100% of the features selected from all data with a training to testing ratio of 80:20 and 90:10. These classifiers achieve 100% accuracy. It can be observed that high performance is achieved by selecting 90%, 75%, 60% and 50% features.

3. For the AUC, Linear and Radial SVM, Random Forest, RNN and XGBoost provide AUC of 1.00.

Table 5.6 Model Comparison based on Chi Square Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.87179 0.5 0.5 0.5209 0.9307 0.5913 0.5673 0 0.76923 0 0 1 0.97436 1 1

Decision Tree 100% 0.95192 0.90064 0.97596 0.875 0.9519 0.9006 0.976 0.9928 1 0.92308 1 0.75 0.90385 0.87821 0.95192 1

KNN 100% 0.93213 0.98718 0.98558 0.97115 0.9693 0.997 0.994 0.9904 0.94118 1 1 1 0.92308 0.97436 0.97115 0.94231

Linear SVM 100% 0.95857 0.99679 1 1 0.9395 1 1 1 0.94118 1 1 1 0.97596 0.99359 1 1

MultiLayer Perceptron 100% 0.67704 0.60577 0.84135 0.65385 0.8194 0.6058 0.9105 0.8462 0.41176 0.92308 0.875 0.5 0.94231 0.28846 0.80769 0.80769 Naive Bayes 100% 0.93269 0.99038 0.91346 0.99038 0.9471 0.9926 0.9261 0.9904 1 1 0.875 1 0.86538 0.98077 0.95192 0.98077

Radial SVM 100% 0.97356 0.99359 0.99038 0.99038 0.9963 0.9995 1 1 1 1 1 1 0.94712 0.98718 0.98077 0.98077

Random Forest 100% 1 1 0.99519 1 1 1 1 1 1 1 1 1 1 1 0.99038 1

Recurrent Neural Network 100% 0.98077 0.99038 0.97596 1 0.9808 0.9946 1 1 1 1 1 1 0.96154 0.98077 0.95192 1

XGBoost 100% 0.99519 0.95833 1 1 1 0.9941 1 1 1 0.92308 1 1 0.99038 0.99359 1 1

Artificial Neural Network 90% 0.499928 0.871277 0.499848 0.499759 0.520523 0.9301 0.590425 0.567145 0 0.768635 0 0 0.999332 0.974042 0.999848 0.999225 Decision Tree 90% 0.951217 0.899904 0.975608 0.874594 0.951568 0.900044 0.975679 0.991971 0.999423 0.922677 0.999629 0.749034 0.902977 0.877874 0.951416 0.999106 KNN 90% 0.931405 0.986817 0.985444 0.970724 0.968565 0.996982 0.993504 0.989635 0.94043 0.999077 0.999061 0.9994 0.923011 0.974198 0.970537 0.941347 Linear SVM 90% 0.958451 0.996461 0.999565 0.999718 0.938923 0.999333 0.999599 0.999761 0.9405 0.999695 0.999344 0.999938 0.975524 0.993068 0.999411 0.999539 MultiLayer Perceptron 90% 0.676891 0.604923 0.840539 0.653225 0.818739 0.605422 0.910368 0.845757 0.411319 0.922875 0.874995 0.499522 0.94165 0.287993 0.806818 0.806765 Naive Bayes 90% 0.932543 0.990207 0.913435 0.989858 0.946832 0.992209 0.926048 0.990079 0.999885 0.999831 0.874122 0.999437 0.865128 0.979877 0.951198 0.980076 Radial SVM 90% 0.973542 0.993174 0.990366 0.989565 0.995453 0.998651 0.999306 0.999804 0.999633 0.999479 0.999458 0.999882 0.946776 0.987124 0.980095 0.979994 Random Forest 90% 0.999314 0.999548 0.99489 0.999022 0.999201 0.999747 0.999018 0.999646 0.999366 0.999654 0.999148 0.99959 0.99935 0.999548 0.990371 0.999699 Recurrent Neural Network 90% 0.980527 0.989387 0.975382 0.999062 0.980467 0.99363 0.999684 0.999369 0.99914 0.999328 0.999955 0.999666 0.960952 0.980543 0.950941 0.999783 XGBoost 90% 0.994835 0.958075 0.999409 0.999096 0.999893 0.99408 0.999131 0.999294 0.999642 0.922223 0.999996 0.999523 0.989767 0.992923 0.999612 0.999789 Artificial Neural Network 75% 0.499887 0.871524 0.499025 0.499073 0.520229 0.930691 0.590379 0.567288 0 0.769077 0 0 0.999244 0.973862 0.999127 0.999176 Decision Tree 75% 0.95101 0.900217 0.975135 0.874605 0.951627 0.900178 0.975394 0.992554 0.999873 0.922166 0.99917 0.749097 0.903787 0.878194 0.951263 0.999727 KNN 75% 0.931363 0.98619 0.98474 0.971109 0.969167 0.996583 0.993763 0.989687 0.940586 0.999851 0.999535 0.999951 0.922272 0.974127 0.970456 0.941779 Linear SVM 75% 0.958464 0.996555 0.999877 0.999006 0.938795 0.999382 0.999455 0.999168 0.940306 0.999697 0.99918 0.999947 0.974965 0.99287 0.999695 0.999228 MultiLayer Perceptron 75% 0.67628 0.60488 0.841047 0.652927 0.818605 0.60571 0.909958 0.846092 0.411009 0.922468 0.874364 0.499413 0.942131 0.287663 0.806879 0.807172 Naive Bayes 75% 0.932548 0.990196 0.913427 0.989562 0.946744 0.992045 0.925978 0.989837 0.999447 0.999212 0.874484 0.999451 0.864897 0.979902 0.951263 0.980542 Radial SVM 75% 0.972793 0.993075 0.990269 0.990284 0.995321 0.998717 0.999453 0.999222 0.999542 0.999624 0.999535 0.999318 0.946806 0.98642 0.979816 0.980564 Random Forest 75% 0.999077 0.999219 0.994288 0.999284 0.999356 0.999028 0.999386 0.999833 0.999617 0.999699 0.999082 0.999555 0.999422 0.999905 0.98983 0.999228 Recurrent Neural Network 75% 0.980479 0.989877 0.975157 0.999697 0.980494 0.993648 0.999544 0.999588 0.999038 0.999286 0.999742 0.999752 0.961101 0.980291 0.951626 0.999117 XGBoost 75% 0.994742 0.957851 0.999269 0.999927 0.999435 0.994078 0.999109 0.999564 0.999268 0.923039 0.999406 0.999059 0.99002 0.992801 0.999979 0.99957 Artificial Neural Network 60% 0.499673 0.871534 0.499979 0.499263 0.520005 0.930194 0.591031 0.566503 0 0.768653 0 0 0.999675 0.97383 0.999783 0.999595 Decision Tree 60% 0.950931 0.900196 0.9753 0.874582 0.951355 0.900516 0.975817 0.992652 0.999753 0.922349 0.999018 0.749347 0.90351 0.878082 0.951238 0.999772 KNN 60% 0.93186 0.986288 0.985521 0.970541 0.968302 0.996913 0.993283 0.990009 0.940519 0.999658 0.999627 0.999175 0.922361 0.973696 0.971133 0.942113 Linear SVM 60% 0.957657 0.996312 0.99942 0.999348 0.939097 0.9996 0.999845 0.999335 0.940909 0.999499 0.999384 0.99977 0.975815 0.99311 0.999815 0.999984 MultiLayer Perceptron 60% 0.676378 0.605556 0.840575 0.653617 0.818573 0.605218 0.909656 0.846157 0.410784 0.922759 0.874001 0.49957 0.942257 0.287735 0.806965 0.807606 Naive Bayes 60% 0.932047 0.989898 0.913427 0.990031 0.946845 0.992118 0.925854 0.989425 0.999497 0.999165 0.874945 0.999624 0.865252 0.980481 0.951504 0.979917 Radial SVM 60% 0.973286 0.993566 0.98981 0.989783 0.995946 0.999463 0.999688 0.999379 0.999534 0.999726 0.99997 0.999559 0.947074 0.986594 0.980308 0.97998 Random Forest 60% 0.999189 0.999605 0.994917 0.999064 0.999252 0.999035 0.999609 0.999411 0.999739 0.999527 0.999709 0.999842 0.999639 0.999385 0.989837 0.999357 Recurrent Neural Network 60% 0.980517 0.990215 0.975439 0.999482 0.980124 0.994572 0.99903 0.999132 0.999582 0.99959 0.999425 0.999806 0.961441 0.980602 0.95135 0.999327 XGBoost 60% 0.99419 0.957648 0.999328 0.999849 0.999318 0.993905 0.99956 0.999468 0.999256 0.92262 0.999498 0.999567 0.989793 0.993099 0.999098 0.999992 Artificial Neural Network 50% 0.499224 0.870892 0.499351 0.4995 0.520238 0.93042 0.590753 0.566697 0 0.768371 0 0 0.999063 0.974206 0.99976 0.99976 Decision Tree 50% 0.951416 0.900356 0.975818 0.874399 0.950904 0.900322 0.975421 0.991994 0.999073 0.922496 0.999352 0.749715 0.90323 0.877571 0.95149 0.999895 KNN 50% 0.931877 0.986619 0.985018 0.97064 0.969277 0.996812 0.993645 0.989999 0.941066 0.999199 0.999264 0.99903 0.922981 0.973522 0.970725 0.941817 Linear SVM 50% 0.958183 0.996098 0.999579 0.999913 0.938597 0.999856 0.999061 0.999372 0.940277 0.999919 0.999859 0.999419 0.975636 0.993488 0.999153 0.999639 MultiLayer Perceptron 50% 0.676638 0.605646 0.84081 0.653715 0.818605 0.605218 0.909927 0.845946 0.411103 0.922273 0.874498 0.499741 0.941699 0.287722 0.807643 0.807417 Naive Bayes 50% 0.932606 0.990349 0.91277 0.989964 0.946177 0.991859 0.925554 0.989639 0.99907 0.999528 0.874471 0.999747 0.865033 0.980387 0.951138 0.98041 Radial SVM 50% 0.973353 0.993176 0.989846 0.989591 0.995528 0.998902 0.999615 0.99934 0.999515 0.99942 0.999853 0.999748 0.947014 0.986803 0.980565 0.979867 Random Forest 50% 0.999053 0.999632 0.995141 0.99922 0.999474 0.999435 0.999008 0.999099 0.999634 0.999939 0.999285 0.999123 0.999696 0.999916 0.990321 0.999715 Recurrent Neural Network 50% 0.980058 0.989495 0.97527 0.999987 0.980013 0.994288 0.999499 0.999638 0.999525 0.999112 0.999698 0.999367 0.961143 0.980351 0.951688 0.999065 XGBoost 50% 0.994602 0.957643 0.999659 0.999952 0.999478 0.993919 0.999782 0.999046 0.999554 0.922803 0.999725 0.999656 0.989681 0.992679 0.999975 0.999619

5.2.3. Mutual Information (MI) based Feature Selection

Table 5.7 shows the results of the models for the Mutual Information based feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70%

and 60% data. The following observations can be made from Table 5.7.

1. The highest AUC and Accuracy values are obtained by training the model with 80% or 90%

training data in most of the cases.

2. The highest Accuracy of 100% is achieved for Mutual Information based feature selection method with Decision Tree, Random Forest, Linear SVM, Naïve Bayes and XGBoost for 100% of the features selected from all data.

3. The above algorithms give an AUC of 1.00 and additionally, KNN and RNN also perform well.

Table 5.7 Model Comparison based on Mutual Information based Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5625 0.60577 0.4928 0.8171 0.8828 0.649 0 0 0.125 0.5 1 1 1 0.71154

Decision Tree 100% 0.95192 0.91987 1 0.97115 0.9519 0.9199 1 0.9712 1 0.84615 1 1 0.90385 0.99359 1 0.94231

KNN 100% 0.92732 0.97756 0.95673 0.95192 0.9832 0.9968 0.9952 1 0.94118 1 1 1 0.91346 0.95513 0.91346 0.90385

Linear SVM 100% 0.9976 0.99679 1 1 0.9997 1 1 1 1 1 1 1 0.99519 0.99359 1 1

MultiLayer Perceptron 100% 0.5 0.82692 0.66827 0.55769 0.5 0.8656 0.7602 0.5577 0 0.69231 0.5 0.75 1 0.96154 0.83654 0.36538

Naive Bayes 100% 0.99519 1 1 1 0.9976 1 1 1 1 1 1 1 0.99038 1 1 1

Radial SVM 100% 0.93637 0.92308 0.87019 0.875 0.998 0.9985 0.994 1 0.88235 0.84615 0.75 0.75 0.99038 1 0.99038 1

Random Forest 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Recurrent Neural Network 100% 0.96097 0.98718 0.99038 0.99038 0.9989 1 1 1 0.94118 1 1 1 0.98077 0.97436 0.98077 0.98077

XGBoost 100% 0.96097 0.87821 1 1 0.9969 0.9882 1 1 0.94118 0.76923 1 1 0.98077 0.98718 1 1

Artificial Neural Network 90% 0.499347 0.499088 0.561944 0.605255 0.492167 0.816678 0.882722 0.648684 0 0 0.124519 0.499668 0.999717 0.99976 0.999427 0.711136 Decision Tree 90% 0.951652 0.919264 0.999715 0.970618 0.951528 0.919759 0.999238 0.970226 0.999332 0.846008 0.999597 0.999368 0.903444 0.992954 0.999402 0.941684 KNN 90% 0.927205 0.977475 0.955882 0.951734 0.983193 0.996485 0.994937 0.999645 0.941093 0.999262 0.999694 0.999449 0.913001 0.954912 0.912822 0.903417 Linear SVM 90% 0.996817 0.995888 0.999429 0.999849 0.99873 0.999039 0.999984 0.999623 0.999585 0.999682 0.999803 0.999854 0.994408 0.993023 0.999062 0.999804 MultiLayer Perceptron 90% 0.499184 0.826846 0.66769 0.556729 0.499664 0.865159 0.759845 0.556855 0 0.691348 0.49931 0.749204 0.999162 0.960664 0.836438 0.364975 Naive Bayes 90% 0.994228 0.999403 0.999842 0.999996 0.997573 0.999183 0.999341 0.999649 0.999797 0.99932 0.999026 0.9991 0.990369 0.999905 0.999677 0.999542 Radial SVM 90% 0.935971 0.922766 0.869192 0.874902 0.997385 0.998044 0.993792 0.999078 0.881899 0.845546 0.7491 0.749325 0.990147 0.999311 0.989801 0.999354 Random Forest 90% 0.999242 0.999527 0.999616 0.999299 0.999448 0.999297 0.999082 0.999529 0.999704 0.999799 0.999763 0.999524 0.999551 0.999058 0.999422 0.999393 Recurrent Neural Network 90% 0.960406 0.98706 0.989919 0.990209 0.997964 0.999054 0.999266 0.999286 0.940223 0.99949 0.999447 0.999972 0.980265 0.973452 0.980587 0.980134 XGBoost 90% 0.96023 0.877456 0.999946 0.999305 0.996243 0.98795 0.999254 0.999651 0.94054 0.769059 0.999903 0.999414 0.98058 0.986337 0.999555 0.999865 Artificial Neural Network 75% 0.49992 0.499748 0.561502 0.60504 0.492227 0.816729 0.882402 0.648153 0 0 0.12468 0.499091 0.99934 0.999493 0.999383 0.711045 Decision Tree 75% 0.951699 0.919464 0.999024 0.970776 0.950933 0.919092 0.999086 0.970588 0.99929 0.845743 0.999821 0.999928 0.903154 0.992907 0.999909 0.941469 KNN 75% 0.926711 0.976978 0.956191 0.95127 0.982763 0.996103 0.995194 0.999989 0.940951 0.999841 0.999824 0.999171 0.912829 0.954815 0.912671 0.903493 Linear SVM 75% 0.99691 0.996081 0.999398 0.999267 0.998704 0.9994 0.99919 0.999841 0.999631 0.999344 0.999279 0.999136 0.994346 0.993433 0.999311 0.999013 MultiLayer Perceptron 75% 0.499778 0.826499 0.667399 0.557575 0.499482 0.86524 0.759841 0.557296 0 0.692292 0.499447 0.749593 0.999704 0.960651 0.835927 0.365347 Naive Bayes 75% 0.99459 0.999422 0.999944 0.999742 0.996907 0.999572 0.999145 0.999243 0.999582 0.999154 0.999557 0.999604 0.990353 0.999075 0.999117 0.999785 Radial SVM 75% 0.935955 0.922477 0.869295 0.874411 0.997285 0.998273 0.9936 0.999697 0.88218 0.845961 0.749281 0.7497 0.989857 0.999746 0.989782 0.999887 Random Forest 75% 0.999608 0.999168 0.999751 0.999383 0.999986 0.999615 0.999664 0.999169 0.999217 0.999053 0.99916 0.999775 0.999966 0.999433 0.999873 0.99994 Recurrent Neural Network 75% 0.960073 0.987019 0.989464 0.989727 0.998522 0.999245 0.999086 0.999454 0.940357 0.999812 0.999969 0.999968 0.980622 0.973622 0.979915 0.980543 XGBoost 75% 0.960935 0.877554 0.999031 0.999051 0.996124 0.988094 0.999166 0.999247 0.940935 0.768792 0.999774 0.999062 0.980424 0.986704 0.99945 0.999651 Artificial Neural Network 60% 0.499738 0.499528 0.561514 0.605458 0.492693 0.817015 0.88223 0.648715 0 0 0.124529 0.499133 0.999734 0.999034 0.999884 0.71088 Decision Tree 60% 0.951917 0.919701 0.999575 0.970338 0.95188 0.918954 0.999207 0.970995 0.999908 0.845603 0.999772 0.999599 0.903502 0.99344 0.999557 0.941514 KNN 60% 0.92667 0.977202 0.956635 0.951859 0.983167 0.996425 0.994474 0.999153 0.941123 0.999125 0.999153 0.999082 0.912675 0.954722 0.91316 0.903666 Linear SVM 60% 0.997254 0.996284 0.999391 0.999601 0.99963 0.999341 0.99918 0.999641 0.999197 0.999916 0.999803 0.999131 0.994957 0.992964 0.999672 0.999755 MultiLayer Perceptron 60% 0.49902 0.82601 0.668018 0.55722 0.499101 0.86522 0.759776 0.557219 0 0.692038 0.499013 0.74921 0.999742 0.960574 0.835595 0.364438 Naive Bayes 60% 0.995087 0.999353 0.999504 0.999401 0.997318 0.999106 0.999603 0.999552 0.999683 0.999748 0.999561 0.999634 0.98997 0.99903 0.99915 0.999987 Radial SVM 60% 0.936119 0.922196 0.869613 0.874163 0.997355 0.99799 0.993172 0.999886 0.882324 0.845481 0.749088 0.749752 0.98946 0.999845 0.990106 0.999224 Random Forest 60% 0.999306 0.999503 0.999868 0.999588 0.999879 0.999048 0.99975 0.999932 0.999494 0.99964 0.999445 0.999092 0.999557 0.999045 0.999096 0.999019 Recurrent Neural Network 60% 0.960726 0.986596 0.989738 0.989578 0.998663 0.999779 0.999798 0.999925 0.940666 0.999892 0.999724 0.999008 0.979921 0.974057 0.979977 0.980176 XGBoost 60% 0.960207 0.877681 0.999307 0.999005 0.996523 0.988095 0.999069 0.999597 0.940275 0.769165 0.999778 0.999398 0.980378 0.986552 0.999289 0.999448 Artificial Neural Network 50% 0.499578 0.49951 0.562356 0.605272 0.492614 0.816821 0.8828 0.648582 0 0 0.124079 0.499787 0.999595 0.999699 0.999361 0.71121 Decision Tree 50% 0.95172 0.919105 0.999183 0.970856 0.951555 0.919409 0.999835 0.970486 0.999496 0.846 0.999541 0.999932 0.902886 0.992994 0.999217 0.94154 KNN 50% 0.926986 0.977471 0.956087 0.951686 0.982426 0.995839 0.994303 0.999378 0.94109 0.999875 0.999064 0.999223 0.913227 0.95473 0.91299 0.903431 Linear SVM 50% 0.997406 0.996211 0.999884 0.999964 0.998897 0.999426 0.999797 0.999443 0.999217 0.99979 0.999261 0.999515 0.994285 0.992645 0.999744 0.999345 MultiLayer Perceptron 50% 0.499492 0.826739 0.667638 0.557581 0.499194 0.864754 0.7599 0.55732 0 0.692055 0.499348 0.749331 0.999708 0.960675 0.83578 0.364751 Naive Bayes 50% 0.995112 0.999979 0.999713 0.999352 0.99746 0.999439 0.999364 0.999033 0.99981 0.999285 0.999812 0.999765 0.989957 0.999328 0.999327 0.999469 Radial SVM 50% 0.9362 0.922559 0.869607 0.874732 0.997164 0.997883 0.993327 0.999611 0.881574 0.845548 0.749118 0.749474 0.989633 0.999073 0.990218 0.999327 Random Forest 50% 0.999717 1 0.999162 0.999098 0.999171 0.999929 0.99916 0.999321 0.999355 0.999655 0.999229 0.999093 0.999299 0.999238 0.999579 0.999851 Recurrent Neural Network 50% 0.960939 0.986572 0.989511 0.989908 0.998123 0.999786 0.999088 0.999071 0.940285 0.999578 0.999322 0.999559 0.980306 0.974124 0.980447 0.979945 XGBoost 50% 0.96094 0.87797 0.999531 0.999094 0.996697 0.988072 0.999782 0.999789 0.940469 0.768912 0.999629 0.999573 0.980715 0.986862 0.999049 0.999956

5.2.4. PCA Feature Selection

Table 5.8 shows the results of the models for the PCA feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.8.

1. The highest AUC and Accuracy values are obtained by training the model with 80% and 90%

training data in most of the cases.

2. The highest Accuracy of 0.98558 was obtained for PCA feature selection method with XGBoost for 100% of the features selected from all data with a training to testing ratio 80:20. XGBoost performs equally well on 90%, 75%, 60% and 50% features. KNN performs well on 75%, 60% and 50% features.

3. KNN, Linear SVM, Random Forest, and XGBoost give highest AUC of 1.00 with 90% training data except for KNN which gives the highest AUC with 80% training data.

Table 5.8 Model Comparison based on PCA Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.705 0.766 0.6959 0.7212 0 0 0 0 1 1 1 1

Decision Tree 100% 0.95673 0.8141 0.95673 0.72115 0.9538 0.8141 0.9597 0.7212 1 0.69231 1 0.5 0.91346 0.9359 0.91346 0.94231

KNN 100% 0.95433 0.98077 0.96635 0.98077 0.9992 0.9936 1 1 1 1 1 1 0.90865 0.96154 0.93269 0.96154

Linear SVM 100% 0.91176 0.91987 0.875 0.875 0.9536 0.9694 0.9243 1 0.82353 0.84615 0.75 0.75 1 0.99359 1 1

MultiLayer Perceptron 100% 0.92251 0.95833 0.95673 0.94231 0.9901 0.9877 0.9856 0.9904 0.94118 1 1 1 0.90385 0.91667 0.91346 0.88462 Naive Bayes 100% 0.80727 0.82372 0.76923 0.73077 0.8894 0.8679 0.8029 0.7668 0.70588 0.76923 0.625 0.5 0.90865 0.87821 0.91346 0.96154

Radial SVM 100% 0.90455 0.92308 0.875 0.875 0.9126 0.8915 0.8666 0.9615 0.82353 0.84615 0.75 0.75 0.98558 1 1 1

Random Forest 100% 0.95617 0.97436 0.96635 0.96154 0.9983 0.9906 0.9976 1 0.94118 1 1 1 0.97115 0.94872 0.93269 0.92308

Recurrent Neural Network 100% 0.5 0.5 0.5 0.5 0.5201 0.5673 0.6154 0.726 0 0 0 0 1 1 1 1

XGBoost 100% 0.95913 0.97436 0.98558 0.98077 0.9977 0.997 0.9964 1 1 1 1 1 0.91827 0.94872 0.97115 0.96154

Artificial Neural Network 90% 0.499946 0.499901 0.499402 0.499181 0.704677 0.765457 0.695406 0.720812 0 0 0 0 0.999758 0.999941 0.999179 0.999411 Decision Tree 90% 0.955746 0.813341 0.956162 0.72059 0.953087 0.813244 0.959022 0.721078 0.999442 0.691354 0.999146 0.499346 0.91301 0.935263 0.913139 0.941817 KNN 90% 0.953921 0.979834 0.965999 0.979978 0.998244 0.993397 0.99969 0.999672 0.99967 0.999477 0.999801 0.999406 0.907901 0.961509 0.932597 0.960753 Linear SVM 90% 0.911209 0.919228 0.874023 0.874848 0.952826 0.968402 0.923822 0.999567 0.822985 0.845817 0.749363 0.749506 0.999074 0.99341 0.999884 0.999376 MultiLayer Perceptron 90% 0.921979 0.957543 0.956482 0.94157 0.990061 0.98705 0.985056 0.989425 0.940367 0.999377 0.99994 0.9993 0.902929 0.915957 0.913374 0.88411 Naive Bayes 90% 0.806422 0.822728 0.768636 0.729954 0.889147 0.867679 0.802455 0.76614 0.705639 0.768472 0.624806 0.499963 0.908534 0.877684 0.913393 0.961491 Radial SVM 90% 0.903861 0.922456 0.874579 0.87444 0.912558 0.891287 0.86628 0.961149 0.822547 0.845858 0.749627 0.749423 0.984742 0.999505 0.999886 0.999297 Random Forest 90% 0.956145 0.973981 0.966164 0.961176 0.997735 0.990435 0.997202 0.999893 0.940294 0.999398 0.999117 0.999462 0.970992 0.948212 0.931756 0.922924 Recurrent Neural Network 90% 0.49973 0.499618 0.499935 0.499899 0.519108 0.566805 0.614656 0.725947 0 0 0 0 0.999238 0.999644 0.999872 0.999406 XGBoost 90% 0.958346 0.974325 0.98513 0.980105 0.996807 0.996354 0.996255 0.999937 0.99969 0.999706 0.999457 0.999947 0.918114 0.948537 0.970954 0.961029 Artificial Neural Network 75% 0.499018 0.499722 0.499809 0.499162 0.70451 0.765772 0.69508 0.72038 0 0 0 0 0.999533 0.999024 0.9995 0.999866 Decision Tree 75% 0.95654 0.81364 0.956695 0.720452 0.953332 0.813896 0.958979 0.720439 0.999462 0.692136 0.999529 0.499996 0.91303 0.93575 0.912789 0.94216 KNN 75% 0.953588 0.980088 0.965968 0.980085 0.998858 0.993206 0.999036 0.999195 0.999995 0.999519 0.999972 0.999189 0.908599 0.960745 0.932489 0.961148 Linear SVM 75% 0.911288 0.919386 0.874628 0.874474 0.953429 0.968976 0.924199 0.999411 0.82287 0.84563 0.749278 0.749351 0.999276 0.992845 0.999292 0.999319 MultiLayer Perceptron 75% 0.922013 0.95751 0.956272 0.94161 0.989287 0.986947 0.985071 0.989941 0.940628 0.999835 0.999255 0.999124 0.903119 0.916257 0.912858 0.8841 Naive Bayes 75% 0.806983 0.823635 0.769127 0.730078 0.889173 0.867561 0.801916 0.766037 0.70518 0.769168 0.624411 0.499987 0.907849 0.877234 0.912642 0.961108 Radial SVM 75% 0.903613 0.922541 0.874502 0.874573 0.911934 0.891408 0.866152 0.960761 0.823227 0.84545 0.749028 0.749133 0.984947 0.999802 0.999454 0.999687 Random Forest 75% 0.955413 0.973761 0.965664 0.961097 0.998284 0.989899 0.996937 0.999328 0.940301 0.999704 0.999782 0.999237 0.970848 0.948402 0.931882 0.922286 Recurrent Neural Network 75% 0.499014 0.499446 0.499091 0.499719 0.519507 0.566996 0.614539 0.725349 0 0 0 0 0.999309 0.999382 0.999555 0.999801 XGBoost 75% 0.958263 0.973616 0.984789 0.979944 0.997591 0.996103 0.996395 0.999056 0.999677 0.999317 0.999134 0.999198 0.91762 0.948114 0.970914 0.960895 Artificial Neural Network 60% 0.499709 0.49931 0.49938 0.499577 0.704574 0.765474 0.695703 0.721092 0 0 0 0 0.999799 0.999602 0.999879 0.999615 Decision Tree 60% 0.956141 0.813138 0.956174 0.720889 0.95291 0.813501 0.959646 0.720664 0.999387 0.691896 0.999121 0.49947 0.912841 0.934982 0.912652 0.941317 KNN 60% 0.95354 0.980276 0.965686 0.980017 0.998225 0.992899 0.999912 0.999109 0.999077 0.999611 0.999663 0.999759 0.907988 0.960653 0.932386 0.960867 Linear SVM 60% 0.911589 0.919048 0.874507 0.874774 0.952738 0.96891 0.92387 0.99924 0.822851 0.846099 0.749577 0.749657 0.999659 0.99278 0.999477 0.999139 MultiLayer Perceptron 60% 0.921564 0.957575 0.955883 0.941503 0.990075 0.987553 0.985356 0.990214 0.940331 0.999846 0.99998 0.999233 0.903173 0.915772 0.912862 0.884364 Naive Bayes 60% 0.806302 0.823385 0.768582 0.729856 0.888637 0.867502 0.801932 0.766429 0.704979 0.769148 0.624168 0.499311 0.908148 0.877825 0.913448 0.961197 Radial SVM 60% 0.903908 0.922565 0.874995 0.874657 0.912553 0.890586 0.86627 0.961353 0.823129 0.845449 0.749306 0.74988 0.984645 0.999611 0.999472 0.999305 Random Forest 60% 0.9558 0.973842 0.96554 0.960936 0.997925 0.990456 0.997302 0.999828 0.941147 0.999295 0.999129 0.999509 0.970933 0.94812 0.93219 0.922673 Recurrent Neural Network 60% 0.499465 0.49958 0.499976 0.499725 0.519135 0.566447 0.614632 0.72547 0 0 0 0 0.99986 0.999823 0.999708 0.999869 XGBoost 60% 0.958843 0.974201 0.985132 0.980088 0.996988 0.996501 0.995954 0.999012 0.99976 0.999524 0.999067 0.999383 0.918145 0.948294 0.970816 0.960626 Artificial Neural Network 50% 0.499231 0.49983 0.499556 0.499605 0.70467 0.765972 0.695464 0.720438 0 0 0 0 0.999036 0.999854 0.99931 0.999053 Decision Tree 50% 0.956292 0.814045 0.956127 0.720295 0.95352 0.813867 0.959365 0.720659 0.999386 0.691841 0.999331 0.499118 0.912526 0.935724 0.913031 0.942262 KNN 50% 0.953776 0.980403 0.966068 0.980255 0.998241 0.993351 0.999092 0.999157 0.999611 0.999324 0.999178 0.99903 0.90864 0.961355 0.932213 0.961351 Linear SVM 50% 0.91091 0.919305 0.874141 0.874216 0.952726 0.96878 0.923536 0.999786 0.823491 0.845688 0.749481 0.749987 0.999884 0.993392 0.999929 0.999674 MultiLayer Perceptron 50% 0.922487 0.957507 0.95654 0.941925 0.989942 0.987421 0.98475 0.989945 0.940372 0.999506 0.999749 0.999823 0.903332 0.916256 0.91326 0.884425 Naive Bayes 50% 0.806562 0.823312 0.768613 0.730465 0.88907 0.867435 0.80285 0.765995 0.705138 0.768343 0.624669 0.499302 0.9085 0.877888 0.913316 0.961505 Radial SVM 50% 0.903629 0.922448 0.874065 0.874839 0.912464 0.891209 0.866135 0.960707 0.822867 0.845444 0.749057 0.749617 0.985335 0.9998 0.999028 0.999859 Random Forest 50% 0.955948 0.974104 0.965836 0.961049 0.998162 0.990106 0.996994 0.999165 0.940408 0.999984 0.999268 0.999897 0.971142 0.948295 0.932066 0.922471 Recurrent Neural Network 50% 0.499201 0.499986 0.4996 0.499044 0.519559 0.567162 0.614503 0.725921 0 0 0 0 0.999888 0.999633 0.999868 0.999262 XGBoost 50% 0.958889 0.974031 0.985109 0.980662 0.99739 0.996043 0.99636 0.999997 0.999546 0.999163 0.99934 0.999433 0.918195 0.94775 0.970724 0.960753

5.2.5. ReliefF Feature Selection

Table 5.9 shows the results of the models for the ReliefF feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.9.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% and 90% training data in most of the cases.

2. The highest Accuracy for ReliefF based feature selection method is obtained with SVM (Linear and Radial), Random Forest, RNN and XGBoost for 100% of the features selected from all data with a training to testing ratio of 80:20 and 90:10. XGBoost performs well with 90%, 75%, 60%

and 50% features.

3. An AUC of 1.00 is obtained by the above algorithms.

Table 5.9 Model Comparison based on ReliefF Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.5 0.6346 0.4087 0.649 0 0 0 0 1 1 1 1

Decision Tree 100% 0.92067 0.9391 0.97596 0.97115 0.9519 0.947 0.976 0.9712 1 0.92308 1 1 0.84135 0.95513 0.95192 0.94231 KNN 100% 0.93693 0.9359 0.91346 0.95192 0.9733 0.9847 0.9916 0.9856 0.94118 0.92308 0.875 1 0.93269 0.94872 0.95192 0.90385

Linear SVM 100% 0.99038 1 1 1 0.9977 1 1 1 1 1 1 1 0.98077 1 1 1

MultiLayer Perceptron 100% 0.486 0.69872 0.62019 0.5 0.6154 0.6987 0.6148 0.5 0.29412 0.69231 0.875 0 0.67788 0.70513 0.36538 1 Naive Bayes 100% 0.9399 0.97756 0.91346 0.85577 0.9856 0.9847 0.9183 0.863 1 1 0.875 0.75 0.87981 0.95513 0.95192 0.96154

Radial SVM 100% 0.95617 0.99679 0.99519 1 0.9972 0.9995 1 1 0.94118 1 1 1 0.97115 0.99359 0.99038 1

Random Forest 100% 0.99279 1 1 1 1 1 1 1 1 1 1 1 0.98558 1 1 1

Recurrent Neural Network 100% 0.97837 0.99679 0.97596 1 0.988 1 1 1 1 1 1 1 0.95673 0.99359 0.95192 1

XGBoost 100% 0.98077 1 0.99519 1 0.998 1 1 1 1 1 1 1 0.96154 1 0.99038 1

Artificial Neural Network 90% 0.499921 0.499113 0.499789 0.49946 0.499861 0.634486 0.408207 0.648094 0 0 0 0 0.999991 0.999536 0.99917 0.999237 Decision Tree 90% 0.920233 0.938511 0.975164 0.970883 0.951025 0.946117 0.975957 0.970505 0.999087 0.922341 0.999807 0.999579 0.84106 0.954969 0.951631 0.942135 KNN 90% 0.936764 0.935583 0.91321 0.951388 0.972461 0.984218 0.990905 0.985312 0.940569 0.922452 0.874504 0.999281 0.932319 0.948489 0.95143 0.903049 Linear SVM 90% 0.989511 0.999615 0.999318 0.999092 0.996924 0.999273 0.999286 0.999832 0.99936 0.999498 0.999637 0.999347 0.980034 0.999638 0.999109 0.999555 MultiLayer Perceptron 90% 0.485385 0.697727 0.619645 0.499204 0.614836 0.697721 0.614606 0.499114 0.293809 0.692177 0.874122 0 0.677589 0.704774 0.364679 0.999077 Naive Bayes 90% 0.939751 0.976612 0.912906 0.855333 0.985478 0.984125 0.917853 0.862282 0.99947 0.999223 0.874487 0.749714 0.879081 0.954591 0.951608 0.96144 Radial SVM 90% 0.955763 0.996037 0.994896 0.999521 0.99668 0.998986 0.999532 0.999012 0.940784 0.999003 0.999942 0.999722 0.97058 0.993506 0.989874 0.999372 Random Forest 90% 0.991793 0.999335 0.999296 0.99946 0.999291 0.999262 0.999631 0.999466 0.999542 0.999776 0.999937 0.999928 0.98555 0.999631 0.999509 0.999734 Recurrent Neural Network 90% 0.977428 0.996017 0.975034 0.999125 0.987626 0.999346 0.999135 0.999174 0.999758 0.99957 0.999606 0.999371 0.956369 0.993126 0.951831 0.999393 XGBoost 90% 0.979992 0.999699 0.995011 0.999519 0.997286 0.99989 0.999684 0.999458 0.999172 0.999104 0.999855 0.999853 0.961138 0.999497 0.989503 0.999407 Artificial Neural Network 75% 0.499551 0.499439 0.499385 0.499559 0.499465 0.633718 0.4085 0.648379 0 0 0 0 0.999623 0.999556 0.999396 0.999622 Decision Tree 75% 0.919949 0.938371 0.975115 0.971019 0.950987 0.946629 0.975941 0.970884 0.9998 0.922969 0.999645 0.999301 0.840958 0.954231 0.951036 0.942132 KNN 75% 0.936588 0.935752 0.912628 0.951143 0.972717 0.983949 0.991496 0.985176 0.940429 0.922796 0.874299 0.999602 0.93229 0.948069 0.951569 0.903576 Linear SVM 75% 0.989512 0.999618 0.999527 0.999048 0.996703 0.999075 0.999793 0.999956 0.999777 0.999784 0.999403 0.9995 0.980436 0.999453 0.999809 0.999433 MultiLayer Perceptron 75% 0.485219 0.697789 0.619518 0.499425 0.615041 0.698387 0.614382 0.499202 0.293228 0.691314 0.874777 0 0.677451 0.704791 0.364517 0.999884 Naive Bayes 75% 0.939879 0.976789 0.912684 0.854775 0.984943 0.983971 0.918157 0.862273 0.999254 0.999416 0.874724 0.749197 0.879192 0.954294 0.950957 0.960629 Radial SVM 75% 0.95614 0.996679 0.99454 0.999326 0.996814 0.998523 0.999866 0.999568 0.940203 0.999079 0.999487 0.999434 0.971121 0.993554 0.989796 0.999421 Random Forest 75% 0.992504 0.99943 0.999476 0.999869 0.999452 0.999828 0.999396 0.999446 0.99929 0.999115 0.999021 0.99995 0.985146 0.999743 0.999612 0.99963 Recurrent Neural Network 75% 0.977637 0.995878 0.975458 0.999028 0.987011 0.999585 0.999581 0.999335 0.999514 0.99982 0.999285 0.999899 0.956526 0.993092 0.951659 0.999995 XGBoost 75% 0.980618 0.999262 0.995083 0.999247 0.997713 0.999606 0.999831 0.999573 0.999074 0.999329 0.999648 0.999587 0.961284 0.999909 0.989389 0.999071 Artificial Neural Network 60% 0.499755 0.499445 0.499839 0.499423 0.499717 0.633632 0.408023 0.648375 0 0 0 0 0.999279 0.999869 0.999765 0.999803 Decision Tree 60% 0.920572 0.938915 0.975712 0.970823 0.951605 0.946639 0.975531 0.971096 0.99999 0.922364 0.999466 0.999044 0.840792 0.954329 0.951322 0.941759 KNN 60% 0.936455 0.935821 0.912764 0.95158 0.972545 0.984086 0.99155 0.985297 0.940446 0.92256 0.874774 0.999038 0.932341 0.948173 0.951521 0.902891 Linear SVM 60% 0.98987 0.999165 0.999953 0.99981 0.997155 0.999546 0.999646 0.999616 0.999552 0.999307 0.999867 0.999407 0.980258 0.9996 0.99986 0.999835 MultiLayer Perceptron 60% 0.485664 0.698236 0.619247 0.499172 0.614429 0.697886 0.613933 0.499806 0.293238 0.692073 0.87472 0 0.67744 0.704135 0.36496 0.99902 Naive Bayes 60% 0.939164 0.976829 0.912797 0.855154 0.985439 0.984537 0.917519 0.862736 0.99953 0.999429 0.874404 0.749361 0.87963 0.954851 0.950986 0.960685 Radial SVM 60% 0.955642 0.996457 0.994463 0.999258 0.997007 0.999177 0.999736 0.999671 0.940294 0.999886 0.999023 0.99951 0.970499 0.99356 0.990031 0.999246 Random Forest 60% 0.992412 0.999514 0.99991 0.999299 0.999628 0.999553 0.99904 0.999584 0.999218 0.999235 0.999312 0.999762 0.985474 0.999811 0.999837 0.999168 Recurrent Neural Network 60% 0.978032 0.996631 0.975882 0.999701 0.987636 0.999023 0.999079 0.999473 0.99906 0.999952 0.999475 0.999245 0.956337 0.993444 0.951419 0.999289 XGBoost 60% 0.979916 0.999244 0.994264 0.999777 0.997804 0.999921 0.999798 0.999011 0.999285 0.999531 0.999356 0.999428 0.961357 0.999796 0.989826 0.999543 Artificial Neural Network 50% 0.499917 0.499085 0.499508 0.499295 0.499264 0.633797 0.407948 0.648372 0 0 0 0 0.999417 0.999601 0.999576 0.999337 Decision Tree 50% 0.920181 0.939051 0.975574 0.971031 0.951304 0.946093 0.975444 0.970762 0.999477 0.922999 0.999104 0.999231 0.840576 0.95502 0.951557 0.941327 KNN 50% 0.936892 0.935868 0.913391 0.951354 0.973164 0.983842 0.991152 0.985575 0.940604 0.92257 0.874039 0.999444 0.931924 0.948621 0.950955 0.903156 Linear SVM 50% 0.989485 0.999016 0.999696 0.999103 0.997474 0.999511 0.999966 0.999933 0.999981 0.999482 0.999575 0.999205 0.980755 0.999524 0.999512 0.999944 MultiLayer Perceptron 50% 0.485912 0.698309 0.619319 0.499169 0.614925 0.697813 0.613918 0.49964 0.293391 0.691751 0.874177 0 0.677416 0.704479 0.365311 0.999812 Naive Bayes 50% 0.939804 0.976819 0.912742 0.854955 0.98485 0.984026 0.917999 0.86281 0.999635 0.999265 0.874592 0.749599 0.878845 0.955028 0.951757 0.960548 Radial SVM 50% 0.955323 0.99637 0.994633 0.999435 0.996704 0.998609 0.999468 0.999261 0.94115 0.999451 0.999863 0.999358 0.970205 0.993478 0.990351 0.999921 Random Forest 50% 0.99203 0.999622 0.99924 0.999063 0.999404 0.999497 0.999142 0.999718 0.999771 0.999795 0.999716 0.99956 0.984998 0.999553 0.999231 0.999238 Recurrent Neural Network 50% 0.9774 0.996121 0.975057 0.999656 0.987046 0.999122 0.999731 0.999453 0.999139 0.999958 0.999278 0.999374 0.956396 0.993555 0.951199 0.999317 XGBoost 50% 0.980185 0.999776 0.99464 0.999697 0.997198 0.999612 0.999327 0.999042 0.99902 0.999071 0.999004 0.999017 0.961159 0.999431 0.989904 0.999812

5.2.6. RFE Feature Selection

Table 5.10 shows the results of the models for the RFE feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.10.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% and 90% training data in most of the cases.

2. The highest Accuracy is obtained for RFE based feature selection method with Decision Tree, KNN, SVM (Linear and Radial), Naive Bayes, Random Forest, RNN and XGBoost for 100% of the features selected from all data with a training to testing ratio of 80:30 80:20, 90:10. Almost all models performed well on 100% data with accuracy 100% and AUC of 1.00 except ANN and MLP. These algorithms perform well with 90%, 75%, 60% and 50% features as well.

Table 5.10 Model Comparison based on RFE Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.78205 0.5 0.5 0.5 0.7438 0.5865 0.5 0 0.84615 0 0 1 0.71795 1 1

Decision Tree 100% 0.79765 0.94551 0.98558 1 0.7977 0.9421 0.9856 1 0.70588 0.92308 1 1 0.88942 0.96795 0.97115 1

KNN 100% 0.87033 0.84295 0.74519 1 0.9406 0.9694 0.9754 1 0.76471 0.69231 0.5 1 0.97596 0.99359 0.99038 1

Linear SVM 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

MultiLayer Perceptron 100% 0.81448 0.78526 0.72115 0.875 0.8626 0.8868 0.7716 0.8654 0.70588 0.69231 0.5 0.75 0.92308 0.87821 0.94231 1

Naive Bayes 100% 0.9976 1 1 1 1 1 1 1 1 1 1 1 0.99519 1 1 1

Radial SVM 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Random Forest 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Recurrent Neural Network 100% 1 0.99679 1 1 1 1 1 1 1 1 1 1 1 0.99359 1 1

XGBoost 100% 0.96338 0.85256 1 1 0.9963 0.9818 1 1 0.94118 0.76923 1 1 0.98558 0.9359 1 1

Artificial Neural Network 90% 0.499337 0.781618 0.499592 0.499956 0.499199 0.742868 0.585603 0.49937 0 0.845581 0 0 0.999144 0.717234 0.999314 0.999351 Decision Tree 90% 0.796775 0.945457 0.985368 0.999956 0.797524 0.942034 0.984715 0.999368 0.705247 0.922482 0.999814 0.999098 0.889012 0.967237 0.970195 0.999046 KNN 90% 0.870292 0.842836 0.744258 0.999443 0.939756 0.968809 0.975197 0.99971 0.76443 0.691584 0.499997 0.999687 0.975572 0.993165 0.989558 0.999954 Linear SVM 90% 0.999061 0.999059 0.999132 0.999996 0.999776 0.999524 0.999903 0.999838 0.99984 0.999653 0.999113 0.999523 0.999097 0.999413 0.99924 0.999209 MultiLayer Perceptron 90% 0.813685 0.78482 0.720309 0.874573 0.862391 0.886669 0.770864 0.865228 0.705742 0.692117 0.499732 0.749789 0.923031 0.878082 0.941676 0.999753 Naive Bayes 90% 0.997385 0.999534 0.999121 0.9999 0.999661 0.999228 0.999671 0.999713 0.999673 0.999807 0.999706 0.999751 0.99425 0.999614 0.999339 0.999705 Radial SVM 90% 0.999723 0.99969 0.999434 0.999013 0.999076 0.999964 0.999115 0.999502 0.999862 0.999061 0.999766 0.999419 0.999727 0.99955 0.999336 0.999291 Random Forest 90% 0.999084 0.999809 0.999912 0.999689 0.999317 0.999492 0.999804 0.999322 0.999659 0.999036 0.999888 0.999292 0.99932 0.999745 0.9992 0.999478 Recurrent Neural Network 90% 0.999858 0.996694 0.999831 0.999906 0.999751 0.99912 0.999616 0.999933 0.999864 0.999057 0.999339 0.999264 0.999432 0.99293 0.999345 0.999713 XGBoost 90% 0.962863 0.851826 0.999411 0.99976 0.995702 0.98141 0.999625 0.999144 0.940921 0.769213 0.999297 0.999545 0.984794 0.935023 0.99914 0.999772 Artificial Neural Network 75% 0.499546 0.781655 0.499993 0.499589 0.499076 0.742885 0.585839 0.499063 0 0.845295 0 0 0.99996 0.717562 0.999414 0.999211 Decision Tree 75% 0.797181 0.945352 0.98477 0.999489 0.796715 0.941614 0.985424 0.999801 0.70563 0.922539 0.999576 0.999072 0.888457 0.966992 0.971125 0.999363 KNN 75% 0.869637 0.842493 0.745078 0.99975 0.940533 0.968661 0.97475 0.999524 0.764065 0.691652 0.499084 0.999991 0.975909 0.9935 0.989431 0.999362 Linear SVM 75% 0.999891 0.999728 0.999989 0.999225 0.999175 0.999713 0.999506 0.999918 0.999882 0.999059 0.999476 0.999484 0.999233 0.99919 0.99938 0.999159 MultiLayer Perceptron 75% 0.814048 0.784857 0.721055 0.874235 0.862413 0.886029 0.771111 0.865391 0.705124 0.691349 0.499997 0.749073 0.922811 0.877703 0.94133 0.999809 Naive Bayes 75% 0.996808 0.999245 0.999359 0.999621 0.999204 0.999875 0.999386 0.999662 0.999669 0.999612 0.999793 0.999391 0.994655 0.999735 0.999198 0.999793 Radial SVM 75% 0.999045 0.999346 0.99922 0.999695 0.99945 0.99949 0.999196 0.999294 0.999947 0.999307 0.999305 0.99924 0.999083 0.999832 0.999797 0.999875 Random Forest 75% 0.999047 0.999311 0.999761 0.999435 0.999838 0.999151 0.999582 0.999752 0.999167 0.999303 0.999358 0.999435 0.999298 0.99946 0.999775 0.999774 Recurrent Neural Network 75% 0.999509 0.995891 0.999457 0.999966 0.999736 0.999566 0.999457 0.999744 0.999188 0.999329 0.999885 0.999396 0.999234 0.993499 0.999402 0.999952 XGBoost 75% 0.963176 0.851563 0.999366 0.999264 0.996173 0.980883 0.999301 0.999742 0.940213 0.769226 0.999322 0.999115 0.984611 0.935451 0.999629 0.99904 Artificial Neural Network 60% 0.499215 0.781971 0.499523 0.499228 0.499832 0.743109 0.585856 0.499011 0 0.84585 0 0 0.99996 0.717782 0.999796 0.999158 Decision Tree 60% 0.797298 0.944537 0.984618 0.999802 0.797424 0.941666 0.98515 0.999969 0.705169 0.922493 0.999551 0.999879 0.889208 0.967187 0.970395 0.999935 KNN 60% 0.869757 0.842911 0.744585 0.999425 0.940335 0.969005 0.97497 0.999616 0.764255 0.691783 0.499178 0.9997 0.975883 0.992987 0.990308 0.999121 Linear SVM 60% 0.99931 0.999599 0.999322 0.99997 0.999179 0.999266 0.999279 0.999656 0.999269 0.999624 0.999756 0.999826 0.999748 0.999936 0.999096 0.999955 MultiLayer Perceptron 60% 0.813606 0.784915 0.720435 0.874125 0.861723 0.886492 0.771027 0.864736 0.704992 0.691884 0.4997 0.749833 0.922151 0.877629 0.942302 0.999622 Naive Bayes 60% 0.996768 0.999578 0.999817 0.999045 0.999533 0.999791 0.999128 0.999235 0.999934 0.999803 0.999605 0.999018 0.994492 0.999051 0.999556 0.999571 Radial SVM 60% 0.999833 0.999218 0.999187 0.999725 0.999011 0.999552 0.999209 0.999965 0.999299 0.999294 0.999378 0.999913 0.999423 0.999744 0.999926 0.999264 Random Forest 60% 0.999022 0.99977 0.999303 0.999086 0.99944 0.999898 0.999995 0.999482 0.99957 0.999764 0.999567 0.999768 0.999878 0.999329 0.999071 0.999833 Recurrent Neural Network 60% 0.999391 0.996461 0.999476 0.999807 0.999273 0.999192 0.999725 0.999741 0.999244 0.999631 0.999246 0.999701 0.999339 0.9935 0.99971 0.999608 XGBoost 60% 0.963001 0.851993 0.999684 0.99954 0.996101 0.981595 0.999302 0.999948 0.941012 0.768511 0.99906 0.999696 0.984998 0.935155 0.999108 0.999629 Artificial Neural Network 50% 0.499385 0.781076 0.499718 0.499197 0.499023 0.743611 0.586 0.499233 0 0.845493 0 0 0.999884 0.717039 0.999012 0.999956 Decision Tree 50% 0.796804 0.945383 0.985173 0.999637 0.797544 0.941572 0.984613 0.999156 0.705672 0.922952 0.999972 0.999999 0.888463 0.967479 0.970976 0.999938 KNN 50% 0.869417 0.84256 0.7442 0.999293 0.940512 0.968442 0.974467 0.999628 0.763798 0.691841 0.499174 0.999213 0.975325 0.993541 0.989802 0.999528 Linear SVM 50% 0.999499 0.999385 0.999392 0.999841 0.999018 0.999713 0.999166 0.999664 0.999394 0.999344 0.999782 0.999075 0.99982 0.999819 0.999754 0.9991 MultiLayer Perceptron 50% 0.814418 0.784734 0.720474 0.874616 0.86172 0.885966 0.771069 0.864608 0.70533 0.691984 0.499568 0.749154 0.92303 0.878193 0.941599 0.99916 Naive Bayes 50% 0.997377 0.999828 0.999858 0.999739 0.999366 0.999129 0.999403 0.999078 0.999602 0.999422 0.999104 0.999751 0.995137 0.99992 0.99957 0.999925 Radial SVM 50% 0.999383 0.999033 0.999909 0.999985 0.999776 0.999493 0.999172 0.999247 0.999919 0.999546 0.999417 0.999401 0.99908 0.999702 0.999824 0.999258 Random Forest 50% 0.999684 0.999117 0.99912 0.999882 0.999553 0.999509 0.999886 0.999795 0.999592 0.999991 0.99964 0.999399 0.999283 0.999616 0.999402 0.999568 Recurrent Neural Network 50% 0.999222 0.996331 0.99969 0.999193 0.999595 0.999886 0.999569 0.999639 0.999311 0.999206 0.999758 0.999474 0.99922 0.993376 0.99976 0.999186 XGBoost 50% 0.962444 0.852301 0.999507 0.999097 0.995845 0.981093 0.999819 0.999523 0.940976 0.769185 0.99974 0.999728 0.985443 0.935261 0.999578 0.999532

5.2.7. Embedded Method with Logistic Regression Feature Selection

Table 5.11 shows the results of the models for the Embedded Method with Logistic Regression feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.11.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% and 90% training data in most of the cases.

2. The highest accuracy of 100% was achieved for Embedded Method with Logistic Regression based feature selection method with Linear and Radial SVM, Naïve Bayes, Random Forest, RNN and XGBoost for 100% of the features selected from all data, while all the models performed well with a training to testing ratio of 70:30, 80:20 and 90:10.

3. An AUC of 1.00 was obtained by the above classifiers. These classifiers also performed well for 90%, 75%, 60% and 50% features.

Table 5.11 Model Comparison based on Embedded Method with Logistic Regression Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.5385 0.4514 0.7452 0.6154 0 0 0 0 1 1 1 1

Decision Tree 100% 0.95192 0.94231 0.98558 0.97115 0.9519 0.9423 0.9856 0.9712 1 1 1 1 0.90385 0.88462 0.97115 0.94231

KNN 100% 0.92492 0.98077 0.98558 0.99038 0.9864 0.9941 0.994 1 0.94118 1 1 1 0.90865 0.96154 0.97115 0.98077

Linear SVM 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

MultiLayer Perceptron 100% 0.5 0.79487 0.5 0.5 0.6288 0.9026 0.5144 0.4808 0 0.84615 0 0 1 0.74359 1 1

Naive Bayes 100% 0.9976 1 1 1 1 1 1 1 1 1 1 1 0.99519 1 1 1

Radial SVM 100% 0.9976 1 1 1 0.9997 1 1 1 1 1 1 1 0.99519 1 1 1

Random Forest 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Recurrent Neural Network 100% 1 0.98077 0.99519 1 1 0.9808 1 1 1 1 1 1 1 0.96154 0.99038 1

XGBoost 100% 0.89975 0.86538 0.99038 1 0.9887 0.9749 0.9988 1 0.82353 0.76923 1 1 0.97596 0.96154 0.98077 1

Artificial Neural Network 90% 0.499886 0.499847 0.499604 0.499109 0.537957 0.451385 0.745067 0.614985 0 0 0 0 0.999292 0.99972 0.999529 0.999287 Decision Tree 90% 0.951904 0.941619 0.984685 0.970787 0.951169 0.942217 0.985204 0.970706 0.999336 0.999438 0.999805 0.999753 0.903648 0.883749 0.970539 0.941778 KNN 90% 0.924823 0.980407 0.984933 0.990361 0.985676 0.99345 0.993758 0.99968 0.940262 0.999299 0.999842 0.99979 0.908123 0.960671 0.970647 0.980056 Linear SVM 90% 0.999636 0.999713 0.999768 0.999723 0.999682 0.999639 0.999944 0.999007 0.999429 0.999287 0.999506 0.999914 0.999165 0.999929 0.999031 0.999877 MultiLayer Perceptron 90% 0.499552 0.794618 0.49952 0.499882 0.628593 0.901808 0.513632 0.480566 0 0.846025 0 0 0.999665 0.743496 0.999036 0.999945 Naive Bayes 90% 0.996831 0.999107 0.99918 0.999757 0.999075 0.999113 0.999878 0.999804 0.99989 0.999258 0.999732 0.999958 0.995027 0.999611 0.999769 0.999371 Radial SVM 90% 0.997515 0.999955 0.999696 0.99996 0.998876 0.999174 0.999458 0.999936 0.999929 0.999387 0.999091 0.999294 0.994494 0.999896 0.999721 0.999245 Random Forest 90% 0.999402 0.999931 0.999466 0.999196 0.999741 0.99937 0.99914 0.999886 0.999071 0.999799 0.999525 0.99981 0.999674 0.999798 0.999567 0.99994 Recurrent Neural Network 90% 0.999558 0.979951 0.994922 0.999374 0.999652 0.980325 0.999868 0.999427 0.999573 0.999389 0.999014 0.999547 0.999171 0.960566 0.989773 0.999303 XGBoost 90% 0.89922 0.865238 0.989707 0.999064 0.987995 0.973939 0.998376 0.999113 0.822674 0.768465 0.999379 0.999162 0.97512 0.961393 0.979789 0.999674 Artificial Neural Network 75% 0.499295 0.499177 0.499304 0.499171 0.537907 0.450933 0.745076 0.615135 0 0 0 0 0.999699 0.999105 0.999629 0.999275 Decision Tree 75% 0.951207 0.941526 0.985189 0.970633 0.951696 0.941691 0.984952 0.971014 0.999151 0.999613 0.999914 0.999551 0.903198 0.884264 0.970667 0.941801 KNN 75% 0.924547 0.980377 0.984807 0.990221 0.985402 0.993935 0.993975 0.999826 0.940796 0.999224 0.999113 0.9999 0.908232 0.961368 0.970183 0.98016 Linear SVM 75% 0.999756 0.99954 0.999539 0.999368 0.999405 0.999349 0.999979 0.999131 0.9996 0.999868 0.999412 0.999248 0.999195 0.999421 0.999234 0.999806 MultiLayer Perceptron 75% 0.499258 0.794324 0.499849 0.499009 0.628645 0.902587 0.51355 0.480685 0 0.845763 0 0 0.999633 0.743348 0.999027 0.999573 Naive Bayes 75% 0.996713 0.999783 0.999909 0.999611 0.999725 0.999566 0.999859 0.999417 0.999041 0.999563 0.999718 0.999125 0.994957 0.999878 0.999307 0.99988 Radial SVM 75% 0.997395 0.999648 0.999922 0.999715 0.998868 0.999966 0.999535 0.999817 0.9995 0.999954 0.999804 0.999407 0.994266 0.999124 0.999472 0.999576 Random Forest 75% 0.999604 0.999332 0.999168 0.999689 0.999987 0.999316 0.999642 0.999533 0.999803 0.999641 0.999215 0.999875 0.999279 0.999992 0.999463 0.999853 Recurrent Neural Network 75% 0.999363 0.980587 0.994288 0.999033 0.999966 0.979968 0.999463 0.9991 0.999927 0.999893 0.999632 0.999327 0.999042 0.961528 0.990166 0.999538 XGBoost 75% 0.899056 0.865021 0.989619 0.999418 0.988231 0.974275 0.998213 0.999408 0.823118 0.768406 0.999827 0.999882 0.975062 0.960642 0.980581 0.999049 Artificial Neural Network 60% 0.499552 0.499222 0.499903 0.499387 0.538419 0.450628 0.744697 0.615174 0 0 0 0 0.999506 0.999494 0.999488 0.999101 Decision Tree 60% 0.951893 0.942211 0.985027 0.971002 0.951394 0.941635 0.985025 0.970845 0.999359 0.999715 0.999582 0.999863 0.903649 0.884467 0.970878 0.941686 KNN 60% 0.923988 0.980765 0.984828 0.989859 0.986237 0.993746 0.993548 0.99943 0.940782 0.999808 0.999912 0.999476 0.90861 0.961279 0.970313 0.980745 Linear SVM 60% 0.999739 0.999041 0.99909 0.999001 0.99928 0.999415 0.999433 0.999136 0.999422 0.999819 0.999604 0.999922 0.999455 0.999335 0.999652 0.999242 MultiLayer Perceptron 60% 0.49977 0.794214 0.499591 0.499137 0.628494 0.902274 0.514137 0.479816 0 0.84543 0 0 0.999774 0.743352 0.99977 0.999378 Naive Bayes 60% 0.997255 0.999965 0.999132 0.999591 0.999067 0.999485 0.99983 0.999165 0.999753 0.999511 0.999977 0.999811 0.994825 0.999305 0.999557 0.999423 Radial SVM 60% 0.997524 0.999978 0.999456 0.999409 0.999134 0.999334 0.999931 0.999093 0.999218 0.999116 0.999418 0.999631 0.994507 0.999538 0.999976 0.999214 Random Forest 60% 0.999012 0.999151 0.999794 0.999019 0.999497 0.99916 0.999131 0.999764 0.999534 0.999458 0.999043 0.999006 0.999945 0.999663 0.999001 0.999542 Recurrent Neural Network 60% 0.999729 0.980662 0.994728 0.999274 0.999749 0.98001 0.999052 0.999346 0.999567 0.999469 0.999063 0.999722 0.999475 0.960669 0.9899 0.999038 XGBoost 60% 0.89943 0.864947 0.990084 0.999001 0.988658 0.974333 0.998145 0.999773 0.823162 0.769176 0.999058 0.999226 0.97538 0.960951 0.980619 0.99929 Artificial Neural Network 50% 0.499777 0.499879 0.499339 0.499337 0.538132 0.451049 0.7448 0.615325 0 0 0 0 0.999776 0.999805 0.999414 0.999579 Decision Tree 50% 0.951824 0.942166 0.985064 0.970962 0.951068 0.942141 0.985038 0.970667 0.999915 0.999888 0.999418 0.999384 0.902938 0.88417 0.9709 0.941687 KNN 50% 0.92437 0.980611 0.985126 0.990335 0.985578 0.994016 0.993746 0.999137 0.94107 0.999275 0.999199 0.999808 0.908526 0.961303 0.970558 0.980088 Linear SVM 50% 0.999604 0.999145 0.999655 0.99965 0.999299 0.999251 0.99963 0.999673 0.999332 0.999448 0.999395 0.999726 0.999784 0.999169 0.999906 0.999268 MultiLayer Perceptron 50% 0.499798 0.794156 0.499412 0.499703 0.628177 0.901848 0.513599 0.4804 0 0.845316 0 0 0.999448 0.743214 0.999475 0.999263 Naive Bayes 50% 0.996792 0.999158 0.999862 0.999351 0.999753 0.99923 0.999011 0.999029 0.999722 0.999493 0.99967 0.999428 0.995174 0.99996 0.999838 0.999479 Radial SVM 50% 0.997148 0.999855 0.999353 0.999536 0.999669 0.999731 0.99978 0.999303 0.999741 0.999706 0.999967 0.999494 0.994327 0.999993 0.999152 0.999105 Random Forest 50% 0.999562 0.999958 0.999094 0.99919 0.999006 0.999819 0.999362 0.999275 0.999027 0.999193 0.99908 0.999588 0.999433 0.999822 0.999265 0.999635 Recurrent Neural Network 50% 0.99948 0.980751 0.994211 0.999948 0.99937 0.980348 0.999642 0.999608 0.999767 0.999269 0.999172 0.999123 0.999853 0.961245 0.989809 0.999164 XGBoost 50% 0.898797 0.865199 0.990007 0.999106 0.987808 0.974037 0.997999 0.999851 0.823076 0.768619 0.999824 0.999906 0.975869 0.961164 0.979953 0.999799

5.2.8. Embedded Method with Random Forest Feature Selection

Table 5.12 shows the results of the models for the Embedded Method with Random Forest feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.12.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% and 90% training data in most of the cases.

2. The highest accuracy of 100% and AUC of 1.00 was achieved for Select from Embedded Method with Random Forest based feature selection method is obtained 100% of the features selected from all data, while all the models performed well with a training to testing ratio of 70:30, 80:20 and 90:10 except ANN.

Table 5.12 Model Comparison based on Embedded Method with Random Forest Feature Selection method

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.6803 0.5602 0.6184 0.6803 0 0 0 0 1 1 1 1

Decision Tree 100% 0.97115 0.91987 0.99038 0.97115 0.9712 0.9199 0.9904 0.9712 1 0.84615 1 1 0.94231 0.99359 0.98077 0.94231

KNN 100% 0.97115 0.99359 0.99038 0.97115 0.9904 0.9953 1 0.9904 1 1 1 1 0.94231 0.98718 0.98077 0.94231

Linear SVM 100% 0.99038 1 0.99519 0.99038 0.9856 1 1 0.9856 1 1 1 1 0.98077 1 0.99038 0.98077

MultiLayer Perceptron 100% 0.5 0.77244 0.81731 0.5 0.6514 0.9029 0.9345 0.6514 0.5 0.69231 0.75 0.5 0.5 0.85256 0.88462 0.5

Naive Bayes 100% 1 0.95833 0.93269 1 1 0.9613 0.9327 1 1 0.92308 0.875 1 1 0.99359 0.99038 1

Radial SVM 100% 0.5 0.84615 0.875 0.5 0.5 0.946 0.9591 0.5 1 0.69231 0.75 1 0 1 1 0

Random Forest 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Recurrent Neural Network 100% 0.99038 0.99038 0.98558 0.99038 1 0.9998 1 1 1 1 1 1 0.98077 0.98077 0.97115 0.98077

XGBoost 100% 1 0.92308 0.93269 1 1 0.998 0.9964 1 1 0.84615 0.875 1 1 1 0.99038 1

Artificial Neural Network 90% 0.499031 0.499832 0.499001 0.499461 0.67998 0.559326 0.618073 0.679592 0 0 0 0 0.999395 0.999326 0.999751 0.999219 Decision Tree 90% 0.97018 0.918967 0.990191 0.970528 0.970249 0.918928 0.989663 0.970251 0.999962 0.845207 0.999856 0.999792 0.941405 0.993235 0.979847 0.94223 KNN 90% 0.970835 0.993485 0.990101 0.970631 0.989988 0.994342 0.999703 0.98966 0.999065 0.99936 0.999766 0.999239 0.942196 0.986867 0.980217 0.941439 Linear SVM 90% 0.989677 0.999651 0.99467 0.989861 0.98501 0.999529 0.999274 0.984633 0.999799 0.999643 0.999882 0.999168 0.980729 0.999047 0.989646 0.980633 MultiLayer Perceptron 90% 0.499078 0.771691 0.816564 0.499027 0.650578 0.902859 0.933814 0.651151 0.499913 0.691501 0.749102 0.499013 0.499533 0.851805 0.883948 0.499027 Naive Bayes 90% 0.99961 0.95773 0.931697 0.999125 0.999706 0.961059 0.93203 0.999436 0.999526 0.922127 0.874265 0.999777 0.999335 0.993251 0.990303 0.999991 Radial SVM 90% 0.499506 0.845355 0.874613 0.499295 0.499172 0.945622 0.958378 0.499275 0.999295 0.691348 0.749603 0.999486 0 0.999571 0.999861 0 Random Forest 90% 0.999487 0.999586 0.999487 0.999054 0.999575 0.999444 0.99914 0.999031 0.999936 0.999405 0.999756 0.999245 0.999481 0.999363 0.999295 0.999705 Recurrent Neural Network 90% 0.989495 0.99022 0.985469 0.989741 0.99987 0.99915 0.999001 0.999376 0.999249 0.999459 0.999801 0.999028 0.980466 0.980574 0.970263 0.980236 XGBoost 90% 0.999256 0.922457 0.932048 0.999012 0.99991 0.997982 0.995556 0.999329 0.999946 0.845865 0.874843 0.999293 0.999754 0.999659 0.990057 0.999457 Artificial Neural Network 75% 0.499893 0.499231 0.499099 0.499909 0.679417 0.559264 0.617631 0.679385 0 0 0 0 0.999029 0.999093 0.999848 0.999863 Decision Tree 75% 0.97046 0.919213 0.990105 0.970159 0.971001 0.919339 0.989529 0.970747 0.99983 0.845942 0.999863 0.999355 0.941342 0.993294 0.980345 0.941788 KNN 75% 0.971046 0.992867 0.990322 0.970907 0.989491 0.994315 0.999164 0.990283 0.999179 0.999813 0.999264 0.999055 0.941732 0.987066 0.979804 0.941567 Linear SVM 75% 0.989866 0.999997 0.995071 0.989381 0.98544 0.999353 0.999984 0.985489 0.99967 0.999833 0.999672 0.999229 0.980034 0.999285 0.989729 0.98041 MultiLayer Perceptron 75% 0.499024 0.771745 0.816688 0.499692 0.6506 0.902207 0.934122 0.650705 0.499064 0.692225 0.749532 0.499569 0.499357 0.852221 0.884618 0.499955 Naive Bayes 75% 0.99924 0.957795 0.932043 0.999185 0.999894 0.961012 0.93192 0.999068 0.999386 0.922436 0.874563 0.999505 0.999715 0.993333 0.989647 0.999797 Radial SVM 75% 0.499715 0.846026 0.87471 0.499568 0.499468 0.945306 0.95879 0.499046 0.999967 0.69168 0.749035 0.99921 0 0.999845 0.999189 0 Random Forest 75% 0.999444 0.999311 0.999487 0.999151 0.999784 0.999021 0.999221 0.999461 0.999201 0.999912 0.99958 0.999566 0.999153 0.999372 0.999309 0.999276 Recurrent Neural Network 75% 0.98942 0.989514 0.985503 0.989533 0.999986 0.999064 0.999909 0.999264 0.999048 0.99958 0.999686 0.999585 0.980259 0.980048 0.970496 0.980113 XGBoost 75% 0.999042 0.922412 0.931736 0.999563 0.999455 0.997538 0.996159 0.999655 0.999364 0.845685 0.874939 0.999514 0.999364 0.99965 0.990086 0.99926 Artificial Neural Network 60% 0.499191 0.49996 0.499679 0.499215 0.679919 0.559674 0.617981 0.679971 0 0 0 0 0.999808 0.999246 0.999542 0.99982 Decision Tree 60% 0.970911 0.919553 0.990062 0.971033 0.970551 0.919801 0.990106 0.970261 0.999351 0.846148 0.999587 0.999557 0.941684 0.99319 0.980711 0.94141 KNN 60% 0.970378 0.993409 0.989428 0.970455 0.989751 0.994359 0.999879 0.990171 0.999938 0.999804 0.999985 0.999897 0.941378 0.98684 0.980291 0.941428 Linear SVM 60% 0.989705 0.999098 0.994252 0.990116 0.985402 0.999048 0.999362 0.985462 0.999266 0.99921 0.999074 0.999829 0.980673 0.99924 0.989744 0.979783 MultiLayer Perceptron 60% 0.499382 0.772003 0.816447 0.499356 0.651302 0.902094 0.93361 0.650438 0.499132 0.692126 0.749875 0.499529 0.499593 0.852186 0.883733 0.499746 Naive Bayes 60% 0.999532 0.958286 0.932685 0.999563 0.999992 0.960754 0.932638 0.999837 0.999224 0.922093 0.8747 0.999416 0.99974 0.993221 0.989637 0.999224 Radial SVM 60% 0.499519 0.845883 0.874876 0.499136 0.499721 0.945784 0.95831 0.499221 0.999029 0.692124 0.749491 0.999865 0 0.999685 0.999646 0 Random Forest 60% 0.999877 0.999849 0.99949 0.999074 0.999194 0.999542 0.999284 0.999274 0.99921 0.999524 0.999025 0.999998 0.999465 0.999706 0.9995 0.99923 Recurrent Neural Network 60% 0.989646 0.989718 0.985344 0.989784 0.999457 0.998997 0.999832 0.999592 0.999916 0.999759 0.999754 0.999567 0.980756 0.980517 0.970228 0.980635 XGBoost 60% 0.9996 0.922799 0.932504 0.999136 0.999468 0.997855 0.995757 0.999453 0.999446 0.845696 0.87407 0.999959 0.999177 0.999491 0.989844 0.999598 Artificial Neural Network 50% 0.4999 0.499084 0.499839 0.49996 0.680075 0.55971 0.617527 0.679481 0 0 0 0 0.999333 0.99975 0.999977 0.999609 Decision Tree 50% 0.970619 0.919771 0.989504 0.970588 0.971174 0.919031 0.990062 0.97078 0.999402 0.845421 0.999639 0.999053 0.942161 0.993191 0.980711 0.942255 KNN 50% 0.970453 0.993123 0.989654 0.970326 0.990262 0.994669 0.999664 0.990062 0.999085 0.999178 0.999317 0.999393 0.941644 0.986779 0.980423 0.941658 Linear SVM 50% 0.989931 0.999678 0.994375 0.99033 0.984954 0.999967 0.999386 0.985222 0.999772 0.999883 0.999638 0.999274 0.979916 0.999971 0.990033 0.980104 MultiLayer Perceptron 50% 0.499985 0.772025 0.81677 0.499326 0.650511 0.902788 0.934259 0.651037 0.499231 0.692194 0.749115 0.499455 0.499109 0.852044 0.884399 0.499355 Naive Bayes 50% 0.999234 0.958011 0.931907 0.999203 0.999995 0.960707 0.93263 0.999183 0.999665 0.922816 0.87482 0.999518 0.999758 0.993351 0.99003 0.999145 Radial SVM 50% 0.499005 0.84602 0.874578 0.49972 0.499679 0.945006 0.958466 0.499848 0.999589 0.691895 0.749373 0.999131 0 0.999138 0.999352 0 Random Forest 50% 0.999111 0.9998 0.999407 0.99978 0.999191 0.999427 0.999882 0.999782 0.999534 0.999539 0.999851 0.999324 0.999155 0.999373 0.999515 0.999656 Recurrent Neural Network 50% 0.989541 0.989941 0.985293 0.990067 0.999543 0.999488 0.999052 0.999237 0.999612 0.99946 0.999348 0.999925 0.979798 0.980143 0.970915 0.980755 XGBoost 50% 0.999104 0.922931 0.932112 0.999047 0.99981 0.997927 0.996278 0.999283 0.99909 0.845616 0.874022 0.999289 0.999746 0.999579 0.99015 0.999493

5.2.9. SelectK Feature Selection

Table 5.13 shows the results of the models for the SelectK feature selection method by selecting different percentages of features and training the models on 90%, 80%, 70% and 60% data. The following observations can be made from Table 5.13.

1. The highest AUC and Accuracy values are obtained by training the model with 70%, 80% and 90% training data in most of the cases.

2. The highest accuracy of 100% is obtained for SelectK feature selection method with Linear SVM, Naïve Bayes, Recurrent Neural Network, Random Forest and XGBoost for 100% of the features selected from all data.

3. An AUC of 1.00 is obtained by the above classifiers. These algorithms also perform well with 90%, 75%, 60% and 50% features. Almost all the models performed well with a training to testing ratio of 70:30, 80:20 and 90:10.

Table 5.13 Model Comparison based on SelectK Feature Selection with different data split ratios

Accuracy Accuracy Accuracy Accuracy AUC AUC AUC AUC Senstivity Senstivity Senstivity Senstivity Specificity Specificity Specificity Specificity

split 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10 60-40 70-30 80-20 90-10

Model Features

Artificial Neural Network 100% 0.5 0.5 0.5 0.5 0.6971 0.5 0.613 0.6971 0 0 0 0 1 1 1 1

Linear SVM 100% 0.99279 0.99679 1 1 0.9986 1 1 1 1 1 1 1 0.98558 0.99359 1 1

MultiLayer Perceptron 100% 0.70136 0.487179 0.5 0.625 0.7052 0.5747 0.4038 0.8173 0.94118 0.076923 0 0.5 0.46154 0.897436 1 0.75

Naive Bayes 100% 0.95192 0.99679 0.99038 1 0.9639 0.9963 0.9934 1 1 1 1 1 0.90385 0.99359 0.98077 1

Radial SVM 100% 0.98077 0.99359 0.99038 0.99038 0.9989 1 1 1 1 1 1 1 0.96154 0.98718 0.98077 0.98077

Random Forest 100% 0.9976 1 1 1 1 1 1 1 1 1 1 1 0.99519 1 1 1

Recurrent Neural Network 100% 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

XGBoost 100% 0.92916 0.88462 0.99519 1 0.9884 0.9921 1 1 0.88235 0.76923 1 1 0.97596 1 0.99038 1

Artificial Neural Network 90% 0.499671 0.499279 0.499619 0.499729 0.696395 0.4995 0.612395 0.696411 0 0 0 0 0.999467 0.999232 0.999006 0.999505 Decision Tree 90% 0.951206 0.95828 0.97563 0.97038 0.951606 0.957528 0.975788 0.971057 0.99943 0.999562 0.999146 0.999317 0.903284 0.916386 0.951094 0.941837 KNN 90% 0.882144 0.909457 0.855465 0.970317 0.974511 0.991126 0.981887 0.994831 0.823446 0.846103 0.749037 0.999866 0.941496 0.973778 0.961283 0.941325 Linear SVM 90% 0.992633 0.996128 0.999422 0.999557 0.997793 0.999005 0.999341 0.999353 0.999391 0.999703 0.999015 0.999642 0.984626 0.992951 0.9995 0.999588 MultiLayer Perceptron 90% 0.700545 0.48655 0.499692 0.624946 0.704397 0.574222 0.403679 0.816682 0.941032 0.076019 0 0.499786 0.460943 0.896668 0.999818 0.749123 Naive Bayes 90% 0.951868 0.99617 0.989386 0.999557 0.963284 0.996131 0.993064 0.999306 0.999802 0.999536 0.999563 0.99983 0.903201 0.993416 0.980456 0.999351 Radial SVM 90% 0.980708 0.992619 0.990143 0.990201 0.998331 0.999533 0.999836 0.999314 0.999366 0.999756 0.999902 0.99986 0.960973 0.986658 0.980159 0.980207 Random Forest 90% 0.997165 0.99972 0.999811 0.99955 0.999845 0.999651 0.999228 0.999247 0.999812 0.999867 0.999511 0.999569 0.995093 0.999286 0.999331 0.999473 Recurrent Neural Network 90% 0.999082 0.999108 0.999934 0.999646 0.999514 0.999558 0.999409 0.999837 0.999044 0.999326 0.999758 0.999326 0.999241 0.999013 0.999558 0.999205 XGBoost 90% 0.929065 0.883949 0.994314 0.99989 0.988335 0.991176 0.99915 0.999515 0.882227 0.768533 0.999237 0.999939 0.975155 0.999921 0.989498 0.999094 Artificial Neural Network 75% 0.499551 0.49936 0.499275 0.4993 0.696803 0.499154 0.612244 0.697003 0 0 0 0 0.999915 0.999172 0.999919 0.999105 Decision Tree 75% 0.951522 0.958183 0.975104 0.97074 0.951268 0.958299 0.975055 0.970693 0.999232 0.999754 0.999944 0.99952 0.903601 0.916252 0.951689 0.941597 KNN 75% 0.88223 0.909537 0.854842 0.970218 0.974114 0.991526 0.981419 0.994397 0.823395 0.845683 0.74961 0.999188 0.941976 0.973689 0.961366 0.942062 Linear SVM 75% 0.992432 0.996368 0.999816 0.999286 0.99782 0.999378 0.999191 0.999831 0.999352 0.99999 0.999156 0.999428 0.985221 0.992708 0.999562 0.999323 MultiLayer Perceptron 75% 0.700731 0.486675 0.499402 0.624782 0.704676 0.573943 0.40309 0.816682 0.940962 0.076819 0 0.499943 0.461099 0.897223 0.99952 0.749449 Naive Bayes 75% 0.951871 0.995865 0.990337 0.999941 0.963056 0.996025 0.9934 0.999622 0.999432 0.999975 0.999212 0.99932 0.903313 0.992924 0.980749 0.999366 Radial SVM 75% 0.980409 0.993105 0.989996 0.989838 0.998248 0.999789 0.999019 0.99957 0.999077 0.999097 0.999659 0.999585 0.961385 0.987052 0.980761 0.980436 Random Forest 75% 0.996703 0.99996 0.999429 0.999803 0.999645 0.999795 0.999553 0.999685 0.999427 0.999634 0.999275 0.999028 0.995012 0.999999 0.999502 0.999507 Recurrent Neural Network 75% 0.999635 0.999045 0.999657 0.99908 0.999706 0.999307 0.999168 0.999903 0.999638 0.999434 0.999446 0.999792 0.999839 0.999375 0.999422 0.999183 XGBoost 75% 0.928408 0.884557 0.99436 0.999405 0.987503 0.991186 0.999913 0.999273 0.882292 0.768973 0.999227 0.999729 0.975333 0.999456 0.989883 0.999623 Artificial Neural Network 60% 0.499344 0.499527 0.499039 0.499206 0.69622 0.499947 0.61224 0.696786 0 0 0 0 0.999226 0.999754 0.999813 0.999924 Decision Tree 60% 0.951424 0.958193 0.975008 0.97077 0.951656 0.957393 0.975792 0.970962 0.999982 0.999133 0.999675 0.999255 0.903775 0.916445 0.951381 0.941878 KNN 60% 0.882495 0.909387 0.855178 0.970191 0.974436 0.990921 0.981927 0.995011 0.8234 0.845924 0.749742 0.999641 0.942224 0.973576 0.961289 0.941637 Linear SVM 60% 0.992273 0.996725 0.999766 0.999951 0.997841 0.999995 0.999944 0.999453 0.999668 0.99967 0.999569 0.999176 0.985301 0.992851 0.999806 0.999571 MultiLayer Perceptron 60% 0.701086 0.486979 0.49974 0.624036 0.705007 0.573975 0.403464 0.817211 0.941093 0.076732 0 0.499744 0.461099 0.896833 0.999664 0.749058 Naive Bayes 60% 0.951739 0.996181 0.989848 0.999567 0.963724 0.995359 0.993115 0.999037 0.999415 0.999184 0.999119 0.999677 0.903716 0.99272 0.979877 0.999236 Radial SVM 60% 0.98 0.993056 0.989975 0.990016 0.998493 0.999742 0.999293 0.999405 0.999766 0.999793 0.999028 0.999292 0.960591 0.986788 0.980213 0.979784 Random Forest 60% 0.996658 0.999502 0.999822 0.999866 0.999981 0.999991 0.999194 0.999586 0.999772 0.999429 0.999987 0.999114 0.995064 0.999743 0.999919 0.999316 Recurrent Neural Network 60% 0.999788 0.999938 0.999263 0.999525 0.999865 0.999547 0.999026 0.999068 0.999962 0.999602 0.999968 0.999966 0.999035 0.999147 0.999583 0.999001 XGBoost 60% 0.929151 0.884123 0.994253 0.999784 0.987877 0.99161 0.999579 0.99958 0.882215 0.768804 0.999484 0.999479 0.975454 0.999352 0.989603 0.99988 Artificial Neural Network 50% 0.499742 0.499006 0.499918 0.499802 0.697024 0.499322 0.612874 0.697071 0 0 0 0 0.999965 0.999165 0.999655 0.999375 Decision Tree 50% 0.951693 0.958116 0.975942 0.970758 0.950906 0.957821 0.975575 0.971124 0.999651 0.999418 0.99906 0.999888 0.903831 0.916572 0.951406 0.94176 KNN 50% 0.882037 0.910026 0.854893 0.970887 0.974418 0.991296 0.981354 0.994803 0.822925 0.845339 0.749358 0.999072 0.942233 0.973635 0.960953 0.942234 Linear SVM 50% 0.991815 0.996088 0.999682 0.999819 0.998336 0.999131 0.999965 0.999921 0.999603 0.999911 0.999026 0.999127 0.985495 0.993269 0.999273 0.999372 MultiLayer Perceptron 50% 0.701002 0.486265 0.499019 0.62456 0.704203 0.573902 0.403124 0.816397 0.940352 0.076715 0 0.499654 0.460815 0.89674 0.999371 0.749926 Naive Bayes 50% 0.951735 0.995924 0.990128 0.999894 0.963489 0.995683 0.993226 0.999208 0.999578 0.999349 0.999256 0.999149 0.902923 0.993408 0.979958 0.999699 Radial SVM 50% 0.979789 0.992596 0.989828 0.989417 0.998504 0.999632 0.999195 0.999829 0.999505 0.999641 0.999805 0.999511 0.960575 0.987055 0.980457 0.980728 Random Forest 50% 0.99741 0.999202 0.999095 0.999234 0.999623 0.999738 0.999762 0.99931 0.999368 0.999624 0.99935 0.999639 0.994293 0.999883 0.99941 0.999224 Recurrent Neural Network 50% 0.999992 0.999234 0.999373 0.999786 0.999328 0.999322 0.999564 0.999421 0.999441 0.999987 0.999961 0.999729 0.999786 0.999491 0.999492 0.999969 XGBoost 50% 0.928687 0.884256 0.994671 0.999096 0.988127 0.991217 0.999075 0.999273 0.881974 0.768674 0.999083 0.99926 0.975189 0.999892 0.989775 0.999131

Random Forest, RNN, XGBoost, Linear and Radial SVM gave the best accuracy and AUC scores with most of the feature selection methods.

5.3. Grid-search based Hyper-Parameter tuning of the Models with train-test

Dans le document Predicting head and neck cancer in patients using epigenomics data and advanced machine learning methods (Page 52-74)