EVALUATION AND RESULTS

8.4.3 Results from Two-Phase Selection

Our TPS selects the following features (in no particular order):

Attachment type binary

MIME (magic) type of attachment application/msdownload

MIME (magic) type of attachment application/x-ms-dos-executable

Frequency of email sent in window

Mean words in body

Mean characters in subject

Number of attachments

Number of From Address in Window

Ratio of emails with attachment

Variance of attachment size

Variance of words in body

Number of HTML in email

Number of links in email

Number of To Address in Window

Variance of characters in subject

The first three features reflect important characteristics of an infected email: infected emails usually carry a binary attachment that is a DOS/Windows executable.

The mean and variance of words in body and of characters in subject are also important symptoms, because infected emails usually contain a random subject or body and therefore have irregular body and subject sizes. Number of attachments, ratio of emails with attachment, and number of links in email are usually higher for infected emails. Frequency of email sent in window and number of To Address in window are higher for an infected host, because a compromised host sends infected emails to many addresses and does so more frequently. Thus, most of the features selected by our algorithm have a practical interpretation and are useful for detection.
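As an illustration of how a few of these per-window features could be computed, here is a minimal Python sketch. The `Email` record, its field names, and the `window_features` helper are hypothetical and do not come from the original tool. The use of population variance and whitespace-based word counting are likewise assumptions, since this section does not give the exact definitions.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class Email:
    # Hypothetical record; field names are illustrative, not from the source.
    to_addrs: list
    subject: str
    body: str
    attachment_sizes: list = field(default_factory=list)


def window_features(emails):
    """Compute a few per-window features over a non-empty list of emails."""
    if not emails:
        raise ValueError("window must contain at least one email")
    # Word count per body (split on whitespace) and character count per subject.
    body_words = [len(e.body.split()) for e in emails]
    subject_chars = [len(e.subject) for e in emails]
    with_attachment = sum(1 for e in emails if e.attachment_sizes)
    return {
        "emails_in_window": len(emails),
        "mean_words_in_body": mean(body_words),
        # Assumption: population variance over the emails in the window.
        "variance_words_in_body": pvariance(body_words),
        "mean_chars_in_subject": mean(subject_chars),
        "variance_chars_in_subject": pvariance(subject_chars),
        "ratio_with_attachment": with_attachment / len(emails),
        # Distinct recipient addresses seen in the window.
        "to_addresses_in_window": len({a for e in emails for a in e.to_addrs}),
    }
```

A host that sends many near-random messages to many recipients would show up here as a large `to_addresses_in_window` together with high variance in body and subject sizes, which matches the intuition given above.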

Table 8.6 reports the cross validation accuracy (%) and false positive rate (%) of the three classifiers on the TPS-reduced dataset. Both the accuracy and the false positive rates are almost the same as on the unreduced dataset. The accuracy on the Mydoom.M dataset (shown at row M) is 99.3% for NB, 99.5% for SVM, and 99.4% for Series. Table 8.7 reports the novel detection accuracy (%) of the three classifiers on the TPS-reduced dataset. The average novel detection accuracy on the TPS-reduced dataset is higher than that on the unreduced dataset. The main reason for this improvement is the higher accuracy of NB and Series on the Mydoom.M set. The accuracy of NB on this dataset is 37.1% (row M), compared to 17.4% on the unreduced dataset (see Table 8.4, row M). Likewise, the accuracy of Series on the same dataset is 36.0%, compared to 16.6% on the unreduced dataset (as shown in Table 8.4, row M). However, the accuracy of SVM remains almost the same: 91.7%, compared to 92.4% on the unreduced dataset. In Table 8.8, we summarize the averages from Tables 8.3 through 8.7.
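The Mydoom.M comparison above can be checked with a few lines of arithmetic. The sketch below uses only the novel detection accuracies quoted in the text. The `MYDOOM_M` dictionary and the `point_change` helper are illustrative names, not part of the original tool.

```python
# Novel detection accuracy (%) on the Mydoom.M set, as quoted in the text:
# (unreduced dataset, TPS-reduced dataset)
MYDOOM_M = {
    "NB": (17.4, 37.1),
    "SVM": (92.4, 91.7),
    "Series": (16.6, 36.0),
}


def point_change(before, after):
    """Absolute change in percentage points, rounded to one decimal."""
    return round(after - before, 1)


changes = {name: point_change(b, a) for name, (b, a) in MYDOOM_M.items()}
for name, delta in changes.items():
    print(f"{name}: {delta:+.1f} points")
```

The differences are absolute percentage points: NB and Series each gain roughly 19 to 20 points on this worm, while SVM changes by less than one point.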

Table 8.6 Cross Validation Accuracy (%) and False Positive (%) of Three Different Classifiers on the TPS-Reduced Dataset

Source: This table appears in Email Worm Detection Using Data Mining, International Journal of Information Security and Privacy, Vol. 1, No. 4, pp. 47–61, 2007, authored by M.

Table 8.7 Comparison of Novel Detection Accuracy (%) of Different Classifiers on the TPS-Reduced Dataset

Source: This table appears in Email Worm Detection Using Data Mining, International Journal of Information Security and Privacy, Vol. 1, No. 4, pp. 47–61, 2007, authored by M.

The first three rows (after the header row) report the cross validation accuracy of all four classifiers that we have used in our experiments. Each row reports the average accuracy on a particular dataset: the first row for the unreduced dataset, the second for the PCA-reduced dataset, and the third for the TPS-reduced dataset. The average accuracies are almost the same for the TPS-reduced and the unreduced datasets. For example, the average accuracy of NB (shown under column NB) is 99.2% for both, and the accuracy of SVM (shown under column SVM) is 99.5% for both. The average accuracies of these classifiers on the PCA-reduced dataset are 1% to 2% lower. There is no entry under the decision tree column for the PCA-reduced and TPS-reduced datasets because we only test the decision tree on the unreduced dataset.

Table 8.8 Summary of Results (Averages) Obtained from Different Feature-Based Approaches

Source: This table appears in Email Worm Detection Using Data Mining, International Journal of Information Security and Privacy, Vol. 1, No. 4, pp. 47–61, 2007, authored by M.

The middle three rows report the average false positive values, and the last three rows report the average novel detection accuracies. The average novel detection accuracy on the TPS-reduced dataset is the highest among all. The average novel detection accuracy of NB on this dataset is 86.7%, compared to 83.6% on the unreduced dataset, an improvement of 3.1 percentage points on average. Similarly, Series has a novel detection accuracy of 86.3% on the TPS-reduced dataset, compared to 83.1% on the unreduced dataset, an improvement of 3.2 percentage points on average. However, the average accuracy of SVM remains almost the same (only 0.1% difference) on these two datasets. Thus, on average, novel detection accuracy on the TPS-reduced dataset improves for NB and Series and is essentially unchanged for SVM. While the TPS-reduced dataset is the best among the three datasets, the best classifier among the four is SVM. It has the highest average accuracy and novel detection accuracy on all datasets, and also very low average false positive rates.

8.5 Summary

In this chapter, we have discussed the results obtained from testing our data mining tool for email worm detection. We first discussed the datasets we used and the experimental setup, and then described the results we obtained. Our experiments yield two important findings. First, SVM has the best performance among the four classifiers: NB, SVM, Series, and decision tree. Second, feature selection using our TPS algorithm achieves the best accuracy, especially in detecting novel worms. Combining these two findings, we conclude that SVM with TPS reduction should work as the best novel worm detection tool on a feature-based dataset.

In the future, we would like to extend our work to content-based detection of the email worm by extracting binary level features from the emails. We would also like to apply other classifiers for the detection task.

