5.5 Application and Validation

5.5.3 Impact of Observation Viewpoint

5.5.3.3 Explanation

There are good reasons why we cannot rely on a single viewpoint to detect all attack events. They are described below.

Split by country: Suppose we have one botnet B made of machines located within the set of countries {X, Y, Z}. Suppose that, from time to time, these machines attack our platforms, leaving traces that are assigned to cluster C.

Suppose also that this cluster C is a very popular one, that is, many other machines from all over the world continuously leave traces on our platforms that are assigned to this cluster. As a result, the activities specifically linked to the botnet B are lost in the noise of all other machines leaving traces belonging to C.

This is certainly true for the cluster time series (as defined earlier) related to C, and it can also be true for the time series obtained by splitting it by platform, $\Phi_{[0-800),C,platform_i}$, $\forall platform_i \in 1..40$. However, if we split the time series corresponding to cluster C by country of origin of the sources, it is quite likely that the time series $\Phi_{[0-800),C,country_i}$, $\forall country_i \in \{X, Y, Z\}$, will be highly correlated during the periods in which the botnet present in these countries is active against our platforms. This will lead to the identification of one or several attack events.
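The country-split argument can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: the countries P, Q and R, the traffic volumes, the activity periods, the use of a plain Pearson correlation over the whole window, and the 0.8 threshold. It is not the detection procedure described earlier in this chapter; it only shows that per-country series of the botnet countries move together while series dominated by unrelated traffic do not.

```python
import math
import random
from itertools import combinations

DAYS = 800                                # observation window [0, 800), as in the text
BOTNET_COUNTRIES = ["X", "Y", "Z"]        # countries hosting botnet B
OTHER_COUNTRIES = ["P", "Q", "R"]         # hypothetical countries with heavy unrelated traffic
ACTIVE_DAYS = set(range(200, 210)) | set(range(500, 505))  # assumed botnet activity periods

rng = random.Random(0)                    # fixed seed so the sketch is reproducible


def country_series(country):
    """Synthetic daily count of sources from one country whose traces fall in cluster C."""
    series = []
    for day in range(DAYS):
        if country in BOTNET_COUNTRIES:
            count = rng.randint(0, 5)             # light background noise (assumed)
            if day in ACTIVE_DAYS:
                count += rng.randint(40, 60)      # botnet burst (assumed size)
        else:
            count = rng.randint(0, 400)           # heavy, unrelated activity (assumed)
        series.append(count)
    return series


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(series_by_key, threshold=0.8):
    """Pairs of keys whose series correlate at or above the (assumed) threshold."""
    return [
        (k1, k2)
        for k1, k2 in combinations(series_by_key, 2)
        if pearson(series_by_key[k1], series_by_key[k2]) >= threshold
    ]


by_country = {c: country_series(c) for c in BOTNET_COUNTRIES + OTHER_COUNTRIES}
pairs = correlated_pairs(by_country)
print(pairs)
```

Under these assumptions, only pairs drawn from {X, Y, Z} should pass the threshold, which is the group of correlated series that the split by country is meant to expose.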

Split by platform: Similarly, suppose we have a botnet B′ made of machines located all over the world. Suppose that, from time to time, these machines attack a specific set of platforms {X, Y, Z}, leaving traces that are assigned to a cluster C. Suppose also that this cluster C is a very popular one, that is, many other machines from all over the world continuously leave traces on all our platforms that are assigned to this cluster. As a result, the activities specifically linked to the botnet B′ are lost in the noise of all other machines leaving traces belonging to C.

This is certainly true for the cluster time series (as defined earlier) related to C, and it can also be true for the time series obtained by splitting it by country, $\Phi_{[0-800),C,country_i}$, $\forall country_i \in bigcountries$. However, if we split the time series corresponding to cluster C by platform attacked, it is quite likely that the time series $\Phi_{[0-800),C,platform_i}$, $\forall platform_i \in \{X, Y, Z\}$, will be highly correlated during the periods in which the botnet influences the traces left on the platforms concerned by its attack. This will lead to the identification of one or several attack events.
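The symmetric case can be sketched at the level of individual traces, treating the viewpoint as nothing more than the key used to group the same records. The record layout (`Trace`), the country names, the target platforms 3, 17 and 29, the traffic volumes and the use of a mean pairwise Pearson correlation are all assumptions chosen for illustration, not values or methods taken from the dataset.

```python
import math
import random
from collections import Counter
from itertools import combinations
from typing import NamedTuple

DAYS = 800                                          # observation window [0, 800)
PLATFORMS = list(range(1, 41))                      # platforms 1..40, as in the text
COUNTRIES = [f"country_{i}" for i in range(20)]     # hypothetical country labels
TARGETS = [3, 17, 29]                               # hypothetical platforms attacked by B'
ACTIVE_DAYS = set(range(200, 210)) | set(range(500, 505))  # assumed activity periods


class Trace(NamedTuple):
    day: int
    country: str
    platform: int


rng = random.Random(1)                              # fixed seed for reproducibility


def simulate_traces():
    """Background traces hit any platform; botnet traces hit only TARGETS."""
    traces = []
    for day in range(DAYS):
        for _ in range(400):                        # assumed background volume per day
            traces.append(Trace(day, rng.choice(COUNTRIES), rng.choice(PLATFORMS)))
        if day in ACTIVE_DAYS:
            for _ in range(300):                    # assumed botnet volume per day
                traces.append(Trace(day, rng.choice(COUNTRIES), rng.choice(TARGETS)))
    return traces


def split(traces, viewpoint):
    """Group traces by 'country' or 'platform' and build one daily series per key."""
    counts = Counter((getattr(t, viewpoint), t.day) for t in traces)
    keys = sorted({getattr(t, viewpoint) for t in traces})
    return {k: [counts[(k, day)] for day in range(DAYS)] for k in keys}


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_pairwise_correlation(series_list):
    """Average Pearson correlation over all pairs of series."""
    values = [pearson(a, b) for a, b in combinations(series_list, 2)]
    return sum(values) / len(values)


traces = simulate_traces()
by_platform = split(traces, "platform")
by_country = split(traces, "country")

platform_view = mean_pairwise_correlation([by_platform[p] for p in TARGETS])
country_view = mean_pairwise_correlation(list(by_country.values()))
print(f"targeted platforms: {platform_view:.2f}, countries: {country_view:.2f}")
```

With these assumed volumes, the series of the targeted platforms should be strongly correlated, while the per-country series of the very same traces should show only weak correlation, because each country's share of the botnet is small compared with its background activity.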

5. ON THE IDENTIFICATION OF ATTACK EVENTS

The top plot of Figure 5.22 represents attack event 79. In this case, we see that the traces due to cluster 175309 are highly correlated when we group them by platform attacked. In fact, 9 platforms are involved in this case, accounting for a total of 870 sources. If we group the same set of traces by country of origin of the sources, we end up with the bottom curves of Figure 5.22, where the attack event identified previously can barely be seen. This highlights the existence of a botnet, made of machines located all over the world, that targets a specific subset of the Internet.

Figure 5.22 – The top plot represents attack event 79, related to cluster 17309, on 9 platforms. The bottom plot represents the evolution of this cluster by country. Noise from the attacks on other platforms significantly decreases the correlation of the observed cluster time series when split by country.

5.6 Summary

We have shown that it is possible to automate the identification of micro and macro attack events. To achieve this, we have adopted the following key assumption.

The attacking sources acting under the same root cause have a particular distribution in terms of time and geographical location. This leads us to the observation that attack events take the form of peaks of activity or groups of correlated attack traces. Since detecting macro attack events carries a huge computational cost, we have discussed and implemented three alternative solutions for this purpose. Each solution strikes a balance between computational cost and the ability to identify attack events. All solutions were carefully designed to work with large datasets. We have validated our detection techniques on a real dataset collected from distributed honeypot sensors, and the results conformed to our expectations. Besides that, we have shown the impact of the clustering of the attack traces on our ability to detect micro and macro attack events. Depending on whether we use the origin or the destination of the attacks, we end up identifying different sets of attack events.

Chapter 6 CHARACTERIZATION OF ZOMBIE ARMIES

6.1 Introduction

We have shown in the previous chapter how to detect the attack events present in the attack traces collected by a distributed honeypot infrastructure. We have also discussed the impact of the observation viewpoint on their detection. To justify these efforts, in this chapter we present several lessons that can be learned from the attack event concept. Concretely, in Section 6.2, we classify the attack events into three classes and then analyze them according to several characteristics. As we will show later, botnets are identified as the main cause of the attack events. We show how attack events help in better analyzing botnets by bringing forward some important characteristics, such as the lifetime of botnets, the lifetime of infected machines in botnets, and the kind of attack tools that infected machines possess. This is presented in Section 6.3.

From the document: Honeypot traces forensics by means of attack event identification (pages 107-111)