Pre-processing - Honeypot traces forensics by means of attack event identification

Before going to the detailed discussion about the pre-processing technique, we introduce a very important concept, observed cluster time series, in the follo-wing.

Definition 8 An Observed Cluster Time Series ΦT,c,op is a function deﬁned over a period of time T, T being deﬁned as a time interval (in days). That function

40 5. ON THE IDENTIFICATION OF ATTACK EVENTS returns the amount of sources per day associated to the cluster c¹ as seen from the given observation viewpoint op. The observation viewpoint opcan either be a specific platform or a specific country of origin. In the first case, ΦT,c,platf ormX returns, per day, the amount of sources belonging to clustercthat have hitplatf ormX. Similarly, in the second case, ΦT,c,countryX returns, per day, the amount of sources belonging to cluster c that are geographically located in countryX. Clearly, we always have : P∀i∈countries

ΦT,c,i =P∀x∈platf orms

ΦT,c,x

As indicated later, the computational overhead problem comes from the fact that we have too many observed cluster time series. In fact, to learn whether an observed cluster time series is correlated with another observed cluster time series, we must compare that observed cluster time series to all the others. It is evident that the more observed cluster time series we have, the more expensive the computational effort is. As an indication, we have around 400,000 observed cluster time series as of now. Such a high number of observed cluster time series is the main reason for computational overhead. To deal with this, we start with the following observation.

Let us assume that there is an easy way to put the observed cluster time series into different classes, we can distinguish two kinds of correlation as follows :

– within-class correlation :it is to compute the correlation of observed cluster time series belonging to one class.

– cross-class correlation :it is to compute the correlation of observed cluster time series belonging to different classes.

If, by design, the classes we have built are such that there should be no signifi-cant correlation between two observed cluster time series belonging to two different classes, then we will dramatically decrease the computational effort by limiting our-selves to the evaluation of the “within class correlation”. In the following, we discuss how to build these classes.

5.2.1 Assumption

Our assumption is that the observed cluster time series could be classified into different categories based on their level of stability. The observed cluster time series of certain attack tools may vary more than the others. There may be many reasons for that to happen. For instance, several attack processes may exploit the attack tools differently. Or it may be due to the varying of the amount of infected machines (new machines are infected, others get patched). With that in mind, we make the assumption that the observed cluster time series can be classified into the following three categories :

1. Peaked family: observed cluster time series in this family exhibit a significant peak of values during a very small period of one or two days and almost no activity otherwise. The fact that several sources send packets simultaneously to a single platform in a short period of time shows the highly coordinated nature of the attacking machines.

1. The notation of cluster, as used here, is the one defined in Chapter 3, Section 3.3.2.2

2. Stable family : observed cluster time series in this family have a roughly constant behavior during their whole lifetime.

3. Strongly varying family : observed cluster time series in this family are characterized by wide amplitude variations over, possibly very, long periods of time.

There are many advantages with the above classification. Firstly, we do not need to compute the cross-class correlation since the stable observed cluster time series can not be correlated neither with peaked observed cluster time series nor with strongly varying observed cluster time series. Similarly, the observed cluster time series in the strongly varying family are not likely to be correlated with peaked ones. Secondly, since computing the correlation of stable observed cluster time series does not bring much sense, we do not need to compute thewithin-class correlation for the observed cluster time series in this class.

5.2.2 Validation

To validate our above assumptions, for each observed cluster time series, we first identify its active period or lifetime. The lifetime of an observed cluster time series is defined as the period from the first day to the last day during which that observed cluster time series is observed. If the lifetime of one observed cluster time series is only one day, then, we declare that the observed cluster time series belongs to the peaked family. Otherwise, we compute the standard deviation of the observed cluster time series over its lifetime. If it is smaller than a threshold δ, then we flag the observed cluster time series as belonging to the stable family. Otherwise, we filter out the outlier data point from the time series to form the filtered time series.

Outlier data point is defined as the maximum value of the time series. Then we compute the standard deviation of filtered time series. If the standard deviation is now smaller than δ, we declare the time series as being a peaked time series². Otherwise, we declare the observed cluster time series as belonging to the strongly varying family.

Figure 5.1 illustrates the algorithm for an observed cluster time series that spans over 20 days. The standard deviation of the time series is 6.51. Since it is greater than 1, our algorithm does not declare this time series as a stable one. We next filter the extreme values from this time series. This boils down to cutting the peak on day 12. The resulting time series is obviously smoother than the initial one and its standard deviation is 0.46, which is smaller than the threshold 1. Hence, our algorithm eventually flags the time series of Figure 5.1 as belonging to the peaked family.

5.2.2.1 Threshold

We test the approach with a subset 20,756 observed cluster time series extracted from a bigger dataset as presented in Section 5.5. As mentioned earlier, our intention

2. It is different from the first case in the sense that its lifetime is greater than 1, and it has the stable behavior during its whole lifetime besides on a single point

42 5. ON THE IDENTIFICATION OF ATTACK EVENTS

0 5 10 15 20

0 5 10 15 20 25 30

Time(day)

Number of sources

Figure 5.1 – Example of the peaked family time series

is to classify them into three different categories. The problem now is to choose a good threshold δ to separate them. Suppose that there exists an ideal threshold δ^′ that can classify correctly observed cluster time series into three different classes.

For δ ≤δ^′, we may misclassify a stable time series as a peaked or strongly varying time series. In this case, we will spend more computational effort than we could have with δ^′, but we will never miss any correlation (i.e., we will never conclude that two varying time series are not similar whereas they are). For δ≥δ^′, we could consider the strongly time series or peaked time series as stable. In this case, there are less computational effort but we may not detect all the phenomena existing in the dataset. For the sake of completeness, we tend to choose δ ≤ δ^′. Of course we can not visually inspect all the time series to choose the good thresholdδ. We adopt the following heuristic approach, we first choose a random value of δ =v, then we extract the set of time series identified as strongly varying or peaked time series for δ =v, but not when δ = v+ǫ. If the visualization tells us that these observed cluster time series are all stable time series, we know that δ ≤ δ^′, we increase δ by ǫ. Otherwise, we decrease δ by ǫ. We stop the process when δ is stable.

We have chosen ǫ = 0.2, and by applying the heuristic described as above, we end up withδ = 1. It is important to notice that we have tested with smaller value ofǫand it has been shown that the smaller granularity ofǫhas very small impact on the final result. Figure 5.2a shows the evolution of three classes (For the visibility, Figure 5.2b represents the distribution of only peaked and strongly varying time series). The figure tells us that most of time series fall within the stable time series class. Concretely for δ = 1, we have 19564 time series in stable family, 375 time series in the peaked family and 835 time series in the strongly varying family. This result shows the usefulness of the pre-processing step, since most of the cluster time series fall within the stable family for which we do not need to compute neither the cross-classes correlation nor the within-class correlation.

0 1 2 3

Figure 5.2 – Plot a represents the evolution of size of three classes over thre-shold(Plot b represents a two of them for the visibility)

Dans le document Honeypot traces forensics by means of attack event identification (Page 77-81)