Micro Attack Event Approach - Honeypot traces forensics by means of attack event identification

To detect the micro and macro attack events, this approach first identifies the micro attack events individually and then, based on these, detects the macro attack events.

5.3.1 Micro Attack Event Detection

A micro attack event, as defined in Chapter 1, is made of a set of attacking machines that have left the same attack fingerprint and were observed over a limited period of time. Our assumption is that attacking sources that are part of the same attack campaign will have a specific distribution in terms of both time and space. In our context, we have represented the attack traces by the observed cluster time series, that is, the evolution of the use of a given attack tool seen in one place. In this sense, if the attacks of one cluster targeting one place suddenly increase and then stop, they should be considered linked to each other. In other words, they form a micro attack event. We classify micro attack events into two classes.

– A set of such micro attack events is detected in the pre-processing step, and they are considered as peaked observed cluster time series. As an example, Figure 5.3 represents an observed cluster time series from the peaked family.

It consists of activities generated by cluster 149315 on day 55 on platform 44. These activities are caused by around two hundred sources. Note that we only keep peaks of at least 50 sources. This is intended as a conservative threshold to eliminate the background noise.

– Micro attack events may also exist as peaks in the strongly varying observed cluster time series. As an example, Figure 5.4 represents the evolution of cluster 14647 on platform 1 over a period of 150 days. As we can notice, there are three peaks of activities on three different days. Our goal is to have those peaks identified as micro attack events by our approach. To detect them, we proceed

44 5. ON THE IDENTIFICATION OF ATTACK EVENTS

Figure 5.3 – Peaked cluster time series 149315 on platform 44 (x-axis: Time (day); y-axis: Number of sources)

as follows. Since outliers are compared locally with the data items in their neighbourhood, we first split a long time series Φ into several sub time series s of smaller size. For each sub time series s, we then verify whether it contains outliers. Since the values of the outliers, if any, are much larger than the others, the standard deviations of s with and without these outliers must differ greatly. More precisely, for each sub time series s, we remove the ten percent of its data points with the largest values to form a filtered time series. We express the difference between the two standard deviations by their ratio. If this ratio exceeds a given threshold η, we conclude that s contains peaks of activity. In this analysis, the length of s is 30. We tested different values of η, and experiments show that η = 3 gives good results. If the above process concludes that s indeed contains outliers, we proceed as follows to detect them. To identify the peaks, we compare the data points of s with a given threshold δ. Any data point greater than δ is considered a peak (this method is well known as the Peak Picking algorithm). The problem now lies in how to choose δ.
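The standard-deviation ratio test described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the function names `split_windows` and `has_peaks` are hypothetical, the sub time series are taken as consecutive non-overlapping chunks, the population standard deviation is used, the ratio is taken as the unfiltered over the filtered standard deviation, and a perfectly flat filtered series is treated as a special case.

```python
from statistics import pstdev


def split_windows(series, length=30):
    """Cut a long time series into consecutive sub time series.

    Assumption: non-overlapping chunks; the last one may be shorter.
    """
    return [series[i:i + length] for i in range(0, len(series), length)]


def has_peaks(window, eta=3.0, trim_fraction=0.10):
    """Return True if the standard-deviation ratio test flags the window."""
    # Number of largest values to remove (ten percent of the window by default).
    k = max(1, round(len(window) * trim_fraction))
    filtered = sorted(window)[:-k]  # window without its k largest values
    if len(filtered) < 2:
        return False  # assumption: too few points to decide anything
    sd_all = pstdev(window)
    sd_filtered = pstdev(filtered)
    if sd_filtered == 0:
        # Assumption: if the filtered series is flat, any remaining spread
        # can only come from the removed values.
        return sd_all > 0
    return sd_all / sd_filtered > eta


# Synthetic data for illustration only (not taken from the honeypot traces).
quiet = [4, 5, 6] * 10      # 30 days of background activity
peaked = list(quiet)
peaked[12] = 200            # one day with a burst of sources
flags = [has_peaks(w) for w in split_windows(quiet + peaked)]
```

Only the sub time series flagged by this test would then be passed on to the peak identification step.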

One possible solution is based on the average of the population. For instance, the author of [87] proposed setting the threshold δ to twice the population average. Experiments show that this works well, especially when the peaks span a short time interval, which happens to be the situation we are in.

Applying the Peak Picking algorithm represented in Alg. 1 to the example in Figure 5.4, we obtain three micro attack events on three distinct time intervals [485,485], [561,561], and [575,575], i.e. exactly what we wanted.
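A minimal Python sketch of peak picking with δ set to twice the average is given below. It rests on several assumptions: the names `peak_picking` and `peak_intervals` are hypothetical, the average is computed over the values passed to the function, and consecutive peak days are merged into a single interval [T_start, T_end]. The data is synthetic and does not come from the traces discussed here.

```python
def peak_picking(series, delta=None):
    """Return a 0/1 indicator list marking the points greater than delta.

    Assumption: when delta is not given, it defaults to twice the mean
    of the series that is passed in.
    """
    if delta is None:
        delta = 2 * sum(series) / len(series)
    return [1 if x > delta else 0 for x in series]


def peak_intervals(indicator, offset=0):
    """Merge runs of consecutive peak days into (start, end) intervals.

    offset is the day number of the first data point.
    """
    intervals = []
    start = None
    for i, flag in enumerate(indicator):
        if flag and start is None:
            start = i                       # a run of peak days begins
        elif not flag and start is not None:
            intervals.append((start + offset, i - 1 + offset))
            start = None                    # the run has ended
    if start is not None:                   # run reaching the end of the series
        intervals.append((start + offset, len(indicator) - 1 + offset))
    return intervals


# Synthetic sub time series starting on day 100, with one burst on day 115.
series = [10] * 30
series[15] = 300
indicator = peak_picking(series)
events = peak_intervals(indicator, offset=100)  # [(115, 115)]
```

A one-day burst therefore yields an interval whose start and end coincide, which matches the form of the intervals reported above.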

5.3.2 Detection of Macro Attack Events

Since the micro attack events detected by the previous method have short life spans (one or two days), we consider that any other micro attack event happening in the same period of time should be seen as being correlated. To build the macro attack events, we just need to group together all the micro attack events happening on the same time interval. As an example, Figure 5.5 represents the evolution of cluster

Figure 5.4 – Three micro attack events of cluster 14647 observed on platform 1 (x-axis: Time (day); y-axis: Number of sources; plot annotation: Sliding Window)

Alg. 1: Peak Picking Function

Input:
    X = (X_1, X_2, ..., X_n): time series
Output:
    I_X: index of peaks
begin
    choose the right δ
    for i = 1 to n
        if X_i > δ
            I_X(i) ← 1
        else
            I_X(i) ← 0
        end
    end
end

0 on platform 1. Applying the Peak Picking technique, we obtain three time intervals, [485,485], [561,561], and [575,575], which are the same as those detected earlier in Figure 5.4. Combining this with the previous example in Figure 5.4, we obtain three macro attack events. They all consist of the two clusters 0 and 14647 on platform 1, but happen on three distinct time intervals: [485,485], [561,561], and [575,575].

We denote a (micro or macro) attack event i as e_i = (T_start, T_end, S_i), where the attack event starts at T_start, ends at T_end, and S_i contains a set of observed cluster time series identifiers (c_i, op_i). If S_i is a singleton set, e_i is a micro attack event. Otherwise, e_i is a macro attack event, and all Φ_{[T_start−T_end], c_i, op_i} are strongly correlated to each other, ∀(c_i, op_i) ∈ S_i.

Applying this to the previous case, we have three macro attack events: e_1 = (485, 485, {(14647,1), (0,1)}), e_2 = (561, 561, {(14647,1), (0,1)}), and e_3 = (575, 575, {(14647,1), (0,1)}).
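The grouping of micro attack events into macro attack events can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in this work: the function names are hypothetical, each micro attack event is represented as a tuple (T_start, T_end, c, op), and two micro attack events are assumed to belong together only when their time intervals are identical, as they are in the example above.

```python
from collections import defaultdict


def build_attack_events(micro_events):
    """Group micro attack events that share the same time interval.

    micro_events: iterable of (t_start, t_end, cluster, platform) tuples.
    Returns a list of events e = (t_start, t_end, S), sorted by interval,
    where S is the set of (cluster, platform) identifiers.
    Assumption: only identical intervals are grouped together.
    """
    grouped = defaultdict(set)
    for t_start, t_end, cluster, platform in micro_events:
        grouped[(t_start, t_end)].add((cluster, platform))
    return [(t_start, t_end, grouped[(t_start, t_end)])
            for t_start, t_end in sorted(grouped)]


def is_macro(event):
    """An event is a macro attack event when S is not a singleton set."""
    return len(event[2]) > 1


# The six micro attack events of the example: clusters 14647 and 0, platform 1.
micro = [
    (485, 485, 14647, 1), (561, 561, 14647, 1), (575, 575, 14647, 1),
    (485, 485, 0, 1), (561, 561, 0, 1), (575, 575, 0, 1),
]
events = build_attack_events(micro)
```

With this input, `events` holds the three tuples e_1, e_2 and e_3 listed above, and `is_macro` returns True for each of them.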


Figure 5.5 – Three micro attack events of cluster 0 observed on platform 1 (x-axis: Time (day); y-axis: Number of sources)

In the document Honeypot traces forensics by means of attack event identification (pages 81-84)