• Aucun résultat trouvé

3.2 SpamTracer

3.2.4 Root cause analysis

While the external cross-validation of candidate hijacks described here above should increase our confidence in the existence of BGP spectrum agility spammers in the real world, we wanted to confirm our results by further investigating the root causes of the validated hijacks from a spam campaign perspective. Assuming we could identify good candidate hijacks that are perfectly matching the anomalous routing behavior of BGP spectrum agility spammers, one would expect that spam campaigns launched from these hijacked networks, by the same group of agile spammers, should intuitively share also a number of commonalities with respect to spam features (e.g.,advertised URI’s, sender’s address, etc).

We have thus used a multi-criteria clustering framework called triage [168] to identify series of spam emails sent from different hijacked IP address blocks that seem to be part of a campaign orchestrated by the same agile spammers. triage is a software framework for security data mining that relies on intelligent data fu-sion algorithms to reliably group events or entities likely linked to the same root cause. Thanks to a multi-criteria clustering approach, it can identify complex

pat-Σ

Figure 3.14 – Clustering spam emails sent from hijacked networks usingtriage. terns and varying relationships among groups of events within a dataset. triageis best described as a security tool designed for intelligence extraction and attack inves-tigation, helping analysts to determine the patterns and behaviors of the intruders and typically used to highlight how they operate. This novel clustering approach has demonstrated its utility in the context of other security investigations, e.g., rogue AV campaigns [68], spam botnets [170] and targeted attacks [169].

Figure 6.2 illustrates the triage workflow, as applied to our spam dataset. In step ¬, a number of email characteristics (or features) are selected and defined as decision criteria for linking related spam emails. Such characteristics include the sender IP address, the email subject, the sending date, the advertised URL’s and associated domains and whois registration information. In step ­, triage builds relationships among all email samples with respect to selected features using appro-priate similarity metrics. For text-based features (e.g., subject, email addresses), we used string-oriented similarity measures commonly-used in information retrieval, such as the Levenshtein similarity andN-gram similarity [114]. However, other simi-larity metrics may be defined to match the feature type and be consistent to analyst expectations (e.g., Jaccard to measure similarity between sets, or a custom IP ad-dresses similarity metric that is based on their relative inter-distance in the binary space).

At step®, the individual feature similarities are fused using an aggregation model reflecting a high-level behavior defined by the analyst, who can impose, e.g., that some portion of highly similar email features (out ofnavailable) must be satisfied to assign different samples to the same campaign (regardless of which ones). Similarly to the WOWA aggregation method explained here above, in triage we can assign different weights to individual features, so as to give higher or lower importance to certain features. For this analysis we gave more importance to the source IP addresses,domain names associated to spam URL’s and whois registration names, since we anticipate that a combination of these features convey a sense and possible evidence of colluding spam activities.

As outcome (step¯),triageidentifiesmulti-dimensional clusters(called MDCs), which in this analysis are clusters of spam emails in which any pair of emails is linked by a number of common traits, yet not necessarily always the same. As explained

3.3. Conclusion 65 in [168], a decision threshold can be chosen such that undesired linkage between at-tacks are eliminated,i.e.,to drop any irrelevant connection that could result from a combination of small values or an insufficient number of correlated features.

3.3 Conclusion

In the first part of this chapter we discussed the different types of data,i.e., control plane and data plane routing data, and logs of malicious activities, that can be leveraged to achieve the first goal of this thesis, which is to determine whether malicious BGP hijacks are still occurring today on the Internet. We showed that using a single dataset is not enough to overcome the intrinsic limitations every dataset has, and is likely to produce a lot of false-positive cases. We thus decided to correlate routing data from the control plane and the data plane with malicious activities logs.

In the second part of this chapter, we presentedSpamTracer, a data collection and analysis framework for the large-scale study of malicious BGP hijacks. Based upon the characteristics of the first BGP hijacking spammers involved in “BGP spec-trum agility” uncovered by Ramachandran et al. [148],SpamTracer leverages live BGP data and traceroute measurements to characterise the routing-level behavior of networks originating malicious network traffic, in particular spam emails. Due to the large dataset collected by SpamTracer, we needed a system to automatically extract from our data the networks that exhibit an anomalous routing behavior. We thus designed a multi-stage scoring and data filtering scheme using a multi-criteria decision analysis (MCDA)-based method to aggregate routing anomalies scores ex-tracted from the collected data into a global suspiciousness score. The key objective of this analytical step is to reduce the number of suspicious cases to a set of candidate hijacks that can then be manually validated. This extra validation step using exter-nal data sources is required by the lack of ground-truth data to confirm or not the maliciousness of the uncovered cases. Finally, spam emails issued by networks asso-ciated to validated BGP hijacks are clustered using an external tool calledtriagein order to reveal the modus operandi of the attackers. In the next chapter we present the results of our large scale study of malicious BGP hijacks over a period of 22 months.

4

The malicious BGP hijacks phenomenon

Contents

4.1 Routing data collection results . . . . 68 4.2 Multi-stage scoring and data filtering results . . . . 69 4.3 Validation of candidate hijack results . . . . 71 4.3.1 Long-lived hijacks . . . 84 4.3.2 Short-lived hijacks . . . 86 4.4 Root cause analysis results. . . . 94 4.5 Summary of findings . . . . 96 4.6 Effectiveness of current countermeasures . . . . 98 4.6.1 BGP hijack detection . . . 98 4.6.2 BGP hijack prevention. . . 99 4.7 Operationalizing SpamTracer . . . 100 4.7.1 BGP hijack attacks characteristics . . . 101 4.7.2 Real-time blacklist of hijacked networks . . . 102 4.7.3 Real-world deployment. . . 104 4.8 Conclusion . . . 106 In this chapter1 2, we tackle the second and third research problems of this thesis and provide an answer to the two associated questions: (i) Do cybercriminals use BGP hijacking as a means to steal blocks of IP addresses and use them to perform other nefarious activities? In other words, we want to determine whether the “BGP spectrum agility” phenomenon introduced in 2006 is still a problem worth of consid-eration, as of 2014. (ii) If yes, what is the modus operandi used by these attackers to carry out such hijacks and what characterises this phenomenon? In particular,

1The experimental results described in this Chapter will be presented at the Network and Dis-tributed System Security (NDSS) Symposium in February 2015 [179].

2Preliminary results have been presented at RIPE [175] and NANOG [176] meetings.

we seek to determine how attackers abuse the inter-domain routing infrastructure to hijack blocks of IP addresses so as to identify the appropriate countermeasures to pre-vent them. We further aim at assessing the frequency and prevalence of such attacks to determine the magnitude of the security threat posed by such a phenomenon.

In Chapter3, we have introducedSpamTracer, a framework for collecting rout-ing data related to networks havrout-ing been seen originatrout-ing malicious network traffic and extracting networks exhibiting an abnormal routing behavior likely resulting from a BGP hijack. We now turn to the description of our results, by detailing step-by-step the outcome of every component of our experimental environmental (see Figure 6.1in Section 3.2of Chapter 3). We continue with a thorough investigation and validation of the candidate malicious BGP hijacks we have identified. Finally, we explore the usage of uncovered hijack characteristics to build a real-time blacklist of hijacked IP address blocks to support the detection and mitigation of malicious BGP hijack attacks.