
3.2 SpamTracer

3.2.1 Routing data collection

The SpamTracer approach is based upon the first observation of the “BGP spectrum agility” phenomenon reported by Ramachandran et al. in [148], where spammers hijacked previously unannounced IP address blocks for a short period of time (i.e., less than a day) to launch spam campaigns. The assumption behind this approach is thus that when an IP address block is hijacked for a short period of time to perform other malicious activities from it, a routing change will be observed when the hijacker releases the block to remain stealthy. Since we start monitoring a network when we observe malicious activities from it, we look for a routing change from the hijacked state of the network to its normal state. The goal here is not to build a stand-alone BGP hijack detection system but instead to collect, in real time, routing data associated with malicious networks in order to identify cybercriminals operating from temporarily (i.e., less than one day) hijacked IP address blocks. In the remainder of this section we describe the different parts of our experimental environment in more detail.

The data collection module of SpamTracer (① in Figure 6.1) is based on a linear data flow where a feed of IP addresses to monitor is given as input and a series of enriched traceroutes and BGP routes are produced as output, from which routing anomalies can be uncovered. The feed mainly consists of IP addresses which were used to send spam in the last hour to Symantec.cloud spamtraps [35]. Because the spam feed consists of around 3,500,000 spam emails per day, sampling is performed and around 40,000 IP addresses are tracerouted every day. Bogon prefixes (unallocated or reserved IP blocks) seen originating spam are automatically selected for monitoring as they represent unused IP space that spammers may have hijacked.

Building the AS-level routes makes it possible to look at network routes from the same perspective as BGP, which matters when studying IP prefix hijacking. The IP-to-AS mapping is performed using live BGP feeds from six RouteViews [43] servers which are distributed worldwide. The view of the routing in the Internet can differ from one location to another, so the geographic distribution of BGP as well as traceroute collectors is important. The BGP AS paths from the BGP collectors to the monitored networks are also collected. Finally, further information is collected on the monitored networks and the different IP hops and ASes traversed (e.g., geolocation [15], whois [37], allocation status [38]).

⁴ The system configuration of the Amazon EC2 VPS (instance type m1.xlarge) was inferred from system configuration details provided by Amazon (http://aws.amazon.com/ec2/previous-generation/ and http://aws.amazon.com/ec2/instance-types/) and from real-world performance tests (http://www.pythian.com/blog/virtual-cpus-with-amazon-web-services/).

IP address block selection

The main input to the SpamTracer framework is a set of lists of IP addresses that should be monitored. For each feed, the duration of the monitoring period in days can be set depending on the feed profile, e.g., networks likely to be hijacked in the future would be assigned long monitoring periods. The IP address feeds currently used in SpamTracer, along with their contribution to the daily number of monitored networks, are detailed in Table 3.3.

| Feed | Description | Contribution (%) |
|---|---|---|
| symantec.cloud | Hosts sending spam to Symantec.cloud spamtraps | 80.0 |
| shadowserver | C&C servers (source: Shadowserver [32]) | 3.0 |
| spamhaus drop | Networks allegedly hijacked by cybercriminals (source: Spamhaus [41]) | 8.0 |
| dshield | Malicious hosts (source: DShield [13]) | 3.0 |
| russian business network | Hosts identified as belonging to the RBN cybercriminal organisation (source: emergingthreats.net [14]) | 3.0 |
| malware domain list | Malicious hosts (source: Malware Domain List [127]) | 3.0 |

Table 3.3 – Feeds of IP addresses of networks originating malicious network traffic used as input to SpamTracer. The contribution refers to the proportion of each feed in the ∼40,000 daily monitored networks.

Our primary dataset is a live feed of spam emails collected at spamtraps. Every day we receive about 3,500,000 spam emails from about 24,000 distinct IP address blocks. The spam feed is updated on an hourly basis. The other feeds are updated on an hourly or daily basis, depending on how often the feed providers update them, e.g., Spamhaus publishes a new version of the DROP list every day. Due to the overhead imposed by traceroute measurements and by querying the BGP collectors, our system can currently monitor about 40,000 IP address blocks on a daily basis. A random sample of IP address blocks is extracted from each feed every hour. Prior to extracting new IP address blocks to monitor, we map individual IP addresses in each feed to their IP address blocks currently announced in BGP using archived routing information bases (RIBs) from RouteViews and RIPE RIS. Because we monitor each block for 30 days, ∼1,300 (= 40,000/30) new IP address blocks are added to the system every day. When selecting blocks to monitor we prioritize the recently announced ones, as they are good candidates for short-lived hijacks as suggested in [148]. We consider as recently announced any IP address block in our spam dataset that became routed within the last 24 hours, based on the same archived RIBs. It is noteworthy that only a handful of IP address blocks are identified as recently announced on a daily basis, out of the ∼1,300 newly monitored IP address blocks. This prioritization thus only marginally biases the random sampling.
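The selection arithmetic and the prioritization of recently announced blocks can be sketched as follows. This is a minimal illustration and not the SpamTracer implementation: the function names `daily_quota` and `select_blocks` are hypothetical, candidates are assumed to be unique prefix strings, and the selection is done once per day rather than hourly per feed as in the real system.

```python
import random

DAILY_CAPACITY = 40_000  # blocks monitored per day (figure from the text)
MONITORING_DAYS = 30     # monitoring period per block (figure from the text)


def daily_quota(capacity=DAILY_CAPACITY, days=MONITORING_DAYS):
    """Number of new blocks that can enter monitoring each day."""
    return capacity // days


def select_blocks(candidates, recently_announced, quota, rng=None):
    """Pick up to `quota` blocks to start monitoring.

    Recently announced blocks are taken first (assumption: in list order),
    then the remaining slots are filled with a uniform random sample.
    """
    rng = rng or random.Random()
    priority = [b for b in candidates if b in recently_announced]
    rest = [b for b in candidates if b not in recently_announced]
    selected = priority[:quota]
    remaining = quota - len(selected)
    if remaining > 0:
        selected += rng.sample(rest, min(remaining, len(rest)))
    return selected
```

With the figures above, `daily_quota()` gives 40,000 // 30 = 1,333, in line with the ∼1,300 new blocks per day quoted in the text. Because only a handful of blocks are recently announced each day, almost all slots are filled by the random sample.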

Traceroute and BGP monitoring

IP-level traceroute. A customized version of the classic traceroute function is used, implemented in Python using the packet manipulation library Scapy⁵. For each destination host, 30 probe packets are sent with TTLs incremented from 1 to 30. Probe packets are sent in parallel to speed up the process. The base probe packet type is ICMP, but when no reply is received for a given TTL, a second round is performed using UDP (port 33435) probe packets. For TTLs from which still no reply was received in the second round, TCP (port 80) probe packets are used for a third round. Using different types of probe packets aims at increasing the likelihood that at least one probe reaches the destination host.

Moreover, according to Bush et al. [64], ICMP probes are the most likely to reach their destination host, followed by UDP and TCP probes, in that order. The port numbers for UDP and TCP probe packets were chosen based on the study performed by Luckie et al. in [122], where the authors observed that packets destined to these ports were more likely to reach their destination host. In each round, three probe packets are sent for a given TTL before moving on to the next probe type or giving up on that TTL.
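The round-based fallback described above can be sketched as follows. This is only an illustration of the control flow, not the actual Scapy-based tool: `send_probe` is a hypothetical callable standing in for the packet-sending code, the probes are sent sequentially rather than in parallel, and the sketch does not stop early once the destination has replied.

```python
# Probe types in order of preference, with the ports given in the text.
PROBE_ORDER = [("icmp", None), ("udp", 33435), ("tcp", 80)]
ATTEMPTS_PER_ROUND = 3  # probes per TTL and per round (from the text)
MAX_TTL = 30            # TTLs 1..30 are probed (from the text)


def trace(dst, send_probe, max_ttl=MAX_TTL):
    """Return {ttl: (responder_ip, protocol)} with None for unresponsive TTLs.

    `send_probe(dst, ttl, proto, port)` is a hypothetical function that sends
    one probe and returns the responder's IP address, or None on timeout.
    """
    hops = {ttl: None for ttl in range(1, max_ttl + 1)}
    for proto, port in PROBE_ORDER:
        # Each round only revisits the TTLs that are still unanswered.
        pending = [ttl for ttl, hop in hops.items() if hop is None]
        for ttl in pending:
            for _ in range(ATTEMPTS_PER_ROUND):
                reply = send_probe(dst, ttl, proto, port)
                if reply is not None:
                    hops[ttl] = (reply, proto)
                    break
    return hops
```

TTLs that remain `None` after the TCP round correspond to unresponsive hops in the recorded path.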

Paths uncovered using traceroute may have holes (i.e., unresponsive hosts along the path) where no ICMP reply packet was received for some TTLs. Also, when no reply is received from a destination host, several IP addresses in the destination IP prefix are “pinged” to find a reachable host in the same network. This technique makes it possible to record a traceroute path that is more complete than the previous one and that still reaches the IP prefix and AS of the original destination host.

AS-level traceroute (IP-to-AS mapping). Due to the many artefacts that can be found in IP-level routes uncovered using traceroute, studying anomalies in the Internet routing infrastructure using only such routes is a complicated task. Looking at the AS level (i) makes it possible to look at network routes from the same perspective as BGP, which matters when studying IP prefix hijacking, and (ii) hides some artefacts of IP-level routes, e.g., load balancing inside ASes, by looking at the network from a higher-level view.

The IP-to-AS mapping is performed using live BGP data queried from six RouteViews [43] route servers distributed worldwide (on five continents). Moreover, these six route servers aggregate the majority of all RouteViews peering ASes. The route servers’ BGP routes to the monitored network blocks are queried from their BGP route table (RIB) via telnet. Because traceroute is a live measurement, and to enable the AS-level path to be as accurate as possible, it is important that each IP host is mapped to the AS announcing its IP prefix at that moment. Also, the view of the routing in the Internet can differ from one location to another, so the geographic distribution of BGP collectors is important. Each IP-level hop is mapped to its IP prefix and the AS originating this prefix, as seen by the different BGP collectors.
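The mapping of IP-level hops to an AS-level path can be illustrated with a longest-prefix match. The sketch below is an assumption-laden simplification: the RIB is represented as a plain dictionary from prefix to origin ASN for a single collector (the real system queries live route servers and keeps one view per collector), and the names `origin_as` and `as_level_path` are hypothetical.

```python
import ipaddress


def origin_as(ip, rib):
    """Longest-prefix match of `ip` against `rib` ({prefix: origin ASN}).

    Returns the origin ASN of the most specific covering prefix, or None
    if no prefix in this view covers the address.
    """
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, asn)
    return best[1] if best else None


def as_level_path(ip_hops, rib):
    """Map hops to origin ASes and collapse consecutive duplicates.

    Unresponsive hops (None) and hops without a covering prefix are skipped,
    which is a simplifying assumption of this sketch.
    """
    path = []
    for ip in ip_hops:
        if ip is None:
            continue
        asn = origin_as(ip, rib)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path
```

Running `as_level_path` once per collector view would yield one AS-level path per vantage point, which is where differences between locations become visible.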

The BGP AS path as well as other BGP-related information about the monitored network is also collected.

⁵ http://www.secdev.org/projects/scapy/

IP/AS-level traceroute enrichment. Further registration information extracted from IRRs [20] and geolocation information obtained from Maxmind [15] are collected for the monitored network and the different IP- and AS-level hops traversed:

IP-level hops information: Information collected about the IP-level hops traversed by traceroute paths includes the domain name and the IP-level geolocation.

AS-level hops information: Information collected about the ASes traversed by traceroute paths includes the ASN, the AS-level geolocation, the AS allocation date and registry (RIR), and the AS owner.

Monitored IP address block information: Information collected about the monitored network includes the IP address block allocation date and registry (RIR), the IP address block owner, and the presence of the IP address block in the Team Cymru Bogon list (reserved or unallocated IP blocks) [38] and the Spamhaus DROP list [41].
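As an illustration of the last item, the record kept for a monitored IP address block could look like the sketch below. The class `MonitoredBlockInfo`, the helper names, and the `whois` dictionary keys are all hypothetical, and the sketch assumes that a block counts as listed when it overlaps any prefix of the list; the actual lists would have to be loaded from Team Cymru and Spamhaus.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoredBlockInfo:
    """Enrichment data for one monitored IP address block (hypothetical layout)."""
    block: str
    allocation_date: Optional[str]
    registry: Optional[str]  # RIR that allocated the block
    owner: Optional[str]
    in_bogon_list: bool      # Team Cymru Bogon list membership
    in_drop_list: bool       # Spamhaus DROP list membership


def is_listed(block, prefix_list):
    """True if `block` overlaps any prefix in `prefix_list` (assumed rule)."""
    net = ipaddress.ip_network(block)
    for prefix in prefix_list:
        other = ipaddress.ip_network(prefix)
        if net.version == other.version and net.overlaps(other):
            return True
    return False


def describe_block(block, whois, bogons, drop):
    """Assemble the record from a whois-like dict and the two prefix lists."""
    return MonitoredBlockInfo(
        block=block,
        allocation_date=whois.get("allocation_date"),
        registry=whois.get("registry"),
        owner=whois.get("owner"),
        in_bogon_list=is_listed(block, bogons),
        in_drop_list=is_listed(block, drop),
    )
```

Fields that the registration data does not provide are left as `None`, so a missing owner or allocation date remains distinguishable from an empty value.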