
Proactive Discovery of Phishing Related Domain Names


Academic year: 2021


samuel.marchal@uni.lu 13/09/12

Proactive Discovery of Phishing Related Domain Names


Motivation Phishing domains modelling Experiments and Results Conclusion


Outline

1 Motivation
2 Phishing domains modelling
3 Experiments and Results
4 Conclusion

1 Motivation

Early blacklisting solution

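[Figure: proactive DNS blacklisting. Phishers register and control domains (e.g. ebay-securelogin.com, paypalprotect.com, hsbcbankinglogon.com, ...) advertised through spoofed e-mail; victims visit them and land on fake websites or download malware. A domain names generator anticipates such names, which go through checking and raise an ALERT for early blacklisting.]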

State of the Art:


2 Phishing domains modelling

Natural language model

Key idea: generate domain names similar to those registered by phishers, relying on natural language.

• Focus on the main domain + TLD
  ex: www.login.myphishingdomain.com/index.php → myphishingdomain.com
• Extract features from blacklisted phishing domain names
• Deduce a generation model for domain names
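The first step, which keeps only the main domain and its TLD, can be sketched in Python as follows. This is a minimal sketch and not the authors' code. The helper main_domain_and_tld and the tiny MULTI_LABEL_SUFFIXES set are hypothetical. A real system would rely on the full Public Suffix List to recognise suffixes such as co.uk.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny set of multi-label public suffixes.
# A real system would load the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def main_domain_and_tld(url: str) -> tuple[str, str]:
    """Return (main domain, TLD) for a URL or bare host name."""
    # urlsplit only fills in the host part when "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    labels = host.split(".")
    # Prefer a known two-label suffix (e.g. co.uk), else use the last label.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return labels[-3], ".".join(labels[-2:])
    if len(labels) < 2:
        raise ValueError(f"no registrable domain in {url!r}")
    return labels[-2], labels[-1]
```

For the slide's example, main_domain_and_tld("www.login.myphishingdomain.com/index.php") gives ("myphishingdomain", "com"). The subdomains and the path are dropped.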


Features extraction

Example: securelogin34ebaymy-securephishing-domain.co.uk
• TLD splitting: securelogin34ebaymy-securephishing-domain | co.uk
• "-" splitting: securelogin34ebaymy | securephishing | domain
• number extraction: securelogin | 34 | ebaymy
• word segmentation: secure login 34 ebay my secure phishing domain

Extracted distributions:
• dist_len = {(8, 1)}: length of the names in words (here one name of 8 words)
• dist_word = {(secure, 0.25), (login, 0.125), (34, 0.125), (ebay, 0.125), ...}: word frequencies
• dist_firstword = {(secure, 1)}: frequencies of the first word
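A short Python sketch can reproduce the decomposition above, assuming the TLD has already been split off. The word segmentation here is a hypothetical dynamic-programming splitter over a toy vocabulary (VOCAB). The original pipeline uses a dedicated word splitter, and the slides do not give its dictionary.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary standing in for a real word splitter's dictionary.
VOCAB = {"secure", "login", "ebay", "my", "phishing", "domain"}


def segment(token: str) -> list[str]:
    """Split a letter-only token into the fewest vocabulary words."""
    n = len(token)
    best = [None] * (n + 1)  # best[i] = shortest segmentation of token[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in VOCAB:
                cand = best[j] + [token[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    # Keep the token whole when it cannot be segmented.
    return best[n] if best[n] is not None else [token]


def extract_words(name: str) -> list[str]:
    """Main domain (TLD already removed) -> ordered list of words and numbers."""
    words = []
    for part in name.lower().split("-"):                   # "-" splitting
        for chunk in re.findall(r"\d+|[a-z]+", part):     # number extraction
            words.extend([chunk] if chunk.isdigit() else segment(chunk))
    return words


def distributions(domains: list[list[str]]):
    """Relative frequencies: dist_len, dist_word and dist_firstword."""
    def norm(counter):
        total = sum(counter.values())
        return {key: value / total for key, value in counter.items()}

    lengths = Counter(len(words) for words in domains)
    words = Counter(word for ws in domains for word in ws)
    firsts = Counter(ws[0] for ws in domains if ws)
    return norm(lengths), norm(words), norm(firsts)
```

With the slide's example, extract_words("securelogin34ebaymy-securephishing-domain") yields the eight words listed above. distributions on that single name returns the three distributions shown.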


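Model generation

Markov chain built from the extracted words. Each distinct word is a state, and each transition is weighted by the relative frequency with which one word follows another:
• secure → login (0.5), login → 34 (1), 34 → ebay (1), ebay → my (1), my → secure (1), secure → phishing (0.5), phishing → domain (1)
• dist_len = {(8, 1)}
• dist_word = {(secure, 0.25), (login, 0.125), (34, 0.125), (ebay, 0.125), ...}
• dist_firstword = {(secure, 1)}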
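The sketch below builds a first-order Markov chain with these transition probabilities. It also includes a generator that samples a length from dist_len and a first word from dist_firstword, then walks the chain. The names build_chain and generate are hypothetical. The walk stops early when it reaches a word with no observed successor, which is one possible choice among others.

```python
import random
from collections import Counter, defaultdict


def build_chain(domains: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from observed word successions."""
    counts = defaultdict(Counter)
    for words in domains:
        for cur, nxt in zip(words, words[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(succ.values()) for nxt, c in succ.items()}
        for cur, succ in counts.items()
    }


def generate(chain, dist_len, dist_firstword, tld="com", rng=random):
    """Sample a target length, a first word, then follow the chain."""
    length = rng.choices(list(dist_len), weights=list(dist_len.values()))[0]
    word = rng.choices(list(dist_firstword), weights=list(dist_firstword.values()))[0]
    words = [word]
    # Stop at the target length or at a word without observed successors.
    while len(words) < length and word in chain:
        succ = chain[word]
        word = rng.choices(list(succ), weights=list(succ.values()))[0]
        words.append(word)
    return "".join(words) + "." + tld, words
```

On the slide's example, build_chain returns secure → {login: 0.5, phishing: 0.5} and 1.0 for every other observed transition. Passing random.Random(seed) as rng makes generation reproducible.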


Smoothing: observed transition probabilities are scaled by 0.95 (0.5 → 0.475, 1 → 0.95). The remaining probability mass is spread over unobserved transitions (values such as 0.008 and 0.016 in the figure), so the chain can also produce word sequences never seen in the blacklist.
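The figure shows the 0.95 scaling (0.5 → 0.475, 1 → 0.95). The slides do not state how the remaining 0.05 is spread. The sketch below assumes it is shared among the unobserved successors of a state, excluding the state itself, in proportion to dist_word. Under that assumption the "login" state gets about 0.0167 towards "secure" and about 0.0083 towards each other unobserved word. These values have the same magnitude as the 0.016 and 0.008 shown, but treat the rule and the smooth helper as hypothetical.

```python
def smooth(chain, dist_word, keep=0.95):
    """Hypothetical smoothing of a Markov chain.

    Observed transitions are scaled by `keep`; the remaining (1 - keep) is
    shared among unobserved successors (other than the current word) in
    proportion to their frequency in dist_word. This spreading rule is an
    assumption, not taken from the slides.
    """
    smoothed = {}
    for cur, succ in chain.items():
        unseen = {w: p for w, p in dist_word.items() if w != cur and w not in succ}
        mass = sum(unseen.values())
        if mass == 0:
            # Nothing left to spread onto: keep the original transitions.
            smoothed[cur] = dict(succ)
            continue
        row = {nxt: keep * p for nxt, p in succ.items()}
        for w, p in unseen.items():
            row[w] = (1 - keep) * p / mass
        smoothed[cur] = row
    return smoothed
```

Each smoothed row still sums to 1. States without any observed successor (such as "domain") are left without transitions in this sketch.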


Semantic extension

DISCO:
• calculates a similarity score (semantic relatedness) between two words
• gives the n words most related to a word w
• relies on word spaces built from corpora (Wikipedia, BNC, PubMed, etc.)
• applied to each state of the Markov chain ⇒ expands the discovery
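The sketch below illustrates the idea with cosine similarity over toy co-occurrence vectors. It does not use DISCO or reproduce its API: TOY_VECTORS, most_related and expand_name are hypothetical. In the sketch, each word (state) of a generated name is replaced, one at a time, by its n most related words. The slides do not specify the exact mechanism, so this is only one possible reading.

```python
import math

# Hypothetical toy vectors standing in for a DISCO word space.
TOY_VECTORS = {
    "login": [5.0, 1.0, 0.0],
    "signin": [4.0, 1.0, 0.0],
    "secure": [0.0, 5.0, 1.0],
    "safe": [0.0, 4.0, 2.0],
    "banking": [1.0, 0.0, 5.0],
}


def cosine(u, v):
    """Cosine similarity, used here as a stand-in relatedness score."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_related(word, n, vectors=TOY_VECTORS):
    """Return the n words most related to `word`, best first."""
    if word not in vectors:
        return []
    scored = [(cosine(vectors[word], v), w) for w, v in vectors.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:n]]


def expand_name(words, n=1, vectors=TOY_VECTORS):
    """New candidates: replace one word at a time by a related word."""
    variants = []
    for i, word in enumerate(words):
        for rel in most_related(word, n, vectors):
            variants.append("".join(words[:i] + [rel] + words[i + 1:]))
    return variants
```

With these toy vectors, expand_name(["secure", "login"]) proposes "safelogin" and "securesignin". Neither name could come from the chain alone.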


Generator global overview

• extract features from known phishing domains
• generate domain names ⇒ potentially phishing
• domain names automatically checked further ⇒ Blacklist

[Figure: generator architecture. Malicious domains (blacklists, honeypots, malware analysis...), e.g. macromediasetup.com/dl.exe, go through name decomposition (macromediasetup, com) and the Word Splitter (|macro|media|set|up|, |com|). Feature extraction produces name statistics and a TLD list (com, lu, fr, de, org...). The model combines Markov chains and DISCO. A domain checker filters the generated names into a potential malicious domain list that feeds the blacklist.]
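The following is a minimal sketch of the "Domain checker" box, assuming the check starts with a DNS lookup. The wildcard heuristic is our assumption: a name is flagged when a random label under the same TLD resolves to the same address. The helper names check_domains and dns_resolve and the fixed probe_label are hypothetical. Only the last label is treated as the TLD, and domains for sale are not detected here.

```python
import socket
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def dns_resolve(name: str) -> Optional[str]:
    """Return an IPv4 address for `name`, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_domains(candidates, resolve: Resolver = dns_resolve,
                  probe_label: str = "qz7xk2nv9w"):
    """Sort generated names into unresolved, wildcard and remaining domains.

    Hypothetical wildcard heuristic: if a random label under the same TLD
    resolves to the same address, the answer is attributed to a wildcard.
    """
    result = {"unresolved": [], "wildcard": [], "remaining": []}
    for name in candidates:
        ip = resolve(name)
        if ip is None:
            result["unresolved"].append(name)
            continue
        tld = name.rsplit(".", 1)[-1]  # simplification: last label only
        if resolve(f"{probe_label}.{tld}") == ip:
            result["wildcard"].append(name)
        else:
            result["remaining"].append(name)
    return result
```

You can inject the resolver for offline testing, for example with a dict's get method mapping names to addresses. Only the remaining domains would then go on to further checks.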

3 Experiments and Results

Offline testing

Phishing domain set from blacklists (∼50,000 domains):
• Malware Domain List (01/2009 → 03/2012)
• DNS-Black-Hole (01/2009 → 03/2012)
• PhishTank (07/2007 → 03/2012)

5 tests, each generating 1 million domain names:
• learning set: 30% (15,000 domains)
• testing set: 70% (35,000 domains)

[Plot: number of malicious domain names found vs. number of generated domain names (up to 1,000,000)]
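The 30% / 70% protocol can be sketched as a random split of the blacklist, followed by a count of how many testing-set domains appear among the generated names. The names split_random and hits_curve, the seed and the sampling step are hypothetical. With 50,000 domains, the split gives 15,000 learning and 35,000 testing domains, as on the slide.

```python
import random


def split_random(domains, learn_ratio=0.3, seed=0):
    """Shuffle the blacklist and split it into learning / testing sets."""
    shuffled = list(domains)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * learn_ratio)
    return shuffled[:cut], shuffled[cut:]


def hits_curve(generated, testing, step):
    """(number generated, distinct testing domains found) every `step` names."""
    targets, found, curve = set(testing), set(), []
    for i, name in enumerate(generated, start=1):
        if name in targets:
            found.add(name)
        if i % step == 0:
            curve.append((i, len(found)))
    return curve
```

Repeating the split with 5 different seeds and plotting each hits_curve over 1 million generated names would give curves of the kind shown in the plot.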


Predictability
• learning set: the 10% oldest domains
• testing set: the remaining 90%

[Plot: number of malicious domain names found vs. time (in months) after the generation, from m+2 to m+34]

Strategy
• learning set: 30%
• testing set: 70%

[Plot: number of malicious domain names found vs. number of generated domain names (up to 1,000,000)]
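For the predictability experiment, the split is chronological rather than random. The sketch below assumes each blacklisted domain comes with its blacklisting date. The names split_chronological and hits_per_month are hypothetical. Months are counted from the generation date, in line with the m+2 ... m+34 axis.

```python
from datetime import date


def split_chronological(entries, learn_ratio=0.1):
    """entries: (domain, blacklisting date); the oldest share is for learning."""
    ordered = sorted(entries, key=lambda e: e[1])
    cut = round(len(ordered) * learn_ratio)
    return ordered[:cut], ordered[cut:]


def hits_per_month(generated, testing, start):
    """Count generated names blacklisted in each month after `start`."""
    gen = set(generated)
    counts = {}
    for domain, listed in testing:
        if domain in gen and listed >= start:
            month = (listed.year - start.year) * 12 + (listed.month - start.month)
            counts[month] = counts.get(month, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    testing = [("a.com", date(2012, 5, 3)), ("b.com", date(2012, 7, 1))]
    print(hits_per_month(["a.com", "b.com"], testing, date(2012, 3, 1)))  # {2: 1, 4: 1}
```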


Online testing

DNS requests for the 1 million generated domains: ∼100,000 domains resolve to an IP address:
• ∼80,000 wildcard domains
• ∼5,000 domains for sale
• ∼15,000 remaining domains, among which:
  • ∼500 actually malicious and blacklisted
  • ∼200 legitimate domains

Discriminating phishing from legitimate generated domains: MCscore


⇒ Eliminates 93% of legitimate domains...
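The slides do not define MCscore. One plausible reading, consistent with the conclusion's "domain scoring based on Markov chain model", is the mean log-probability of a name's word transitions under the chain. The sketch below implements that hypothetical definition. The floor for unseen transitions is an arbitrary choice, and the threshold is left to the user. mc_score and keep_suspicious are hypothetical names.

```python
import math


def mc_score(words, chain, floor=1e-4):
    """Hypothetical MCscore: mean log-probability of the word transitions.

    Higher scores mean the name looks more like the phishing model.
    Unseen transitions get the probability `floor`; single-word names
    have no transition and get -inf.
    """
    if len(words) < 2:
        return float("-inf")
    logs = [math.log(chain.get(a, {}).get(b, floor)) for a, b in zip(words, words[1:])]
    return sum(logs) / len(logs)


def keep_suspicious(candidates, chain, threshold):
    """Keep segmented candidates whose score reaches the threshold."""
    return [words for words in candidates if mc_score(words, chain) >= threshold]
```

The threshold would need tuning on labelled data. The 93% figure on the slide comes from the authors' own scoring, not from this sketch.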


4 Conclusion

Generation of domain names likely to be malicious:
• features extracted from existing domain names
• Markov chain model
• semantic relatedness techniques
⇒ Proactively build a phishing blacklist

Results:
• able to generate phishing domains...
• ... still with false positives
⇒ Domain scoring based on Markov chain model

Future works:
