• Aucun résultat trouvé

Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours. Alexandre Dulaunoy January 17, 2014

N/A
N/A
Protected

Academic year: 2022

Partager "Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours. Alexandre Dulaunoy January 17, 2014"

Copied!
32
0
0

Texte intégral

(1)

Classifying malware using network traffic analysis.

Or how to learn Redis, git, tshark and Python in 4 hours.

Alexandre Dulaunoy

January 17, 2014

(2)

Problem Statement

We have more 5000 pcap files generated per day for each malware execution in a sandbox

We need to classify1 the malware into various sets

The project needs to be done in less than a day and the code shared to another team via GitHub

(3)

File Format and Filename

...

0580c82f6f90b75fcf81fd3ac779ae84.pcap 05a0f4f7a72f04bda62e3a6c92970f6e.pcap 05b4a945e5f1f7675c19b74748fd30d1.pcap 05b57374486ce8a5ce33d3b7d6c9ba48.pcap 05bbddc8edac3615754f93139cf11674.pcap 05bf1ff78685b5de06b0417da01443a9.pcap 05c3bccc1abab5c698efa0dfec2fd3a4.pcap ...

<MD5 hash of the malware).pcap

MD5 values2 of malware samples are used by A/V vendors, security researchers and analyst.

(4)

MapReduce and Network Forensic

MapReduce is an old concept in computer science

Themap stage to perform isolated computation on independent problems

Thereducestage to combine the computation results

Network forensic computations can easily be expressed in map and reduce steps:

parsing, filtering, counting, sorting, aggregating, anonymizing, shuffling...

(5)

Processing and Reading pcap files

ls -1 | parallel --gnu ’tcpdump -s0 -A -n -r {1}’

Nice for processing the files but...

How do we combine the results?

How do we extract the classification parameters? (e.g. sed, awk, regexp?)

(6)

tshark

tshark -G fields

Wireshark is supporting a wide range of dissectors

tshark allows to use the dissectors from the command line tshark -E separator=, -Tfields -e ip.dst -r mycap.cap

(7)

Concurrent Network Forensic Processing

To allow concurrent processing, a non-blocking data store is required

To allow flexibility, a schema-free data store is required

To allow fast processing, you need to scale horizontally and to know the cost of querying the data store

To allow streaming processing, write/cost versus read/cost should be equivalent

(8)

Redis: a key-value/tuple store

Redis is key store written in C with an extended set of data types like lists, sets, ranked sets, hashes, queues

Redis is usually in memory with persistence achieved by regularly saving on disk

Redis API is simple (telnet-like) and supported by a multitude of programming languages

http://www.redis.io/

(9)

Redis: installation

Download Redis 2.8.3 (stable version)

tar xvfz redis-2.8.3.tar.gz

cd redis-2.8.3

make

(10)

Keys

Keys are free text values (up to 231 bytes) - newline not allowed

Short keys are usually better (to save memory)

Naming convention are used like keys separated by colon

(11)

Value and data types

binary-safe strings

lists of binary-safe strings

sets of binary-safe strings

hashes (dictionary-like)

pubsub channels

(12)

Running redis and talking to redis...

screen

cd ./src/ && ./redis-server

new screen session (crtl-a c)

redis-cli

DBSIZE

(13)

Commands available on all keys

Those commands are available on all keys regardless of their type

TYPE [key]→ gives you the type of key (from string to hash)

EXISTS [key] →does the key exist in the current database

RENAME [old new]

RENAMENX [old new]

DEL [key]

RANDOMKEY → returns a random key

TTL [key] → returns the number of sec before expiration

EXPIRE [key ttl] or EXPIRE [key ts]

KEYS [pattern] → returns all keys matching a pattern (!to use with care)

(14)

Commands available for strings type

SET [key] [value]

GET [key]

MGET [key1] [key2] [key3]

MSET [key1] [valueofkey1] ...

INCR [key] — INCRBY [key] [value] →! string interpreted as integer

DECR [key] — INCRBY [key] [value] → ! string interpreted as integer

(15)

Commands available for sets type

SADD [key] [member] →adds a member to a set named key

SMEMBERS [key] → return the member of a set

SREM [key] [member] →removes a member to a set named key

SCARD [key] → returns the cardinality of a set

SUNION [key ...] → returns the union of all the sets

SINTER [key ...] →returns the intersection of all the sets

SDIFF [key ...] → returns the difference of all the sets

S....STORE [destkey key ...] →same as before but stores the result

(16)

Commands available for list type

RPUSH - LPUSH [key] [value]

LLEN [key]

LRANGE [key] [start] [end]

LTRIM [key] [start] [end]

LSET [key] [index] [value]

LREM [key] [count] [value]

(17)

Sorting

SORT [key]

SORT [key] LIMIT 0 4

(18)

Commands availabled for sorted set type

ZADD [key] [score] [member]

ZCARD [key]

ZSCORE [key] [member]

ZRANK [key] [member]→ get the rank of a member from bottom

ZREVRANK [key] [member] → get the rank of a member from top

(19)

Atomic commands

GETSET [key] [newvalue] →sets newvalue and return previous value

(M)SETNX [key] [newvalue]→ sets newvalue except if key exists (useful for locking)

MSETNX is very useful to update a large set of objects without race condition.

(20)

Database commands

SELECT [0-15] → selects a database (default is 0)

MOVE [key] [db]→ move key to another database

FLUSHDB →delete all the keys in the current database

FLUSHALL→ delete all the keys in all the databases

SAVE - BGSAVE →save database on disk (directly or in background)

DBSIZE

MONITOR → what’s going on against your redis datastore (check

(21)

Redis from shell?

ret=$(redis-cli SADD dns:${md5} ${rdata}) num=$(redis-cli SCARD dns:${md5})

Why not Python?

import redis

r = redis.StrictRedis(host=’localhost’, port=6379, db=0) r.set(’foo’, ’bar’)

r.get(’foo’)

(22)

How do you integrate it?

ls -1 ./pcap/*.pcap | parallel --gnu "cat {1} |

tshark -E separator=, -Tfields -e http.server -r {1} | python import.py -f {1} "

Code need to be shared?

(23)

Structure and Sharing

As you start to work on a project and you are willing to share it

One approach is to start with a common directory structure for the software to be shared

/PcapClassifier/bin/ -> where to store the code /etc/ -> where to store configuration /data/

README.md

(24)

import.py (first iteration)

# N e e d to c a t c h the MD5 f i l e n a m e of the m a l w a r e ( f r o m the p c a p f i l e n a m e ) i m p o r t a r g p a r s e

i m p o r t sys

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’) a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, h e l p =’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

if a r g s . f is not N o n e : f i l e n a m e = a r g s . f for l i n e in sys . s t d i n :

p r i n t ( l i n e . s p l i t (" , ") [ 0 ] ) e l s e:

a r g P a r s e r . p r i n t _ h e l p ()

(25)

Storing the code and its revision

Initialize a git repository (one time)

cd P c a p C l a s s i f i e r git i n i t

Add a file to the index (one time)

git add ./ bin /i m p o r t. py

Commit a file to the index (when you do changes)

git c o m m i t ./ bin /i m p o r t. py

(26)

import.py (second iteration)

# N e e d to k n o w and s t o r e the MD5 f i l e n a m e p r o c e s s e d i m p o r t a r g p a r s e

i m p o r t sys i m p o r t r e d i s

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’) a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, h e l p =’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0)

if a r g s . f is not N o n e :

md5 = a r g s . f [ 0 ] . s p l i t (" . ") [ 0 ] r . s a d d (’ p r o c e s s e d ’, md5 ) for l i n e in sys . s t d i n :

p r i n t ( l i n e . s p l i t (" , ") [ 0 ] )

(27)

Redis datastructure

Now we have a set of the file processed (including the MD5 hash of the malware):

{processed}={abcdef...,deadbeef...,0101aba....}

A type set to store the field type decoded from the pcap (e.g.

http.user agent)

{type}={”http.user agent”,”http.server”}

A unique set to store all the values seen per type {e :http.user agent}={”Mozilla...”,”Curl...”}

(28)

import.py (third iteration)

i m p o r ta r g p a r s e i m p o r tsys i m p o r tr e d i s

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’) a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, h e l p =’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0)

ifa r g s . fis notN o n e : md5 = a r g s . f [ 0 ] . s p l i t (" . ") [ 0 ] r . s a d d (’ p r o c e s s e d ’, md5 ) l n u m b e r =0

f i e l d s = N o n e forl i n einsys . s t d i n :

ifl n u m b e r == 0:

f i e l d s = l i n e . r s t r i p (). s p l i t (" , ") forf i e l dinf i e l d s :

r . s a d d (’ t y p e ’, f i e l d ) e l s e:

e l e m e n t s = l i n e . r s t r i p (). s p l i t (" , ") i =0

fore l e m e n tine l e m e n t s : try:

(29)

How to link a type to a MD5 malware hash?

Now, we have set of value per type

{e :http.user agent}={”Mozilla...”,”Curl...”}

Using MD5 to hash type value and create a set of type value for each malware

{v :MD5hex(”Mozilla...”)}={”MD5 of malware”,”....”}

(30)

import.py (fourth iteration)

i m p o r ta r g p a r s e i m p o r tsys i m p o r tr e d i s i m p o r th a s h l i b

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’) a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, h e l p =’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) ifa r g s . fis notN o n e :

md5 = a r g s . f [ 0 ] . s p l i t (" . ") [ 0 ] r . s a d d (’ p r o c e s s e d ’, md5 ) l n u m b e r =0

f i e l d s = N o n e forl i n einsys . s t d i n :

ifl n u m b e r == 0:

# p r i n t ( l i n e . s p l i t ( " , " ) [ 0 ] ) f i e l d s = l i n e . r s t r i p (). s p l i t (" , ") forf i e l dinf i e l d s :

r . s a d d (’ t y p e ’, f i e l d ) e l s e:

e l e m e n t s = l i n e . r s t r i p (). s p l i t (" , ") i =0

fore l e m e n tine l e m e n t s : try:

r . s a d d (’ e : ’+ f i e l d s [ i ] , e l e m e n t ) e h a s h = h a s h l i b . md5 ()

e h a s h . u p d a t e ( e l e m e n t . e n c o d e (’ utf -8 ’))

(31)

graph.py - exporting graph from Redis

i m p o r t r e d i s

i m p o r t n e t w o r k x as nx i m p o r t h a s h l i b

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) g = nx . G r a p h ()

for m a l w a r e in r . s m e m b e r s (’ p r o c e s s e d ’):

g . a d d _ n o d e ( m a l w a r e )

for f i e l d t y p e in r . s m e m b e r s (’ t y p e ’):

g . a d d _ n o d e ( f i e l d t y p e )

for v in r . s m e m b e r s (’ e : ’+ f i e l d t y p e . d e c o d e (’ utf -8 ’)):

g . a d d _ n o d e ( v )

e h a s h = h a s h l i b . md5 () e h a s h . u p d a t e ( v )

e h h e x = e h a s h . h e x d i g e s t () for m in r . s m e m b e r s (’ v : ’+ e h h e x ):

p r i n t ( m ) g . a d d _ e d g e ( v , m )

(32)

graph.py - exporting graph from Redis

Références

Documents relatifs

These systems 3 , although utilize mature and efficient relational database management systems (RDBMSs) and exploit a number of their evaluation strategies (e.g., query

Thus one could apply the intersec- tions with several different minimum support thresh- olds to get refining collection of frequent closed sets in the data: the already found

Other approaches possible to capture data (Bridge interception, dup-to of a packet filtering, ...).. A side note regarding wireless network, promiscuous mode is only capturing

• Redis is key store written in C with an extended set of data types like lists, sets, ranked sets, hashes, queues. • Redis is usually in memory with persistence achieved by

• I hate large piece of code, the implementation must be modular and having less than 5K LOC (↓ trash and rewrite cost). • I hate those bloody legal documents, we keep only the

• Redis is key store written in C with an extended set of data types like lists, sets, ranked sets, hashes, queues. • Redis is usually in memory with persistence achieved by

The objective is that you keep a focus on a specific aspect of computer forensic (network, system, malware analysis, data mining) to be used for your project.. Or how to learn

Using Redis for data processing in a incident response environment.. Practical examples and