Using Redis for data processing in an incident response environment. Practical examples and design patterns. Raphaël Vinot January 23, 2016

(1)

Using Redis for data processing in an incident response environment.

Practical examples and design patterns.

Raphaël Vinot

January 23, 2016

(2)

Devops & Incident Response

• Time constraints

• Similarity of the problems...

• ... but also very fast evolution

• High flexibility, modularity of the code

• Need to keep track of the evolution of the landscape

• Low maintenance

(3)

Devops & Incident Response - Solutions

• Be lazy.

• strings, cat, grep, cut, tr, grep, sort, uniq, parallel

• Simple and readable programming language

• Keeping track of the evolution

• One problem → one solution

• Chaining the tools

• Refactoring

• Do not overthink

(4)

Devops & Incident Response - Questions

• What am I looking for?

• What is the format of my source?

• Can I grep it?

• Has someone else already searched for that? Can I reuse code?

• How much time do I have?

• How much data do I have to process? Protip: going to be much more.

• How often do I have to do it? Protip: as soon as it works, very often.

(5)

Devops & Incident Response - Writing code

• Do I have to write code?

• data → parsable feed → less data → output

• Am I done now?

◦ Do I process my data faster than my input?

◦ Am I comfortable processing all the input every time I need it?

◦ Should I keep my output? How long?
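The chain "data → parsable feed → less data → output" can be sketched with Python generators, so that each stage handles one line at a time and never holds the whole input in memory. This is only an illustration: the function names `parse`, `reduce_columns` and `pipeline` are hypothetical and do not come from the talk.

```python
def parse(lines, separator=","):
    """data -> parsable feed: split each raw line into columns."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split(separator)


def reduce_columns(rows, index):
    """parsable feed -> less data: keep one non-empty column per row."""
    for row in rows:
        if index < len(row) and row[index]:
            yield row[index]


def pipeline(lines, index=0):
    """Chain the stages lazily; work happens only when the output is consumed."""
    return reduce_columns(parse(lines), index)
```

Passing `sys.stdin` as `lines` and printing each yielded value turns this into a filter that fits between other command-line tools.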

(6)

Example

tshark -E header=yes -E separator=, -r file.pcap | \
    cut -d, -f4 | sort | uniq -c

• Many PCAPs, and we want to compare them with each other

• Not really practical to go through all of them one at a time
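To compare several captures, the `cut | sort | uniq -c` step can be reproduced per file and the results compared afterwards. The sketch below assumes the comma-separated tshark output has already been read as lines; `count_column` and `shared_values` are hypothetical helper names, and `index` is zero-based (so `cut -f4` corresponds to `index=3`).

```python
from collections import Counter


def count_column(lines, index, separator=","):
    """Count occurrences of one column, like `cut -f | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        columns = line.rstrip("\n").split(separator)
        if index < len(columns):
            counts[columns[index]] += 1
    return counts


def shared_values(counts_by_file):
    """Return the values that appear in more than one file."""
    seen = Counter()
    for counts in counts_by_file.values():
        # Each file contributes at most 1 per distinct value.
        seen.update(counts.keys())
    return {value for value, files in seen.items() if files > 1}
```

This still re-reads every file on every run, which is the limitation the following slides address.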

(7)

import.py (first iteration)

import sys

for line in sys.stdin:
    print(line.split(","))

tshark -E header=yes -E separator=, -Tfields \
    -e http.server -r file.pcap | python import.py

• Still one single file.

• We don’t keep the file name (malware hash)

(8)

import.py (second iteration)

# Need to catch the MD5 filename of the malware (from the pcap filename)
import argparse
import sys

argParser = argparse.ArgumentParser(description='Pcap classifier')
argParser.add_argument('-f', action='append', required=True, help='Filename')
args = argParser.parse_args()

filename = args.f
for line in sys.stdin:
    print(line.split(",")[0])

ls -1 ./pcap/*.pcap | parallel --gnu "cat {1} | \
    tshark -E header=yes -E separator=, -Tfields \
    -e http.server -r {1} | python import.py -f {1} "

• No history of the files we processed
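Since `-f` receives the full path produced by `ls -1 ./pcap/*.pcap`, the sample hash has to be extracted from it before it can be stored. A minimal sketch, assuming each capture is named `<md5>.pcap`; the helper name `sample_id` is hypothetical.

```python
import os


def sample_id(path):
    """Return the file name without its directory and extension."""
    return os.path.splitext(os.path.basename(path))[0]
```

Splitting the raw path on "." is not enough here: for `./pcap/<md5>.pcap` the first piece is the empty string that precedes the leading dot.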

(9)

import.py (third iteration)

# Need to know and store the MD5 filename processed
import argparse
import os
import sys
import redis

# Argument parsing as in the second iteration
r = redis.StrictRedis(host='localhost', port=6379, db=0)
# args.f is a list (action='append'); drop the directory so that
# "./pcap/<md5>.pcap" yields "<md5>" and not an empty string
md5 = os.path.basename(args.f[0]).split(".")[0]
r.sadd('processed', md5)
for line in sys.stdin:
    print(line.split(",")[0])

• No flexibility on the field type

• Need to re-run the whole processing every time
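One way to avoid re-processing is to let the `processed` set decide whether a file is new. The Redis SADD command returns the number of members that were actually added, so a single call can both test and mark a sample, which also behaves well when `parallel` runs several importers at once. The sketch below is an assumption about how this could be used, not the author's code: `claim` is a hypothetical helper, and `FakeSetStore` is a hypothetical in-memory stand-in so that the example runs without a Redis server.

```python
class FakeSetStore:
    """Hypothetical in-memory stand-in for the one Redis command used below."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        # Mirror SADD: return how many members were not yet in the set.
        target = self._sets.setdefault(key, set())
        added = sum(1 for member in set(members) if member not in target)
        target.update(members)
        return added


def claim(client, md5, key="processed"):
    """Return True only for the first caller that registers this sample."""
    return client.sadd(key, md5) == 1
```

With a real client the call would be `claim(r, md5)`, where `r` is the `redis.StrictRedis` instance from the slide.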

(10)

import.py (fourth iteration)

import argparse
import sys
import redis

# r and md5 are set up as in the third iteration.
# sismember asks Redis directly; comparing a str against the bytes
# returned by smembers would never match on Python 3.
if not r.sismember('processed', md5):
    r.sadd('processed', md5)
    first_line = True
    fields = None
    for line in sys.stdin:
        if first_line:
            first_line = False
            fields = line.strip().split(",")
        else:
            for field, element in zip(fields, line.rstrip().split(",")):
                try:
                    r.sadd('e:' + field, element)
                except IndexError:
                    ...  # remainder not shown on the slide

(11)

import.py (fourth iteration)

• No track of the fields stored

• High memory use

• Try/except should be avoided if possible
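The try/except can be dropped because `zip` stops at the shorter of its inputs, so a row with fewer columns than the header never raises `IndexError`. A minimal sketch, assuming the first line is the header written by `-E header=yes`; `pairs` is a hypothetical name.

```python
def pairs(lines, separator=","):
    """Yield (field, element) for every non-empty cell after the header."""
    header = None
    for line in lines:
        columns = line.rstrip("\n").split(separator)
        if header is None:
            # The first line names the fields; keep it in order.
            header = columns
            continue
        # zip truncates to the shorter sequence, so short rows are safe.
        for field, element in zip(header, columns):
            if element:
                yield field, element
```

Each yielded pair maps directly to one `r.sadd('e:' + field, element)` call in the importer.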

(12)

import.py (fifth iteration)

import argparse
import sys
import redis
import hashlib

# r and md5 are set up as in the third iteration
if not r.sismember('processed', md5):
    r.sadd('processed', md5)
    first_line = True
    types = None
    for line in sys.stdin:
        if first_line:
            first_line = False
            # Keep the header locally as well: a Redis set has no order,
            # so smembers('type') cannot be zipped against the columns
            types = line.rstrip().split(",")
            r.sadd('type', *types)
        else:
            for type_, element in zip(types, line.strip().split(",")):
                r.sadd('e:{}'.format(type_), element)
                ehash = hashlib.md5()
                ehash.update(element.encode('utf-8'))
                ehhex = ehash.hexdigest()
                # Compare by value; "is not" tests object identity
                if element != "":
                    r.sadd('v:' + ehhex, md5)
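The resulting key layout is: `processed` holds the sample hashes, `type` the field names, `e:<field>` the values seen for a field, and `v:<md5 of value>` the samples in which a value was seen. The sketch below models that layout with a plain dictionary of sets so it can be tried without Redis. The helper names are hypothetical, and unlike the slide it skips empty values entirely, which is a choice of this sketch.

```python
import hashlib
from collections import defaultdict


def value_key(element):
    """Key of the set listing the samples that contain this value."""
    return "v:" + hashlib.md5(element.encode("utf-8")).hexdigest()


def index_row(store, md5, fields, columns):
    """Record one parsed row of sample `md5` in the in-memory store."""
    store["type"].update(fields)
    for field, element in zip(fields, columns):
        if element:
            store["e:" + field].add(element)
            store[value_key(element)].add(md5)


def samples_sharing(store, element):
    """Return the samples in which `element` was observed."""
    return store.get(value_key(element), set())
```

Hashing the value keeps the key short and free of awkward characters, at the price of one extra lookup through `e:<field>` to recover the original value.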

(13)

graph.py - exporting graph from Redis

import redis
import networkx as nx
import hashlib

r = redis.StrictRedis(host='localhost', port=6379, db=0)
g = nx.Graph()

for malware in r.smembers('processed'):
    g.add_node(malware)

for fieldtype in r.smembers('type'):
    g.add_node(fieldtype)
    for v in r.smembers('e:' + fieldtype.decode('utf-8')):
        g.add_node(v)
        ehash = hashlib.md5()
        ehash.update(v)
        ehhex = ehash.hexdigest()
        for m in r.smembers('v:' + ehhex):
            ...  # remainder not shown on the slide
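The slide stops inside the innermost loop. One plausible reading, offered here purely as an assumption, is that each value is linked to its field type and to every sample in which it was seen. The sketch below builds such an edge list from plain Python data so that it runs without Redis or networkx; `build_edges` and its parameters are hypothetical names.

```python
import hashlib


def build_edges(processed, types, elements_by_type, samples_by_value_hash):
    """Return (field type, value) and (value, sample) edges, in sorted order."""
    edges = []
    for fieldtype in sorted(types):
        for value in sorted(elements_by_type.get(fieldtype, ())):
            edges.append((fieldtype, value))
            digest = hashlib.md5(value.encode("utf-8")).hexdigest()
            for sample in sorted(samples_by_value_hash.get(digest, ())):
                # Only link samples that are known to be processed.
                if sample in processed:
                    edges.append((value, sample))
    return edges
```

The resulting list can be handed to networkx with `g.add_edges_from(edges)`, which also creates any node that does not exist yet.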
