Using Redis for data processing in a incident response environment.
Practical examples and design patterns.
Rapha¨ el Vinot
January 23, 2016
Devops & Incident Response
• Time constraints
• Similarity of the problems...
• ... but also very fast evolution
• High flexibility, modularity of the code
• Need to keep track of the evolution of the landscape
• Low maintenance
Devops & Incident Response - Solutions
• Be lazy.
• strings, cat, grep, cut, tr, grep, sort, uniq, parallel
• Simple and readable programming language
• Keeping track of the evolution
• One problem → one solution
• Chaining the tools
• Refactoring
• Do not overthink
Devops & Incident Response - Questions
• What am I looking for?
• What is the format of my source?
• Can I grep it?
• Did someone else already searched that? Can I reuse code?
• How much time do I have?
• How much data do I have to process? Protip: going to be much more.
• How often do I have to do it? Protip: as soon as it works, very
often
Devops & Incident Response - Writing code
• Do I have to write code?
• data → parsable feed → less data → output
• Am I done now?
◦ Do I process my data faster than my input?
◦ Am I comfortable processing all the input everytime I need it?
◦ Should I keep my output? How long?
Example
tshark -E header=yes -E separator=, -r file.pcap | \ cut -d, -f4 " | sort | uniq -c
• Many PCAPs, we want to compate them together
• Not really practicable to go through all of them
import.py (first iteration)
i m p o r t sys
for l i n e in sys . s t d i n : p r i n t ( l i n e . s p l i t (" , "))
tshark -E header=yes -E separator=, -Tfields \ -e http.server -r file.pcap | python import.py
• Still one single file.
• We don’t keep the file name (malware hash)
import.py (second iteration)
# N e e d to c a t c h the MD5 f i l e n a m e of the m a l w a r e ( f r o m the p c a p f i l e n a m e ) i m p o r t a r g p a r s e
i m p o r t sys
a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)
a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()
f i l e n a m e = a r g s . f for l i n e in sys . s t d i n :
p r i n t ( l i n e . s p l i t (" , ") [ 0 ] )
ls -1 ./pcap/*.pcap | parallel --gnu "cat {1} | \ tshark -E header=yes -E separator=, -Tfields \ -e http.server -r {1} | python import.py -f {1} "
• No history of the files we processed
import.py (third iteration)
# N e e d to k n o w and s t o r e the MD5 f i l e n a m e p r o c e s s e d i m p o r t a r g p a r s e
i m p o r t sys i m p o r t r e d i s
a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)
a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()
r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]
r . s a d d (’ p r o c e s s e d ’, md5 ) for l i n e in sys . s t d i n :
p r i n t ( l i n e . s p l i t (" , ") [ 0 ] )
• No flexibility on the field type
• Need to re-run the whole processing every time
import.py (fourth iteration)
i m p o r t a r g p a r s e i m p o r t sys i m p o r t r e d i s
a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)
a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()
r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]
if md5 not in r . s m e m b e r s (’ p r o c e s s e d ’):
r . s a d d (’ p r o c e s s e d ’, md5 ) f i r s t _ l i n e = T r u e f i e l d s = N o n e
for l i n e in sys . s t d i n : if f i r s t _ l i n e :
f i r s t _ l i n e = F a l s e
f i e l d s = l i n e . s t r i p (). s p l i t (" , ") e l s e:
for field , e l e m e n t in zip( fields , l i n e . r s t r i p (). s p l i t (" , ")):
try:
r . s a d d (’ e : ’+ field , e l e m e n t ) e x c e p t I n d e x E r r o r :
import.py (fourth iteration)
• No Track of fields stored
• High memory use
• Try / except should be avoided is possible
import.py (fourth iteration)
i m p o r t a r g p a r s e i m p o r t sys i m p o r t r e d i s i m p o r t h a s h l i b
a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)
a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()
r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]
if md5 not in r . s m e m b e r s (’ p r o c e s s e d ’):
r . s a d d (’ p r o c e s s e d ’, md5 ) f i r s t _ l i n e = T r u e for l i n e in sys . s t d i n :
if f i r s t _ l i n e : f i r s t _ l i n e = F a l s e
r . s a d d (’ t y p e ’, * l i n e . r s t r i p (). s p l i t (" , ")) e l s e:
for type, e l e m e n t in zip( r . s m e m b e r s (’ t y p e ’) , l i n e . s t r i p (). s p l i t (" , ")):
r . s a d d (’ e :{} ’.f o r m a t(t y p e) , e l e m e n t ) e h a s h = h a s h l i b . md5 ()
e h a s h . u p d a t e ( e l e m e n t . e n c o d e (’ utf -8 ’)) e h h e x = e h a s h . h e x d i g e s t ()
if e l e m e n t is not " ":
r . s a d d (’ v : ’+ ehhex , md5 )
graph.py - exporting graph from Redis
i m p o r t r e d i s
i m p o r t n e t w o r k x as nx i m p o r t h a s h l i b
r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) g = nx . G r a p h ()
for m a l w a r e in r . s m e m b e r s (’ p r o c e s s e d ’):
g . a d d _ n o d e ( m a l w a r e )
for f i e l d t y p e in r . s m e m b e r s (’ t y p e ’):
g . a d d _ n o d e ( f i e l d t y p e )
for v in r . s m e m b e r s (’ e : ’+ f i e l d t y p e . d e c o d e (’ utf -8 ’)):
g . a d d _ n o d e ( v ) e h a s h = h a s h l i b . md5 () e h a s h . u p d a t e ( v )
e h h e x = e h a s h . h e x d i g e s t () for m in r . s m e m b e r s (’ v : ’+ e h h e x ):
p r i n t ( m ) g . a d d _ e d g e ( v , m )