• Aucun résultat trouvé

Using Redis for data processing in a incident response environment. Practical examples and design patterns. Rapha¨el Vinot January 23, 2016

N/A
N/A
Protected

Academic year: 2022

Partager "Using Redis for data processing in a incident response environment. Practical examples and design patterns. Rapha¨el Vinot January 23, 2016"

Copied!
16
0
0

Texte intégral

(1)

Using Redis for data processing in a incident response environment.

Practical examples and design patterns.

Rapha¨ el Vinot

January 23, 2016

(2)

Devops & Incident Response

• Time constraints

• Similarity of the problems...

• ... but also very fast evolution

• High flexibility, modularity of the code

• Need to keep track of the evolution of the landscape

• Low maintenance

(3)

Devops & Incident Response - Solutions

• Be lazy.

• strings, cat, grep, cut, tr, grep, sort, uniq, parallel

• Simple and readable programming language

• Keeping track of the evolution

• One problem → one solution

• Chaining the tools

• Refactoring

• Do not overthink

(4)

Devops & Incident Response - Questions

• What am I looking for?

• What is the format of my source?

• Can I grep it?

• Did someone else already searched that? Can I reuse code?

• How much time do I have?

• How much data do I have to process? Protip: going to be much more.

• How often do I have to do it? Protip: as soon as it works, very

often

(5)

Devops & Incident Response - Writing code

• Do I have to write code?

• data → parsable feed → less data → output

• Am I done now?

◦ Do I process my data faster than my input?

◦ Am I comfortable processing all the input everytime I need it?

◦ Should I keep my output? How long?

(6)

Example

tshark -E header=yes -E separator=, -r file.pcap | \ cut -d, -f4 " | sort | uniq -c

• Many PCAPs, we want to compate them together

• Not really practicable to go through all of them

(7)

import.py (first iteration)

i m p o r t sys

for l i n e in sys . s t d i n : p r i n t ( l i n e . s p l i t (" , "))

tshark -E header=yes -E separator=, -Tfields \ -e http.server -r file.pcap | python import.py

• Still one single file.

• We don’t keep the file name (malware hash)

(8)

import.py (second iteration)

# N e e d to c a t c h the MD5 f i l e n a m e of the m a l w a r e ( f r o m the p c a p f i l e n a m e ) i m p o r t a r g p a r s e

i m p o r t sys

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)

a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

f i l e n a m e = a r g s . f for l i n e in sys . s t d i n :

p r i n t ( l i n e . s p l i t (" , ") [ 0 ] )

ls -1 ./pcap/*.pcap | parallel --gnu "cat {1} | \ tshark -E header=yes -E separator=, -Tfields \ -e http.server -r {1} | python import.py -f {1} "

• No history of the files we processed

(9)

import.py (third iteration)

# N e e d to k n o w and s t o r e the MD5 f i l e n a m e p r o c e s s e d i m p o r t a r g p a r s e

i m p o r t sys i m p o r t r e d i s

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)

a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]

r . s a d d (’ p r o c e s s e d ’, md5 ) for l i n e in sys . s t d i n :

p r i n t ( l i n e . s p l i t (" , ") [ 0 ] )

• No flexibility on the field type

• Need to re-run the whole processing every time

(10)

import.py (fourth iteration)

i m p o r t a r g p a r s e i m p o r t sys i m p o r t r e d i s

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)

a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]

if md5 not in r . s m e m b e r s (’ p r o c e s s e d ’):

r . s a d d (’ p r o c e s s e d ’, md5 ) f i r s t _ l i n e = T r u e f i e l d s = N o n e

for l i n e in sys . s t d i n : if f i r s t _ l i n e :

f i r s t _ l i n e = F a l s e

f i e l d s = l i n e . s t r i p (). s p l i t (" , ") e l s e:

for field , e l e m e n t in zip( fields , l i n e . r s t r i p (). s p l i t (" , ")):

try:

r . s a d d (’ e : ’+ field , e l e m e n t ) e x c e p t I n d e x E r r o r :

(11)

import.py (fourth iteration)

• No Track of fields stored

• High memory use

• Try / except should be avoided is possible

(12)

import.py (fourth iteration)

i m p o r t a r g p a r s e i m p o r t sys i m p o r t r e d i s i m p o r t h a s h l i b

a r g P a r s e r = a r g p a r s e . A r g u m e n t P a r s e r ( d e s c r i p t i o n =’ P c a p c l a s s i f i e r ’)

a r g P a r s e r . a d d _ a r g u m e n t (’ - f ’, a c t i o n =’ a p p e n d ’, r e q u i r e d = True , h e l p=’ F i l e n a m e ’) a r g s = a r g P a r s e r . p a r s e _ a r g s ()

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) md5 = a r g s . f . s p l i t (" . ") [ 0 ]

if md5 not in r . s m e m b e r s (’ p r o c e s s e d ’):

r . s a d d (’ p r o c e s s e d ’, md5 ) f i r s t _ l i n e = T r u e for l i n e in sys . s t d i n :

if f i r s t _ l i n e : f i r s t _ l i n e = F a l s e

r . s a d d (’ t y p e ’, * l i n e . r s t r i p (). s p l i t (" , ")) e l s e:

for type, e l e m e n t in zip( r . s m e m b e r s (’ t y p e ’) , l i n e . s t r i p (). s p l i t (" , ")):

r . s a d d (’ e :{} ’.f o r m a t(t y p e) , e l e m e n t ) e h a s h = h a s h l i b . md5 ()

e h a s h . u p d a t e ( e l e m e n t . e n c o d e (’ utf -8 ’)) e h h e x = e h a s h . h e x d i g e s t ()

if e l e m e n t is not " ":

r . s a d d (’ v : ’+ ehhex , md5 )

(13)

graph.py - exporting graph from Redis

i m p o r t r e d i s

i m p o r t n e t w o r k x as nx i m p o r t h a s h l i b

r = r e d i s . S t r i c t R e d i s ( h o s t =’ l o c a l h o s t ’, p o r t =6379 , db =0) g = nx . G r a p h ()

for m a l w a r e in r . s m e m b e r s (’ p r o c e s s e d ’):

g . a d d _ n o d e ( m a l w a r e )

for f i e l d t y p e in r . s m e m b e r s (’ t y p e ’):

g . a d d _ n o d e ( f i e l d t y p e )

for v in r . s m e m b e r s (’ e : ’+ f i e l d t y p e . d e c o d e (’ utf -8 ’)):

g . a d d _ n o d e ( v ) e h a s h = h a s h l i b . md5 () e h a s h . u p d a t e ( v )

e h h e x = e h a s h . h e x d i g e s t () for m in r . s m e m b e r s (’ v : ’+ e h h e x ):

p r i n t ( m ) g . a d d _ e d g e ( v , m )

(14)

graph.py - exporting graph from Redis

(15)

Design patterns

• Shared memory (this code, BGP Ranking, ssdc, url-abuse)

• Pipes (AIL Framework, MultiProcQueue)

• Cache (BGP Ranking, ssdc, uwhoisd)

• Long term storage & fast access (BGP Ranking)

• I don’t know what I’m doing (All of them, at the beginning)

(16)

Design patterns - Sourcecodes

• https://github.com/CIRCL/bgp-ranking

• https://github.com/CIRCL/AIL-framework

• https://github.com/CIRCL/ssdc/tree/master/multiproc

• https://github.com/Rafiot/bgpranking-hilbert

• https://github.com/Rafiot/MultiProcQueue

• https://github.com/Rafiot/uwhoisd

• ...

Références

Documents relatifs

We present a framework based on the standard InkML format to represent digital ink in a collaborative environment using pen-based interaction functionalities.. This framework

Now, in order to study the asymptotic effeciency for our procedure, we give the following oracle inequality for the robust risk defined in (1.3) and through a specific

A quantum hybrid Helmholtz machine use quantum sampling to improve run time, a quantum Hopfield neural network shows an im- proved capacity and a variational quantum

Based on the requirements we developed for integration into the computational and analytical environment, a number of computing nodes and processing platforms

Af- ter gathering the data, each CH sends the extracted rel- evant data to the sink node, this last one dispatch the relevant data to the second-tier (processing agent) for

This paper overviews the ef- fectiveness of instant messaging applications as a user data environment, the primary method for extracting said user data by using bots, proposes

We demonstrate our approach by describing how our domain model, which is a domain ontology of CCO, is mapped to logical models created in Ecore and NIEM (National Information

To evaluate the e↵ectiveness of considering user requirements for optimizing query processing, we will compute the di↵erence among required quality and achieved quality of the