
BCARET Model Checking for Malware Detection.


HAL Id: hal-02389564

https://hal.archives-ouvertes.fr/hal-02389564

Submitted on 1 Dec 2020


Huu Vu Nguyen, Tayssir Touili

To cite this version:

Huu Vu Nguyen, Tayssir Touili. BCARET Model Checking for Malware Detection. Theoretical Aspects of Computing - ICTAC 2019 - 16th International Colloquium, Hammamet, Tunisia, October 31 - November 4, 2019, Proceedings, Oct 2019, Hammamet, Tunisia. ⟨hal-02389564⟩


Huu-Vu Nguyen¹, Tayssir Touili²

¹ LIPN and University Paris 13, France

² CNRS, LIPN and University Paris 13, France

Abstract. The number of malware samples has been growing fast in recent years. Traditional malware detectors based on signature matching and code emulation are easy to bypass. To overcome this problem, model checking appears as an efficient approach that has been extensively applied to malware detection in recent years. Pushdown systems were proposed as a natural model for programs, as they allow the program's stack to be taken into account in the model. CARET and BCARET were proposed as formalisms for malicious behavior specification since they can specify properties that require matchings of calls and returns, which is crucial for malware detection. In this paper, we propose to use BCARET for malicious behavior specification. Since BCARET formulas for malicious behaviors are huge, we propose to extend BCARET with variables, quantifiers and predicates over the stack. Our new logic is called SBPCARET. We reduce the malware detection problem to the model checking problem of PDSs against SBPCARET formulas, and we propose an efficient algorithm to model check SBPCARET formulas for PDSs.

1 Introduction

The number of malware samples has been growing fast in recent years. Traditional approaches, including signature matching and code emulation, are not efficient enough to detect malware. Attackers can easily use obfuscation techniques to hide their malware from signature-based detectors, and code emulation approaches can only track programs along certain execution paths because of the limited execution time. To overcome these limitations, model checking appears as an efficient approach for malware detection, since it allows checking the behaviors of a program over all its execution traces without executing it.

Many works have applied model checking to malware detection [2,4,3,12,11,10,7]. [4] proposed to use finite state graphs to model the program and the temporal logic CTPL to specify malicious behaviours. However, finite graphs are not exact enough to model programs, as they do not allow the program's stack to be taken into account in the model. Indeed, the program's stack is usually used by malware writers for code obfuscation, as explained in [5]. In addition, in binary code and assembly programs, parameters are passed to functions by pushing them on the stack before the call is made. The values of these parameters are used to determine whether the program is malicious or not [6]. Therefore, being able to record the program's stack is critical for malware detection. To this aim, [12,11,10,13] proposed to use pushdown systems to model programs, and defined extensions of LTL and CTL (called SLTPL and SCTPL) to precisely and compactly describe malicious behaviors. However, these logics cannot specify properties that require matchings of calls and returns, which is crucial to describe malicious behaviours [8]. Let us illustrate this with the typical behaviour of a spyware: seeking personal information (emails, bank account information, ...) on local drives by searching for files that match specific conditions. To do that, it has to search the directories of the host for interesting files whose names match a certain condition. If a file is found, the spyware invokes a payload to steal the information, then continues looking for the remaining matching files.

If a folder is found, it enters the folder path and continues investigating the folder recursively. To obtain this behavior, the spyware first calls the API function FindFirstFileA to search for the first matching file in a given folder path. After that, it has to check whether the call to FindFirstFileA succeeded. When the function call fails, the spyware calls the API function GetLastError. Otherwise, when the function call succeeds, a search handle h is returned by FindFirstFileA. There are two possibilities in this case. If the returned result is a folder, it calls FindFirstFileA again to search for matching results in the found folder. If the returned result is a file, it calls the function FindNextFileA using h as first parameter to look for the remaining matching files. This behavior cannot be described by LTL or CTL since it requires expressing that the return value of the API function FindFirstFileA should be used as input to the function FindNextFileA.

CARET was introduced to express linear-temporal properties that involve matchings of calls and returns [1] and CARET model-checking for PDSs was considered [7,6]. However, the above behaviour cannot be described by CARET since it is a branching-time property. To specify that behaviour naturally and intuitively, BCARET was introduced to express these branching-time properties that involve matchings of calls and returns [8]. Using BCARET, the above behavior can be expressed by the following formula:

\[
\varphi_{sb} = \bigvee_{d \in D} EF^g \Big( call(FindFirstFileA) \wedge EX^a(eax = d) \wedge AF^a \big( call(GetLastError) \vee call(FindFirstFileA) \vee ( call(FindNextFileA) \wedge d\Gamma^* ) \big) \Big)
\]

where the disjunction $\bigvee$ is taken over all possible memory addresses $d$ that contain the values of search handles $h$ in the program; $EX^a$ is a BCARET operator meaning "next in some run, in the same procedural context"; $EF^g$ is the standard CTL $EF$ operator (eventually in some run); and $AF^a$ is a BCARET operator meaning "eventually in all runs, in the same procedural context".

In binary code and assembly programs, the return value of an API function is placed in the register eax. Therefore, the return value of FindFirstFileA is the value of the register eax at the corresponding return-point of the call. Then, the subformula $(call(FindFirstFileA) \wedge EX^a(eax = d))$ expresses that there is a call to the API function FindFirstFileA whose return value is $d$ (the abstract successor of a call is its corresponding return-point). A call to FindNextFileA requires a search handle $h$ as parameter, and $h$ must be put on top of the program's stack (as parameters are passed through the stack in assembly programs). To express that $d$ is on top of the program stack, we use the regular expression $d\Gamma^*$. Thus, the subformula $[call(FindNextFileA) \wedge d\Gamma^*]$ states that the API function FindNextFileA is invoked with $d$ as parameter ($d$ stores the information of the search handle $h$). Therefore, $\varphi_{sb}$ states that there is a call to the function FindFirstFileA whose return value is $d$ (the search handle); then, in all runs starting from that call, there will be either a call to the API function GetLastError, or a call to the API function FindFirstFileA, or a call to the function FindNextFileA in which $d$ is used as a parameter.

However, it can be seen that this formula is huge, since it considers the disjunction (of different BCARET formulas) over all possible memory addresses $d$ which contain the information of search handles $h$ in the program. To represent it in a more compact fashion, we follow the idea of [4,12,10,6] and extend BCARET with variables, quantifiers, and predicates over the stack. We call our new logic SBPCARET. The above formula can be concisely described by a SBPCARET formula as follows:

\[
\varphi'_{sb} = \exists x\, EF^g \Big( call(FindFirstFileA) \wedge EX^a(eax = x) \wedge AF^a \big( call(GetLastError) \vee call(FindFirstFileA) \vee ( call(FindNextFileA) \wedge x\Gamma^* ) \big) \Big)
\]

Thus, we propose in this work to use pushdown systems (PDSs) to model programs, and SBPCARET formulas to specify malicious behaviors. We reduce the malware detection problem to the model checking problem of PDSs against SBPCARET formulas, and we propose an efficient algorithm to check whether a PDS satisfies a SBPCARET formula. Our algorithm is based on a reduction to the emptiness problem of Symbolic Alternating Büchi Pushdown Systems. This latter problem is already solved in [10].

The rest of the paper is organized as follows. In Section 2, we recall the definitions of Pushdown Systems. Section 3 introduces our logic SBPCARET. Model checking SBPCARET for PDSs is presented in Section 4. Finally, we conclude in Section 5.

2 Pushdown Systems: A model for sequential programs

Pushdown systems are a natural model that has been extensively used to model sequential programs. Translations from sequential programs to PDSs can be found e.g. in [9]. As will be discussed in the next section, to precisely describe malicious behaviors as well as context-related properties, we need to keep track of the call and return actions in each path. Thus, as done in [8], we adapt the PDS model in order to record whether a rule of a PDS corresponds to a call, a return, or another instruction. We call this model a Labelled Pushdown System. We also extend the notion of run in order to take into account matching returns of calls.

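Definition 1. A Labelled Pushdown System (PDS) $\mathcal{P}$ is a tuple $(P, \Gamma, \Delta, \sharp)$, where $P$ is a finite set of control locations, $\Gamma$ is a finite stack alphabet, $\sharp \notin \Gamma$ is a bottom stack symbol and $\Delta$ is a finite subset of $((P \times \Gamma) \times (P \times \Gamma^*) \times \{call, ret, int\})$. If $((p, \gamma), (q, \omega), t) \in \Delta$ ($t \in \{call, ret, int\}$), we also write $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$. Rules of $\Delta$ are of the following form, where $p \in P$, $q \in P$, $\gamma, \gamma_1, \gamma_2 \in \Gamma$, and $\omega \in \Gamma$:

– $(r_1)$: $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma_1\gamma_2 \rangle$
– $(r_2)$: $\langle p, \gamma \rangle \xrightarrow{ret} \langle q, \varepsilon \rangle$
– $(r_3)$: $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle$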

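Intuitively, a rule of the form $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma_1\gamma_2 \rangle$ corresponds to a call statement. Such a rule usually models a statement of the form $\gamma \xrightarrow{call\ proc} \gamma_2$. In this rule, $\gamma$ is the control point of the program where the function call is made, $\gamma_1$ is the entry point of the called procedure, and $\gamma_2$ is the return point of the call. A rule $r_2$ models a return, whereas a rule $r_3$ corresponds to a simple statement (neither a call nor a return).

A configuration of $\mathcal{P}$ is a pair $\langle p, \omega \rangle$, where $p$ is a control location and $\omega \in \Gamma^*$ is the stack content. For technical reasons, we suppose w.l.o.g. that the bottom stack symbol $\sharp$ is never popped from the stack, i.e., there is no rule of the form $\langle p, \sharp \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$ ($t \in \{call, ret, int\}$). $\mathcal{P}$ defines a transition relation $\Rightarrow_{\mathcal{P}}$ as follows: if $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle$ for some $t \in \{call, ret, int\}$, then for every $\omega' \in \Gamma^*$, $\langle p, \gamma\omega' \rangle \Rightarrow_{\mathcal{P}} \langle q, \omega\omega' \rangle$. In other words, $\langle q, \omega\omega' \rangle$ is an immediate successor of $\langle p, \gamma\omega' \rangle$. Let $\Rightarrow^*_{\mathcal{P}}$ be the reflexive and transitive closure of $\Rightarrow_{\mathcal{P}}$.

A run of $\mathcal{P}$ from $\langle p_0, \omega_0 \rangle$ is a sequence $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ where $\langle p_i, \omega_i \rangle \in P \times \Gamma^*$ s.t. for every $i \geq 0$, $\langle p_i, \omega_i \rangle \Rightarrow_{\mathcal{P}} \langle p_{i+1}, \omega_{i+1} \rangle$. Given a configuration $\langle p, \omega \rangle$, let $Traces(\langle p, \omega \rangle)$ be the set of all possible runs starting from $\langle p, \omega \rangle$.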
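The definitions above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class `Rule`, the function `successors`, the toy rule set `RULES` and the symbol names (`m0`, `f0`, `m1`, `m2`, `#`) are hypothetical and do not come from the paper. The stack is a tuple whose first element is the top, and `#` stands for the bottom stack symbol.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    p: str        # source control location
    gamma: str    # symbol that must be on top of the stack
    q: str        # target control location
    push: tuple   # word replacing gamma: 2 symbols (call), 0 (ret), 1 (int)
    label: str    # "call", "ret" or "int"


def successors(rules, config):
    """Immediate successors of config = (p, stack), each paired with its label."""
    p, stack = config
    if not stack:
        return []
    top, rest = stack[0], stack[1:]
    return [(r.label, (r.q, r.push + rest))
            for r in rules if r.p == p and r.gamma == top]


# Hypothetical toy program: point m0 calls a procedure with entry f0,
# and m1 is the return point of that call.
RULES = [
    Rule("p", "m0", "p", ("f0", "m1"), "call"),
    Rule("p", "f0", "p", (), "ret"),
    Rule("p", "m1", "p", ("m2",), "int"),
]

start = ("p", ("m0", "#"))
after_call = successors(RULES, start)
after_ret = successors(RULES, after_call[0][1])
```

In this toy system the call rule pushes the entry point and the return point, and the return rule pops the entry point, which exposes the return point again.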

2.1 Global and abstract successors

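Let $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \dots$ be a run starting from $\langle p_0, \omega_0 \rangle$. Over $\pi$, two kinds of successors are defined for every position $\langle p_i, \omega_i \rangle$:

– global-successor: The global-successor of $\langle p_i, \omega_i \rangle$ is $\langle p_{i+1}, \omega_{i+1} \rangle$, where $\langle p_{i+1}, \omega_{i+1} \rangle$ is an immediate successor of $\langle p_i, \omega_i \rangle$.

– abstract-successor: The abstract-successor of $\langle p_i, \omega_i \rangle$ is determined as follows:

• If $\langle p_i, \omega_i \rangle \Rightarrow_{\mathcal{P}} \langle p_{i+1}, \omega_{i+1} \rangle$ corresponds to a call statement, there are two cases: (1) if $\langle p_i, \omega_i \rangle$ has $\langle p_k, \omega_k \rangle$ as a corresponding return-point in $\pi$, then the abstract successor of $\langle p_i, \omega_i \rangle$ is $\langle p_k, \omega_k \rangle$; (2) if $\langle p_i, \omega_i \rangle$ does not have any corresponding return-point in $\pi$, then the abstract successor of $\langle p_i, \omega_i \rangle$ is $\bot$.

• If $\langle p_i, \omega_i \rangle \Rightarrow_{\mathcal{P}} \langle p_{i+1}, \omega_{i+1} \rangle$ corresponds to a simple statement, the abstract successor of $\langle p_i, \omega_i \rangle$ is $\langle p_{i+1}, \omega_{i+1} \rangle$.

• If $\langle p_i, \omega_i \rangle \Rightarrow_{\mathcal{P}} \langle p_{i+1}, \omega_{i+1} \rangle$ corresponds to a return statement, the abstract successor of $\langle p_i, \omega_i \rangle$ is defined as $\bot$.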
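The case analysis above can be sketched on a finite prefix of a run. This is a hedged illustration, not the paper's algorithm: the run is abstracted to a list of step labels, where `labels[i]` is the label of the step from position `i` to position `i + 1`; the names `abstract_successor`, `abstract_path` and `LABELS` are hypothetical, and `None` plays the role of ⊥ (it is also returned when the prefix ends before a matching return is seen).

```python
def abstract_successor(labels, i):
    """Index of the abstract successor of position i, or None for bottom."""
    if i >= len(labels):
        return None              # the prefix ends here, nothing is known
    if labels[i] == "int":
        return i + 1             # simple statement: same as the global successor
    if labels[i] == "ret":
        return None              # return statement: abstract successor is bottom
    depth = 0                    # call: look for the matching return
    for j in range(i, len(labels)):
        if labels[j] == "call":
            depth += 1
        elif labels[j] == "ret":
            depth -= 1
            if depth == 0:
                return j + 1     # position reached by the matching return
    return None                  # no matching return in this prefix


def abstract_path(labels, start=0):
    """Follow abstract successors from start until bottom is reached."""
    path, i = [start], abstract_successor(labels, start)
    while i is not None:
        path.append(i)
        i = abstract_successor(labels, i)
    return path


# Hypothetical run prefix: position 1 makes a call that returns at position 4.
LABELS = ["int", "call", "int", "ret", "int"]
```

On this prefix the abstract path skips the body of the call (positions 2 and 3), whereas the global path visits every position.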

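[Figure: a run $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \dots \langle p_{10}, \omega_{10} \rangle \dots \langle p_k, \omega_k \rangle$ whose steps are labelled int, call and ret; the legend distinguishes global-successor edges from abstract-successor edges.]

Fig. 1: Two kinds of successors on a run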

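For example, in Figure 1:

– The global-successors of $\langle p_1, \omega_1 \rangle$ and $\langle p_2, \omega_2 \rangle$ are $\langle p_2, \omega_2 \rangle$ and $\langle p_3, \omega_3 \rangle$ respectively.

– The abstract-successors of $\langle p_2, \omega_2 \rangle$ and $\langle p_5, \omega_5 \rangle$ are $\langle p_k, \omega_k \rangle$ and $\langle p_9, \omega_9 \rangle$ respectively.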

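Let $\langle p, \omega \rangle$ be a configuration of a PDS $\mathcal{P}$. A configuration $\langle p', \omega' \rangle$ is defined as a global-successor of $\langle p, \omega \rangle$ iff $\langle p', \omega' \rangle$ is a global-successor of $\langle p, \omega \rangle$ over a run $\pi \in Traces(\langle p, \omega \rangle)$. Similarly, a configuration $\langle p', \omega' \rangle$ is defined as an abstract-successor of $\langle p, \omega \rangle$ iff $\langle p', \omega' \rangle$ is an abstract-successor of $\langle p, \omega \rangle$ over a run $\pi \in Traces(\langle p, \omega \rangle)$.

A global-path of $\mathcal{P}$ from $\langle p_0, \omega_0 \rangle$ is a sequence $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ where $\langle p_i, \omega_i \rangle \in P \times \Gamma^*$ s.t. for every $i \geq 0$, $\langle p_{i+1}, \omega_{i+1} \rangle$ is a global-successor of $\langle p_i, \omega_i \rangle$. Similarly, an abstract-path of $\mathcal{P}$ from $\langle p_0, \omega_0 \rangle$ is a sequence $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ where $\langle p_i, \omega_i \rangle \in P \times \Gamma^*$ s.t. for every $i \geq 0$, $\langle p_{i+1}, \omega_{i+1} \rangle$ is an abstract-successor of $\langle p_i, \omega_i \rangle$. For instance, in Figure 1, $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \langle p_3, \omega_3 \rangle \langle p_4, \omega_4 \rangle \langle p_5, \omega_5 \rangle \dots$ is a global-path, while $\langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \langle p_k, \omega_k \rangle \dots$ is an abstract-path.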

3 Malicious Behaviour Specification

In this section, we define the Stack Branching temporal Predicate logic of CAlls and RETurns (SBPCARET) as an extension of BCARET [8] with variables and regular predicates over the stack contents. The predicates contain variables that can be quantified existentially or universally. Regular predicates are expressed by regular variable expressions and are used to describe the stack content of PDSs.

3.1 Environments, Predicates and Regular Variable Expressions

Let $\mathcal{X} = \{x_1, \dots, x_n\}$ be a finite set of variables over a finite domain $\mathcal{D}$. Let $B : \mathcal{X} \cup \mathcal{D} \to \mathcal{D}$ be an environment that associates each variable $x \in \mathcal{X}$ with a value $d \in \mathcal{D}$ s.t. $B(d) = d$ for every $d \in \mathcal{D}$. Let $B[x \leftarrow d]$ be the environment obtained from $B$ such that $B[x \leftarrow d](x) = d$ and $B[x \leftarrow d](y) = B(y)$ for every $y \neq x$. Let $\mathcal{B}$ be the set of all environments, and let $Abs_x(B) = \{B' \in \mathcal{B} \mid \forall y \in \mathcal{X},\ y \neq x,\ B(y) = B'(y)\}$ be the function that abstracts away the value of $x$.
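As a small illustration of these definitions, environments over a finite domain can be represented as dictionaries. This is a sketch under assumptions: the helper names `environments`, `lookup`, `update` and `abs_x` are hypothetical, and variable names are assumed to be distinct from domain values.

```python
from itertools import product


def environments(variables, domain):
    """Enumerate the set of all environments as dicts from variables to values."""
    return [dict(zip(variables, values))
            for values in product(domain, repeat=len(variables))]


def lookup(env, a):
    """B(a): a variable is looked up, a domain value is mapped to itself."""
    return env.get(a, a)


def update(env, x, d):
    """B[x <- d]: identical to env except that x is mapped to d."""
    new_env = dict(env)
    new_env[x] = d
    return new_env


def abs_x(env, x, domain):
    """Abs_x(B): every environment that agrees with env on all y != x."""
    return [update(env, x, d) for d in domain]


B = {"x": 0, "y": 1}
```

With two variables over a two-element domain there are four environments, and `abs_x(B, "x", [0, 1])` returns the two of them that map `y` to 1.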

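Let $AP = \{a, b, c, \dots\}$ be a finite set of atomic propositions. Let $AP_{\mathcal{D}}$ be a finite set of atomic predicates of the form $b(\alpha_1, \dots, \alpha_m)$ such that $b \in AP$ and $\alpha_i \in \mathcal{D}$ for every $1 \leq i \leq m$. Let $AP_{\mathcal{X}}$ be a finite set of atomic predicates $b(\alpha_1, \dots, \alpha_n)$ such that $b \in AP$ and $\alpha_i \in \mathcal{X} \cup \mathcal{D}$ for every $1 \leq i \leq n$.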

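Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a Labelled PDS. A Regular Variable Expression (RVE) $e$ over $\mathcal{X} \cup \Gamma$ is defined by $e ::= \varepsilon \mid a \in \mathcal{X} \cup \Gamma \mid e + e \mid e.e \mid e^*$. The language $L(e)$ of a RVE $e$ is a subset of $P \times \Gamma^* \times \mathcal{B}$ and is defined as follows:

– $L(\varepsilon) = \{(\langle p, \varepsilon \rangle, B) \mid p \in P, B \in \mathcal{B}\}$

– for $x \in \mathcal{X}$, $L(x) = \{(\langle p, \gamma \rangle, B) \mid p \in P, \gamma \in \Gamma, B \in \mathcal{B} \text{ s.t. } B(x) = \gamma\}$

– for $\gamma \in \Gamma$, $L(\gamma) = \{(\langle p, \gamma \rangle, B) \mid p \in P, B \in \mathcal{B}\}$

– $L(e_1.e_2) = \{(\langle p, \omega'\omega'' \rangle, B) \mid (\langle p, \omega' \rangle, B) \in L(e_1);\ (\langle p, \omega'' \rangle, B) \in L(e_2)\}$

– $L(e^*) = \{(\langle p, \omega \rangle, B) \mid \omega \in \{v \in \Gamma^* \mid (\langle p, v \rangle, B) \in L(e)\}^*\}$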

3.2 The Stack Branching temporal Predicate logic of CAlls and RETurns - SBPCARET

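A SBPCARET formula is a BCARET formula where predicates and RVEs are used as atomic propositions and where quantifiers are applied to variables. For technical reasons, we assume w.l.o.g. that formulas are written in positive normal form, where negations are applied only to atomic predicates, and we use the release operator $R$ as the dual of the until operator $U$. From now on, we fix a finite set of variables $\mathcal{X}$, a finite set of atomic propositions $AP$, a finite domain $\mathcal{D}$, and a finite set of RVEs $\mathcal{V}$. A SBPCARET formula is defined as follows, where $v \in \{g, a\}$, $x \in \mathcal{X}$, $e \in \mathcal{V}$, $b(\alpha_1, \dots, \alpha_n) \in AP_{\mathcal{X}}$:

\[
\varphi := true \mid false \mid b(\alpha_1, \dots, \alpha_n) \mid \neg b(\alpha_1, \dots, \alpha_n) \mid e \mid \neg e \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \forall x \varphi \mid \exists x \varphi \mid EX^v \varphi \mid AX^v \varphi \mid E[\varphi U^v \varphi] \mid A[\varphi U^v \varphi] \mid E[\varphi R^v \varphi] \mid A[\varphi R^v \varphi]
\]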

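Let $\lambda : P \to 2^{AP_{\mathcal{D}}}$ be a labelling function which associates each control location with a set of atomic predicates. Let $\varphi$ be a SBPCARET formula over $AP$. Let $\langle p, \omega \rangle$ be a configuration of $\mathcal{P}$. Then we say that $\mathcal{P}$ satisfies $\varphi$ at $\langle p, \omega \rangle$ (denoted by $\langle p, \omega \rangle \models_\lambda \varphi$) iff there exists an environment $B \in \mathcal{B}$ such that $\langle p, \omega \rangle$ satisfies $\varphi$ under $B$ (denoted by $\langle p, \omega \rangle \models_\lambda^B \varphi$). The satisfiability relation of a SBPCARET formula $\varphi$ at a configuration $\langle p_0, \omega_0 \rangle$ under the environment $B$ w.r.t. the labelling function $\lambda$, denoted by $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi$, is defined inductively as follows:

– $\langle p_0, \omega_0 \rangle \models_\lambda^B true$ for every $\langle p_0, \omega_0 \rangle$

– $\langle p_0, \omega_0 \rangle \not\models_\lambda^B false$ for every $\langle p_0, \omega_0 \rangle$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B b(\alpha_1, \dots, \alpha_n)$ iff $b(B(\alpha_1), \dots, B(\alpha_n)) \in \lambda(p_0)$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \neg b(\alpha_1, \dots, \alpha_n)$ iff $b(B(\alpha_1), \dots, B(\alpha_n)) \notin \lambda(p_0)$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B e$ iff $(\langle p_0, \omega_0 \rangle, B) \in L(e)$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \neg e$ iff $(\langle p_0, \omega_0 \rangle, B) \notin L(e)$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_1 \vee \varphi_2$ iff ($\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_1$ or $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_2$)

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_1 \wedge \varphi_2$ iff ($\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_1$ and $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi_2$)

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \forall x \varphi$ iff for every $d \in \mathcal{D}$, $\langle p_0, \omega_0 \rangle \models_\lambda^{B[x \leftarrow d]} \varphi$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B \exists x \varphi$ iff there exists $d \in \mathcal{D}$ s.t. $\langle p_0, \omega_0 \rangle \models_\lambda^{B[x \leftarrow d]} \varphi$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B EX^g \varphi$ iff there exists a global-successor $\langle p', \omega' \rangle$ of $\langle p_0, \omega_0 \rangle$ such that $\langle p', \omega' \rangle \models_\lambda^B \varphi$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B AX^g \varphi$ iff $\langle p', \omega' \rangle \models_\lambda^B \varphi$ for every global-successor $\langle p', \omega' \rangle$ of $\langle p_0, \omega_0 \rangle$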

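– $\langle p_0, \omega_0 \rangle \models_\lambda^B E[\varphi_1 U^g \varphi_2]$ iff there exists a global-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$ s.t. $\exists i \geq 0$, $\langle p_i, \omega_i \rangle \models_\lambda^B \varphi_2$ and for every $0 \leq j < i$, $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B A[\varphi_1 U^g \varphi_2]$ iff for every global-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$, $\exists i \geq 0$, $\langle p_i, \omega_i \rangle \models_\lambda^B \varphi_2$ and for every $0 \leq j < i$, $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B E[\varphi_1 R^g \varphi_2]$ iff there exists a global-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$ s.t. for every $i \geq 0$, if $\langle p_i, \omega_i \rangle \not\models_\lambda^B \varphi_2$ then there exists $0 \leq j < i$ s.t. $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B A[\varphi_1 R^g \varphi_2]$ iff for every global-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$, for every $i \geq 0$, if $\langle p_i, \omega_i \rangle \not\models_\lambda^B \varphi_2$ then there exists $0 \leq j < i$ s.t. $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B EX^a \varphi$ iff there exists an abstract-successor $\langle p', \omega' \rangle$ of $\langle p_0, \omega_0 \rangle$ such that $\langle p', \omega' \rangle \models_\lambda^B \varphi$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B AX^a \varphi$ iff $\langle p', \omega' \rangle \models_\lambda^B \varphi$ for every abstract-successor $\langle p', \omega' \rangle$ of $\langle p_0, \omega_0 \rangle$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B E[\varphi_1 U^a \varphi_2]$ iff there exists an abstract-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$ s.t. $\exists i \geq 0$, $\langle p_i, \omega_i \rangle \models_\lambda^B \varphi_2$ and for every $0 \leq j < i$, $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B A[\varphi_1 U^a \varphi_2]$ iff for every abstract-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$, $\exists i \geq 0$, $\langle p_i, \omega_i \rangle \models_\lambda^B \varphi_2$ and for every $0 \leq j < i$, $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B E[\varphi_1 R^a \varphi_2]$ iff there exists an abstract-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$ s.t. for every $i \geq 0$, if $\langle p_i, \omega_i \rangle \not\models_\lambda^B \varphi_2$ then there exists $0 \leq j < i$ s.t. $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$

– $\langle p_0, \omega_0 \rangle \models_\lambda^B A[\varphi_1 R^a \varphi_2]$ iff for every abstract-path $\pi = \langle p_0, \omega_0 \rangle \langle p_1, \omega_1 \rangle \langle p_2, \omega_2 \rangle \dots$ of $\mathcal{P}$ starting from $\langle p_0, \omega_0 \rangle$, for every $i \geq 0$, if $\langle p_i, \omega_i \rangle \not\models_\lambda^B \varphi_2$ then there exists $0 \leq j < i$ s.t. $\langle p_j, \omega_j \rangle \models_\lambda^B \varphi_1$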

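Other SBPCARET operators can be expressed by the above operators: $EF^g \varphi = E[true\ U^g \varphi]$, $EF^a \varphi = E[true\ U^a \varphi]$, $AF^g \varphi = A[true\ U^g \varphi]$, $AF^a \varphi = A[true\ U^a \varphi]$, ...

Closure. Given a SBPCARET formula $\varphi$, the closure $Cl(\varphi)$ is the set of all subformulae of $\varphi$, including $\varphi$. Let $AP^+(\varphi) = \{b(\alpha_1, \dots, \alpha_n) \in AP_{\mathcal{X}} \mid b(\alpha_1, \dots, \alpha_n) \in Cl(\varphi)\}$; $AP^-(\varphi) = \{b(\alpha_1, \dots, \alpha_n) \in AP_{\mathcal{X}} \mid \neg b(\alpha_1, \dots, \alpha_n) \in Cl(\varphi)\}$; $Reg^+(\varphi) = \{e \in \mathcal{V} \mid e \in Cl(\varphi)\}$; $Reg^-(\varphi) = \{e \in \mathcal{V} \mid \neg e \in Cl(\varphi)\}$.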

4 SBPCARET Model-Checking for Pushdown Systems

In this section, we show how to do SBPCARET model-checking for PDSs. Let $\mathcal{P}$ be a PDS, $\varphi$ be a SBPCARET formula, and $\mathcal{V}$ be the set of RVEs occurring in $\varphi$. We follow the idea of [10] and use Variable Automata to represent RVEs.

4.1 Variable Automata

Given a PDS $\mathcal{P} = (P, \Gamma, \Delta)$ s.t. $\Gamma \subseteq \mathcal{D}$, a Variable Automaton (VA) [10] is a tuple $(Q, \Gamma, \delta, s, F)$, where $Q$ is a finite set of states, $\Gamma$ is the input alphabet, $s \in Q$ is an initial state, $F \subseteq Q$ is a finite set of accepting states, and $\delta$ is a finite set of transition rules of the form $p \xrightarrow{\alpha} \{q_1, \dots, q_n\}$ where $\alpha$ can be $x$, $\neg x$, or $\gamma$, for any $x \in \mathcal{X}$ and $\gamma \in \Gamma$.

Let $B \in \mathcal{B}$. A run of a VA on a word $\gamma_1, \dots, \gamma_m$ under $B$ is a tree of height $m$ whose root is labelled by the initial state $s$, and each node at depth $k$ labelled by a state $q$ has $h$ children labelled by $p_1, \dots, p_h$ respectively, such that:

– either $q \xrightarrow{\gamma_k} \{p_1, \dots, p_h\} \in \delta$ and $\gamma_k \in \Gamma$;
– or $q \xrightarrow{x} \{p_1, \dots, p_h\} \in \delta$, $x \in \mathcal{X}$ and $B(x) = \gamma_k$;
– or $q \xrightarrow{\neg x} \{p_1, \dots, p_h\} \in \delta$, $x \in \mathcal{X}$ and $B(x) \neq \gamma_k$.

A branch of the tree is accepting iff the leaf of the branch is an accepting state. A run is accepting iff all its branches are accepting. A word $\omega \in \Gamma^*$ is accepted by a VA under an environment $B \in \mathcal{B}$ iff the VA has an accepting run on the word $\omega$ under the environment $B$.

The language of a VA $M$, denoted by $L(M)$, is a subset of $(P \times \Gamma^*) \times \mathcal{B}$: $(\langle p, \omega \rangle, B) \in L(M)$ iff $M$ accepts the word $\omega$ under the environment $B$.
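The acceptance condition above can be sketched as a recursive check. This is an illustrative sketch, not the construction of [10]: the encoding of a VA as a triple `(transitions, start, accepting)`, the tags `"sym"`, `"var"` and `"neg"`, and the example automaton `VA` are all assumptions made here. Every transition is assumed to have a non-empty set of target states.

```python
def accepts(va, word, env):
    """Check whether the VA has an accepting run on word under environment env."""
    transitions, start, accepting = va

    def enabled(alpha, letter):
        kind, value = alpha
        if kind == "sym":                 # alpha is a stack symbol gamma
            return value == letter
        if kind == "var":                 # alpha is a variable x: B(x) = letter
            return env[value] == letter
        return env[value] != letter       # alpha is "not x": B(x) != letter

    def run(state, i):
        if i == len(word):
            return state in accepting     # leaf of a branch
        # Some enabled transition must exist whose targets ALL accept the rest.
        return any(
            enabled(alpha, word[i]) and all(run(t, i + 1) for t in targets)
            for (source, alpha, targets) in transitions
            if source == state
        )

    return run(start, 0)


# Hypothetical VA for the RVE "x Gamma*" over Gamma = {"a", "b"}:
# the first stack symbol must equal the value of x, the rest is arbitrary.
VA = (
    [("s", ("var", "x"), ["f"]),
     ("f", ("sym", "a"), ["f"]),
     ("f", ("sym", "b"), ["f"])],
    "s",
    {"f"},
)
```

The check explores all candidate runs and is exponential in the worst case; it is meant only to mirror the definition, not to be efficient.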

Theorem 1. [10] For every regular expression $e \in \mathcal{V}$, we can compute in polynomial time a Variable Automaton $M$ s.t. $L(M) = L(e)$.

Theorem 2. [10] VAs are closed under boolean operations.

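4.2 Symbolic Alternating Büchi Pushdown Systems (SABPDSs).

Definition 2. A Symbolic Alternating Büchi Pushdown System (SABPDS) is a tuple $\mathcal{BP} = (P, \Gamma, \Delta, F)$, where $P$ is a set of control locations, $\Gamma \subseteq \mathcal{D}$ is the stack alphabet, $F \subseteq P \times 2^{\mathcal{B}}$ is a set of accepting control locations and $\Delta$ is a finite set of transitions of the form $\langle p, \gamma \rangle \xhookrightarrow{\mathcal{R}} \{\langle p_1, \omega_1 \rangle, \dots, \langle p_n, \omega_n \rangle\}$ where $p \in P$, $\gamma \in \Gamma$, for every $1 \leq i \leq n$: $p_i \in P$, $\omega_i \in \Gamma^*$; and $\mathcal{R} : (\mathcal{B})^n \to 2^{\mathcal{B}}$ is a function that maps a tuple of environments $(B_1, \dots, B_n)$ to a set of environments.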

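A configuration of a SABPDS $\mathcal{BP}$ is a tuple $\langle \llbracket p, B \rrbracket, \omega \rangle$, where $p \in P$ is the current control location, $B \in \mathcal{B}$ is an environment and $\omega \in \Gamma^*$ is the current stack content. Let $\langle p, \gamma \rangle \xhookrightarrow{\mathcal{R}} \{\langle p_1, \omega_1 \rangle, \dots, \langle p_n, \omega_n \rangle\}$ be a rule of $\Delta$. Then, for every $\omega \in \Gamma^*$ and $B, B_1, \dots, B_n \in \mathcal{B}$, if $B \in \mathcal{R}(B_1, \dots, B_n)$, then the configuration $\langle \llbracket p, B \rrbracket, \gamma\omega \rangle$ (resp. $\{\langle \llbracket p_1, B_1 \rrbracket, \omega_1\omega \rangle, \dots, \langle \llbracket p_n, B_n \rrbracket, \omega_n\omega \rangle\}$) is an immediate predecessor (resp. successor) of $\{\langle \llbracket p_1, B_1 \rrbracket, \omega_1\omega \rangle, \dots, \langle \llbracket p_n, B_n \rrbracket, \omega_n\omega \rangle\}$ (resp. $\langle \llbracket p, B \rrbracket, \gamma\omega \rangle$).

A run $\rho$ of a SABPDS $\mathcal{BP}$ starting from an initial configuration $\langle \llbracket p_0, B_0 \rrbracket, \omega_0 \rangle$ is a tree whose root is labelled by $\langle \llbracket p_0, B_0 \rrbracket, \omega_0 \rangle$, and whose other nodes are labelled by elements in $P \times \mathcal{B} \times \Gamma^*$. If a node of $\rho$ is labelled by a configuration $\langle \llbracket p, B \rrbracket, \omega \rangle$ and has $n$ children labelled by $\langle \llbracket p_1, B_1 \rrbracket, \omega_1 \rangle, \dots, \langle \llbracket p_n, B_n \rrbracket, \omega_n \rangle$ respectively, then $\langle \llbracket p, B \rrbracket, \omega \rangle$ must be a predecessor of $\{\langle \llbracket p_1, B_1 \rrbracket, \omega_1 \rangle, \dots, \langle \llbracket p_n, B_n \rrbracket, \omega_n \rangle\}$ in $\mathcal{BP}$. A path of a run $\rho$ is an infinite sequence of configurations $c_0 c_1 c_2 \dots$ s.t. $c_0$ is the root of $\rho$ and $c_{i+1}$ is one of the children of $c_i$ for every $i \geq 0$. A path is accepting iff it visits infinitely often configurations with control locations in $F$. A run $\rho$ is accepting iff every path of $\rho$ is accepting. The language of $\mathcal{BP}$, $L(\mathcal{BP})$, is the set of configurations $c$ s.t. $\mathcal{BP}$ has an accepting run starting from $c$.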

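$\mathcal{BP}$ defines the reachability relation $\Rightarrow_{\mathcal{BP}} : 2^{(P \times \mathcal{B}) \times \Gamma^*} \to 2^{(P \times \mathcal{B}) \times \Gamma^*}$ as follows: (1) $c \Rightarrow_{\mathcal{BP}} \{c\}$ for every $c \in P \times \mathcal{B} \times \Gamma^*$; (2) $c \Rightarrow_{\mathcal{BP}} C$ if $C$ is an immediate successor of $c$; (3) if $c \Rightarrow_{\mathcal{BP}} \{c_1, c_2, \dots, c_n\}$ and $c_i \Rightarrow_{\mathcal{BP}} C_i$ for every $1 \leq i \leq n$, then $c \Rightarrow_{\mathcal{BP}} \bigcup_{i=1}^{n} C_i$. Given $c_0 \Rightarrow_{\mathcal{BP}} C'$, $\mathcal{BP}$ has an accepting run from $c_0$ iff $\mathcal{BP}$ has an accepting run from $c'$ for every $c' \in C'$.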

Theorem 3. [10] The membership problem of SABPDS can be solved effectively.

Functions of $\mathcal{R}$. In what follows, we define several functions of $\mathcal{R}$ which will be used in the next sections. These functions were first defined in [10].

1. $id(B) = \{B\}$. This is the identity function.

2. \[
equal(B_1, \dots, B_n) =
\begin{cases}
\{B_1\} & \text{if } B_i = B_j \text{ for every } 1 \leq i, j \leq n; \\
\emptyset & \text{otherwise}
\end{cases}
\]
This function checks whether all the environments are equal and, if so, returns $\{B_1\}$ (which is also equal to $\{B_i\}$ for every $i$). Otherwise, it returns the empty set.

3. \[
meet^x_{\{c_1, \dots, c_n\}}(B_1, \dots, B_n) =
\begin{cases}
Abs_x(B_1) & \text{if } B_i(x) = c_i \text{ for } 1 \leq i \leq n, \text{ and } B_i(y) = B_j(y) \text{ for } y \neq x,\ 1 \leq i, j \leq n; \\
\emptyset & \text{otherwise}
\end{cases}
\]
This function checks whether (1) $B_i(x) = c_i$ for every $1 \leq i \leq n$, and (2) $B_i(y) = B_j(y)$ for every $y \neq x$ and every $1 \leq i, j \leq n$. If the conditions are satisfied, it returns $Abs_x(B_1)$ (as defined in Section 3.1); otherwise it returns the empty set.

4. \[
join^x_c(B_1, \dots, B_n) =
\begin{cases}
\{B_1\} & \text{if } B_i(x) = c \text{ for } 1 \leq i \leq n \text{ and } B_i = B_j \text{ for } 1 \leq i, j \leq n; \\
\emptyset & \text{otherwise}
\end{cases}
\]
This function checks whether $B_i(x) = c$ for every $i$. If this condition is satisfied, $equal(B_1, \dots, B_n)$ is returned; otherwise, the empty set is returned.

5. \[
join^{\neg x}_c(B_1, \dots, B_n) =
\begin{cases}
\{B_1\} & \text{if } B_i(x) \neq c \text{ for } 1 \leq i \leq n \text{ and } B_i = B_j \text{ for } 1 \leq i, j \leq n; \\
\emptyset & \text{otherwise}
\end{cases}
\]
This function checks whether $B_i(x) \neq c$ for every $i$. If this condition is satisfied, $equal(B_1, \dots, B_n)$ is returned; otherwise, the empty set is returned.
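The five functions can be sketched directly on dictionary-based environments. The fragment below is a hedged illustration: the Python names (`ident`, `equal`, `meet`, `join`, `join_neg`), the explicit `domain` argument of `meet`, and the use of lists of dictionaries to stand for sets of environments are choices made here, not notation from [10].

```python
def ident(b):
    """id(B) = {B}."""
    return [b]


def equal(*envs):
    """{B1} if all environments coincide, the empty set otherwise."""
    return [envs[0]] if all(e == envs[0] for e in envs) else []


def meet(x, cs, domain, *envs):
    """meet^x_{c1..cn}: Abs_x(B1) if Bi(x) = ci and the Bi agree outside x."""
    if len(envs) != len(cs):
        return []
    fixes_x = all(e[x] == c for e, c in zip(envs, cs))
    rest = [{k: v for k, v in e.items() if k != x} for e in envs]
    agree = all(r == rest[0] for r in rest)
    if fixes_x and agree:
        return [{**envs[0], x: d} for d in domain]   # Abs_x(B1)
    return []


def join(x, c, *envs):
    """join^x_c: equal(B1..Bn) if Bi(x) = c for every i, else the empty set."""
    return equal(*envs) if all(e[x] == c for e in envs) else []


def join_neg(x, c, *envs):
    """join^{not x}_c: equal(B1..Bn) if Bi(x) != c for every i, else empty."""
    return equal(*envs) if all(e[x] != c for e in envs) else []


B1 = {"x": 1, "y": 7}
B2 = {"x": 2, "y": 7}
```

For instance, `meet("x", [1, 2], [1, 2], B1, B2)` succeeds because the two environments differ only on `x` and take the prescribed values there, so the result is the set of environments that map `y` to 7.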


4.3 From SBPCARET model checking of PDSs to the membership problem in SABPDSs

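Let $\mathcal{P} = (P, \Gamma, \Delta)$ be a PDS. We suppose w.l.o.g. that $\mathcal{P}$ has a bottom stack symbol $\sharp$ that is never popped from the stack. Let $AP$ be a set of atomic propositions. Let $\varphi$ be a SBPCARET formula over $AP$, and let $\lambda : P \to 2^{AP_{\mathcal{D}}}$ be a labelling function. Given a configuration $\langle p_0, \omega_0 \rangle$, we propose in this section an algorithm to check whether $\langle p_0, \omega_0 \rangle \models_\lambda \varphi$, i.e., whether there exists an environment $B$ s.t. $\langle p_0, \omega_0 \rangle \models_\lambda^B \varphi$. Intuitively, we compute an SABPDS $\mathcal{BP}_\varphi$ s.t. $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff $\langle \llbracket \llparenthesis p, \varphi \rrparenthesis, B \rrbracket, \omega \rangle \in L(\mathcal{BP}_\varphi)$ for every $p \in P$, $\omega \in \Gamma^*$, $B \in \mathcal{B}$. Then, to check if $\langle p_0, \omega_0 \rangle \models_\lambda \varphi$, we check whether there exists a $B \in \mathcal{B}$ s.t. $\langle \llbracket \llparenthesis p_0, \varphi \rrparenthesis, B \rrbracket, \omega_0 \rangle \in L(\mathcal{BP}_\varphi)$.

Let $Reg^+(\varphi) = \{e_1, \dots, e_k\}$ and $Reg^-(\varphi) = \{e_{k+1}, \dots, e_m\}$. Using Theorems 1 and 2, for every $1 \leq i \leq k$, we can compute a VA $M_{e_i} = (Q_{e_i}, \Gamma, \delta_{e_i}, s_{e_i}, F_{e_i})$ s.t. $L(M_{e_i}) = L(e_i)$. In addition, for every $k + 1 \leq j \leq m$, we can compute a VA $M_{\neg e_j} = (Q_{\neg e_j}, \Gamma, \delta_{\neg e_j}, s_{\neg e_j}, F_{\neg e_j})$ s.t. $L(M_{\neg e_j}) = (P \times \Gamma^*) \times \mathcal{B} \setminus L(e_j)$. Let $\mathcal{M}$ be the union of all these automata, and let $\mathcal{S}$ and $\mathcal{F}$ be respectively the union of all states and of all final states of these automata.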

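Let $\mathcal{BP}_\varphi = (P', \Gamma', \Delta', F)$ be the SABPDS defined as follows:

– $P' = P \cup (P \times Cl(\varphi)) \cup \mathcal{S} \cup \{p_\bot\}$
– $\Gamma' = \Gamma \cup (\Gamma \times Cl(\varphi)) \cup \{\gamma_\bot\}$
– $F = F_1 \cup F_2 \cup F_3 \cup F_4$ where

• $F_1 = \{\llbracket \llparenthesis p, b(\alpha_1, \dots, \alpha_n) \rrparenthesis, \beta \rrbracket \mid b(\alpha_1, \dots, \alpha_n) \in AP^+(\varphi) \text{ and } \beta = \{B \in \mathcal{B} \mid b(B(\alpha_1), \dots, B(\alpha_n)) \in \lambda(p)\}\}$

• $F_2 = \{\llbracket \llparenthesis p, \neg b(\alpha_1, \dots, \alpha_n) \rrparenthesis, \beta \rrbracket \mid b(\alpha_1, \dots, \alpha_n) \in AP^-(\varphi) \text{ and } \beta = \{B \in \mathcal{B} \mid b(B(\alpha_1), \dots, B(\alpha_n)) \notin \lambda(p)\}\}$

• $F_3 = P \times Cl_R(\varphi) \times \mathcal{B}$, where $Cl_R(\varphi)$ is the set of formulas of $Cl(\varphi)$ of the form $E[\varphi_1 R^v \varphi_2]$ or $A[\varphi_1 R^v \varphi_2]$ ($v \in \{g, a\}$)

• $F_4 = \mathcal{F} \times \mathcal{B}$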

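The transition relation $\Delta'$ is the smallest set of transition rules defined as follows. For every $p \in P$, $\phi \in Cl(\varphi)$, $\gamma \in \Gamma$ and $t \in \{call, ret, int\}$:

(h1) If $\phi = b(\alpha_1, \dots, \alpha_n)$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \in \Delta'$

(h2) If $\phi = \neg b(\alpha_1, \dots, \alpha_n)$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \in \Delta'$

(h3) If $\phi = \phi_1 \wedge \phi_2$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle, \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle] \in \Delta'$

(h4) If $\phi = \phi_1 \vee \phi_2$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle \in \Delta'$ and $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle \in \Delta'$

(h5) If $\phi = \exists x \phi_1$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{meet^x_{\{c\}}} \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle \in \Delta'$ for every $c \in \mathcal{D}$

(h6) If $\phi = \forall x \phi_1$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{meet^x_{\mathcal{D}}} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle, \dots, \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle] \in \Delta'$, where $\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle$ is repeated $m$ times in the right-hand side and $m$ is the number of elements in $\mathcal{D}$

(h7) If $\phi = EX^g \phi_1$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis q, \phi_1 \rrparenthesis, \omega \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$

(h8) If $\phi = AX^g \phi_1$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis q_1, \phi_1 \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi_1 \rrparenthesis, \omega_n \rangle] \in \Delta'$, where for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{t} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.

(h9) If $\phi = EX^a \phi_1$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle q, \gamma' \llparenthesis \gamma'', \phi_1 \rrparenthesis \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle \in \Delta$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis q, \phi_1 \rrparenthesis, \omega \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle \in \Delta$
(c) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle p_\bot, \gamma_\bot \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle \in \Delta$

(h10) If $\phi = AX^a \phi_1$, then $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle p_1, \gamma'_1 \llparenthesis \gamma''_1, \phi_1 \rrparenthesis \rangle, \dots, \langle p_m, \gamma'_m \llparenthesis \gamma''_m, \phi_1 \rrparenthesis \rangle, \langle \llparenthesis q_1, \phi_1 \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi_1 \rrparenthesis, \omega_n \rangle, \langle p_\bot, \gamma_\bot \rangle, \dots, \langle p_\bot, \gamma_\bot \rangle] \in \Delta'$, where $\langle p_\bot, \gamma_\bot \rangle$ is repeated $k$ times in the right-hand side s.t.:
(a) for every $1 \leq i \leq m$, $\langle p, \gamma \rangle \xrightarrow{call} \langle p_i, \gamma'_i\gamma''_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
(b) for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{int} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
(c) for every $1 \leq i \leq k$, $\langle p, \gamma \rangle \xrightarrow{ret} \langle q'_i, \varepsilon \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.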

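(h11) If $\phi = E[\phi_1 U^g \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle, \langle \llparenthesis q, \phi \rrparenthesis, \omega \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$

(h12) If $\phi = E[\phi_1 U^a \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle, \langle q, \gamma' \llparenthesis \gamma'', \phi \rrparenthesis \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle \in \Delta$
(c) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle, \langle \llparenthesis q, \phi \rrparenthesis, \omega \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle \in \Delta$
(d) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle p_\bot, \gamma_\bot \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle \in \Delta$

(h13) If $\phi = A[\phi_1 U^g \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle; \langle \llparenthesis q_1, \phi \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi \rrparenthesis, \omega_n \rangle] \in \Delta'$, where for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{t} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.

(h14) If $\phi = A[\phi_1 U^a \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle; \langle p_1, \gamma'_1 \llparenthesis \gamma''_1, \phi \rrparenthesis \rangle, \dots, \langle p_m, \gamma'_m \llparenthesis \gamma''_m, \phi \rrparenthesis \rangle; \langle \llparenthesis q_1, \phi \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi \rrparenthesis, \omega_n \rangle, \langle p_\bot, \gamma_\bot \rangle, \dots, \langle p_\bot, \gamma_\bot \rangle] \in \Delta'$, where $\langle p_\bot, \gamma_\bot \rangle$ is repeated $k$ times in the right-hand side s.t.:
– for every $1 \leq i \leq m$, $\langle p, \gamma \rangle \xrightarrow{call} \langle p_i, \gamma'_i\gamma''_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
– for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{int} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
– for every $1 \leq i \leq k$, $\langle p, \gamma \rangle \xrightarrow{ret} \langle q'_i, \varepsilon \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.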

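(h15) If $\phi = E[\phi_1 R^g \phi_2]$, then we add to $\Delta'$ the rules:
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle] \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis q, \phi \rrparenthesis, \omega \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$

(h16) If $\phi = A[\phi_1 R^g \phi_2]$, then we add to $\Delta'$ the rules:
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle] \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle; \langle \llparenthesis q_1, \phi \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi \rrparenthesis, \omega_n \rangle] \in \Delta'$, where for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{t} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.

(h17) If $\phi = E[\phi_1 R^a \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle] \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle q, \gamma' \llparenthesis \gamma'', \phi \rrparenthesis \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle \in \Delta$
(c) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis q, \phi \rrparenthesis, \omega \rangle] \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle \in \Delta$
(d) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{id} \langle p_\bot, \gamma_\bot \rangle \in \Delta'$ for every $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle \in \Delta$

(h18) If $\phi = A[\phi_1 R^a \phi_2]$, then
(a) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle, \langle \llparenthesis p, \phi_1 \rrparenthesis, \gamma \rangle] \in \Delta'$
(b) $\langle \llparenthesis p, \phi \rrparenthesis, \gamma \rangle \xhookrightarrow{equal} [\langle \llparenthesis p, \phi_2 \rrparenthesis, \gamma \rangle; \langle p_1, \gamma'_1 \llparenthesis \gamma''_1, \phi \rrparenthesis \rangle, \dots, \langle p_m, \gamma'_m \llparenthesis \gamma''_m, \phi \rrparenthesis \rangle; \langle \llparenthesis q_1, \phi \rrparenthesis, \omega_1 \rangle, \dots, \langle \llparenthesis q_n, \phi \rrparenthesis, \omega_n \rangle, \langle p_\bot, \gamma_\bot \rangle, \dots, \langle p_\bot, \gamma_\bot \rangle] \in \Delta'$, where $\langle p_\bot, \gamma_\bot \rangle$ is repeated $k$ times in the right-hand side s.t.:
– for every $1 \leq i \leq m$, $\langle p, \gamma \rangle \xrightarrow{call} \langle p_i, \gamma'_i\gamma''_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{call} \langle q, \gamma'\gamma'' \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
– for every $1 \leq i \leq n$, $\langle p, \gamma \rangle \xrightarrow{int} \langle q_i, \omega_i \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{int} \langle q, \omega \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.
– for every $1 \leq i \leq k$, $\langle p, \gamma \rangle \xrightarrow{ret} \langle q'_i, \varepsilon \rangle \in \Delta$ and these transitions are all the transitions of $\Delta$ of the form $\langle p, \gamma \rangle \xrightarrow{ret} \langle q', \varepsilon \rangle$ that have $\langle p, \gamma \rangle$ on the left-hand side.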

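(~~~19) for every $\langle p, \gamma \rangle \xrightarrow{ret} \langle q, \varepsilon \rangle \in \Delta$:

– $\langle q, \llparenthesis \gamma'', \varphi_1 \rrparenthesis \rangle \xhookrightarrow{\mathit{id}} \langle \llparenthesis q, \varphi_1 \rrparenthesis, \gamma'' \rangle \in \Delta'$ for every $\gamma'' \in \Gamma$, $\varphi_1 \in Cl(\phi)$

(~~~20) $\langle p, \gamma \rangle \xhookrightarrow{\mathit{id}} \langle p, \gamma \rangle \in \Delta'$

(~~~21) for every $\langle p, \gamma \rangle \xrightarrow{t} \langle q, \omega \rangle \in \Delta$: $\langle p, \gamma \rangle \xhookrightarrow{\mathit{id}} \langle q, \omega \rangle \in \Delta'$

(~~~22) If $\varphi = e$, where $e$ is a regular expression, then $\langle \llparenthesis p, \varphi \rrparenthesis, \gamma \rangle \xhookrightarrow{\mathit{id}} \langle s_e, \gamma \rangle \in \Delta'$

(~~~23) If $\varphi = \neg e$, where $e$ is a regular expression, then $\langle \llparenthesis p, \varphi \rrparenthesis, \gamma \rangle \xhookrightarrow{\mathit{id}} \langle s_{\neg e}, \gamma \rangle \in \Delta'$

(~~~24) for every transition $q \xrightarrow{\alpha} \{q_1, \ldots, q_n\}$ in $M$: $\langle q, \gamma \rangle \xhookrightarrow{R} [\langle q_1, \varepsilon \rangle, \ldots, \langle q_n, \varepsilon \rangle] \in \Delta'$, where:

(a) $R = \mathit{equal}$ iff $\alpha = \gamma$

(b) $R = \mathit{join}^{x}_{\gamma}$ iff $\alpha = x \in \mathcal{X}$

(c) $R = \mathit{join}^{\neg x}_{\gamma}$ iff $\alpha = \neg x$ and $x \in \mathcal{X}$

(~~~25) for every $q \in F$, $\langle q, \sharp \rangle \xhookrightarrow{\mathit{id}} \langle q, \sharp \rangle \in \Delta'$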
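To make the shape of the release rules concrete, the following minimal Python sketch generates product rules in the style of the $R^g$ cases above for a toy pushdown system. It is only an illustration under stated assumptions: the names `Rule`, `moves_from`, `exists_release_rules` and `forall_release_rules`, the string encoding of formulas, and the tuple encoding of rules are all hypothetical and are not taken from the paper. The existential case yields one rule per transition of $\Delta$, whereas the universal case yields a single rule collecting all successors.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """A PDS rule <p, gamma> --tag--> <q, omega> (hypothetical encoding)."""
    p: str
    gamma: str
    tag: str                 # "call", "int" or "ret"
    q: str
    omega: Tuple[str, ...]   # () encodes the empty word


def moves_from(delta: List[Rule], p: str, gamma: str) -> List[Rule]:
    """All rules of delta whose left-hand side is <p, gamma>."""
    return [r for r in delta if r.p == p and r.gamma == gamma]


def exists_release_rules(delta, p, gamma, phi, phi1, phi2):
    """Rules for phi = E[phi1 Rg phi2] at <(p, phi), gamma>.

    Case (a): phi2 and phi1 both hold now.
    Case (b): phi2 holds now and phi holds at SOME immediate successor,
              so one rule is produced per outgoing transition.
    """
    lhs = ((p, phi), gamma)
    rules = [(lhs, "equal", [((p, phi2), (gamma,)), ((p, phi1), (gamma,))])]
    for r in moves_from(delta, p, gamma):
        rules.append((lhs, "equal", [((p, phi2), (gamma,)), ((r.q, phi), r.omega)]))
    return rules


def forall_release_rules(delta, p, gamma, phi, phi1, phi2):
    """Rules for phi = A[phi1 Rg phi2] at <(p, phi), gamma>.

    Case (b) is a single rule whose right-hand side lists ALL immediate
    successors, since phi must hold along every one of them.
    """
    lhs = ((p, phi), gamma)
    rule_a = (lhs, "equal", [((p, phi2), (gamma,)), ((p, phi1), (gamma,))])
    successors = [((r.q, phi), r.omega) for r in moves_from(delta, p, gamma)]
    rule_b = (lhs, "equal", [((p, phi2), (gamma,))] + successors)
    return [rule_a, rule_b]


# Toy PDS with two transitions leaving <p, a>.
delta = [
    Rule("p", "a", "int", "q", ("b",)),
    Rule("p", "a", "call", "r", ("c", "d")),
    Rule("q", "b", "ret", "s", ()),
]
for rule in exists_release_rules(delta, "p", "a", "E[f1 Rg f2]", "f1", "f2"):
    print(rule)
```

The sketch deliberately ignores the abstract ($R^a$) variants, which additionally encode the return point on the stack, and it does not treat heads without outgoing transitions specially.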

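Roughly speaking, the SABPDS $\mathcal{BP}_\phi$ is a kind of product between $\mathcal{P}$ and the SBPCARET formula $\phi$ which ensures that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \phi \rrparenthesis, B \rrbracket, \omega \rangle$ iff the configuration $\langle p, \omega \rangle$ satisfies $\phi$ under the environment $B$.

The control locations of $\mathcal{BP}_\phi$ have the form $\llbracket \llparenthesis p, \varphi \rrparenthesis, B \rrbracket$ where $\varphi \in Cl(\phi)$ and $B \in \mathcal{B}$.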

Let us explain the intuition behind our construction:

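– If $\varphi = b(\alpha_1, \ldots, \alpha_n)$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff $b(B(\alpha_1), \ldots, B(\alpha_n)) \in \lambda(p)$. Thus, for such a $B$, $\mathcal{BP}_\phi$ should have an accepting run from $\langle \llbracket \llparenthesis p, b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket, \omega \rangle$ iff $b(B(\alpha_1), \ldots, B(\alpha_n)) \in \lambda(p)$. This is ensured by the transition rules in (~~~1), which add a loop at $\langle \llbracket \llparenthesis p, b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket, \omega \rangle$, and by the fact that $\llbracket \llparenthesis p, b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket \in F$ (because it is in $F_1$). The function $\mathit{id}$ in (~~~1) ensures that the environments before and after are the same.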

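– If $\varphi = \neg b(\alpha_1, \ldots, \alpha_n)$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff $b(B(\alpha_1), \ldots, B(\alpha_n)) \notin \lambda(p)$. Thus, for such a $B$, $\mathcal{BP}_\phi$ should have an accepting run from $\langle \llbracket \llparenthesis p, \neg b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket, \omega \rangle$ iff $b(B(\alpha_1), \ldots, B(\alpha_n)) \notin \lambda(p)$. This is ensured by the transition rules in (~~~2), which add a loop at $\langle \llbracket \llparenthesis p, \neg b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket, \omega \rangle$, and by the fact that $\llbracket \llparenthesis p, \neg b(\alpha_1, \ldots, \alpha_n) \rrparenthesis, B \rrbracket \in F$ (because it is in $F_2$). The function $\mathit{id}$ in (~~~2) ensures that the environments before and after are the same.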

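– If $\varphi = \varphi_1 \wedge \varphi_2$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff ($\langle p, \omega \rangle \models_\lambda^B \varphi_1$ and $\langle p, \omega \rangle \models_\lambda^B \varphi_2$). This is ensured by the transition rules in (~~~3) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \varphi_1 \wedge \varphi_2 \rrparenthesis, B \rrbracket, \omega \rangle$ iff $\mathcal{BP}_\phi$ has an accepting run from both $\langle \llbracket \llparenthesis p, \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ and $\langle \llbracket \llparenthesis p, \varphi_2 \rrparenthesis, B \rrbracket, \omega \rangle$. (~~~4) is similar to (~~~3).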

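– If $\varphi = \exists x \varphi_1$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff there exists $c \in \mathcal{D}$ s.t. $\langle p, \omega \rangle \models_\lambda^{B[x \leftarrow c]} \varphi_1$. This is ensured by the transition rules in (~~~5) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \exists x \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ iff there exists $c \in \mathcal{D}$ s.t. $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \varphi_1 \rrparenthesis, B[x \leftarrow c] \rrbracket, \omega \rangle$, since $B \in \mathit{meet}^{x}_{\{c\}}(B[x \leftarrow c])$.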

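– If $\varphi = \forall x \varphi_1$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff for every $c \in \mathcal{D}$, $\langle p, \omega \rangle \models_\lambda^{B[x \leftarrow c]} \varphi_1$. This is ensured by the transition rules in (~~~6) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \forall x \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ iff for every $c \in \mathcal{D}$, $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \varphi_1 \rrparenthesis, B[x \leftarrow c] \rrbracket, \omega \rangle$, since if $\mathcal{D} = \{c_1, \ldots, c_m\}$, then $B \in \mathit{meet}^{x}_{\mathcal{D}}(B[x \leftarrow c_1], \ldots, B[x \leftarrow c_m])$.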

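– If $\varphi = EX^g \varphi_1$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff there exists an immediate successor $\langle p', \omega' \rangle$ of $\langle p, \omega \rangle$ s.t. $\langle p', \omega' \rangle \models_\lambda^B \varphi_1$. This is ensured by the transition rules in (~~~7) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, EX^g \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ iff there exists an immediate successor $\langle p', \omega' \rangle$ of $\langle p, \omega \rangle$ s.t. $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p', \varphi_1 \rrparenthesis, B \rrbracket, \omega' \rangle$. (~~~8) is similar to (~~~7).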

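– If $\varphi = E[\varphi_1 U^g \varphi_2]$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff $\langle p, \omega \rangle \models_\lambda^B \varphi_2$ or ($\langle p, \omega \rangle \models_\lambda^B \varphi_1$ and there exists an immediate successor $\langle p', \omega' \rangle$ of $\langle p, \omega \rangle$ s.t. $\langle p', \omega' \rangle \models_\lambda^B \varphi$). This is ensured by the transition rules in (~~~11) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, E[\varphi_1 U^g \varphi_2] \rrparenthesis, B \rrbracket, \omega \rangle$ iff $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, \varphi_2 \rrparenthesis, B \rrbracket, \omega \rangle$ (by the rules in (~~~11)(a)) or $\mathcal{BP}_\phi$ has an accepting run from both $\langle \llbracket \llparenthesis p, \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ and $\langle \llbracket \llparenthesis p', \varphi \rrparenthesis, B \rrbracket, \omega' \rangle$, where $\langle p', \omega' \rangle$ is an immediate successor of $\langle p, \omega \rangle$ (by the rules in (~~~11)(b)). (~~~13) is similar to (~~~11).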

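– If $\varphi = E[\varphi_1 R^g \varphi_2]$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff ($\langle p, \omega \rangle \models_\lambda^B \varphi_2$ and $\langle p, \omega \rangle \models_\lambda^B \varphi_1$) or ($\langle p, \omega \rangle \models_\lambda^B \varphi_2$ and there exists an immediate successor $\langle p', \omega' \rangle$ of $\langle p, \omega \rangle$ s.t. $\langle p', \omega' \rangle \models_\lambda^B \varphi$). This is ensured by the transition rules in (~~~15) stating that $\mathcal{BP}_\phi$ has an accepting run from $\langle \llbracket \llparenthesis p, E[\varphi_1 R^g \varphi_2] \rrparenthesis, B \rrbracket, \omega \rangle$ iff $\mathcal{BP}_\phi$ has an accepting run from both $\langle \llbracket \llparenthesis p, \varphi_2 \rrparenthesis, B \rrbracket, \omega \rangle$ and $\langle \llbracket \llparenthesis p, \varphi_1 \rrparenthesis, B \rrbracket, \omega \rangle$ (by the rules in (~~~15)(a)), or $\mathcal{BP}_\phi$ has an accepting run from both $\langle \llbracket \llparenthesis p, \varphi_2 \rrparenthesis, B \rrbracket, \omega \rangle$ and $\langle \llbracket \llparenthesis p', \varphi \rrparenthesis, B \rrbracket, \omega' \rangle$, where $\langle p', \omega' \rangle$ is an immediate successor of $\langle p, \omega \rangle$ (by the rules in (~~~15)(b)). In addition, for $R^g$ formulas, the stop condition is not required, i.e., for a formula $\varphi_1 R^g \varphi_2$ applied to a specific run, we do not require that $\varphi_1$ eventually holds. To ensure that the runs on which $\varphi_2$ always holds are accepted, we add $\llbracket \llparenthesis p, \varphi \rrparenthesis, B \rrbracket$ to the Büchi accepting condition $F$ (via the subset $F_3$ of $F$). (~~~16) is similar to (~~~15).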
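The cases above rely on how environments are passed along transitions. The following minimal Python sketch illustrates this under explicit assumptions: environments are modeled as dictionaries from variables to values, and the helpers `update`, `id_fn`, `equal` and `in_meet` are hypothetical names whose behavior is reconstructed only from the way $\mathit{id}$, $\mathit{equal}$ and $\mathit{meet}$ are used in the text (for instance, from the facts $B \in \mathit{meet}^{x}_{\{c\}}(B[x \leftarrow c])$ and $B \in \mathit{meet}^{x}_{\mathcal{D}}(B[x \leftarrow c_1], \ldots, B[x \leftarrow c_m])$). It is not the paper's formal definition.

```python
from typing import Dict, Optional, Sequence

# Assumption: an environment maps each variable name to a value of the domain D.
Env = Dict[str, str]


def update(B: Env, x: str, c: str) -> Env:
    """Return B[x <- c]: the same environment except that x is mapped to c."""
    return {**B, x: c}


def id_fn(B: Env) -> Env:
    """The environment after the transition is the one before it."""
    return B


def equal(*envs: Env) -> Optional[Env]:
    """Return the common environment if all arguments coincide, else None."""
    first = envs[0]
    return first if all(e == first for e in envs) else None


def in_meet(B: Env, x: str, values: Sequence[str], envs: Sequence[Env]) -> bool:
    """Check B in meet^x_{values}(envs): the i-th environment must be B[x <- c_i]."""
    return len(values) == len(envs) and all(
        e == update(B, x, c) for c, e in zip(values, envs)
    )


B = {"x": "a", "y": "b"}
D = ["a", "b"]

# Existential quantifier: a single witness c suffices.
print(in_meet(B, "x", ["b"], [update(B, "x", "b")]))

# Universal quantifier: one successor environment per value of D.
print(in_meet(B, "x", D, [update(B, "x", c) for c in D]))
```

In this reading, the quantifier rules only change the binding of the quantified variable $x$, while all other rules leave the environment untouched.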

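[Figure 2: a run from $\langle p, \omega \rangle$, where $EX^a \varphi_1$ is evaluated, through a call to $\langle p', \omega' \rangle$ and on to $\langle p_{k-1}, \omega_{k-1} \rangle$, from which a ret leads to $\langle p_k, \omega_k \rangle$; the return point $\llparenthesis \gamma'', \varphi_1 \rrparenthesis$ is encoded and passed down.]

Fig. 2: $\langle p, \omega \rangle \Rightarrow_{\mathcal{P}} \langle p', \omega' \rangle$ corresponds to a call statement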

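– If $\varphi = EX^a \varphi_1$, then, for every $\omega \in \Gamma^*$, $\langle p, \omega \rangle \models_\lambda^B \varphi$ iff there exists an abstract-successor $\langle p_k, \omega_k \rangle$ of $\langle p, \omega \rangle$ s.t. $\langle p_k, \omega_k \rangle \models_\lambda^B \varphi_1$ (A1). Let $\pi \in \mathit{Traces}(\langle p, \omega \rangle)$ be a run starting from $\langle p, \omega \rangle$ on which $\langle p_k, \omega_k \rangle$ is the abstract-successor of $\langle p, \omega \rangle$. Over $\pi$, let $\langle p', \omega' \rangle$ be the immediate successor of $\langle p, \omega \rangle$. In what follows, we explain how we can ensure this.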
