Bart Jacobs

(1)

The VeriFast Program Verifier

Bart Jacobs

^∗

Frank Piessens

Department of Computer Science, Katholieke Universiteit Leuven, Belgium {bart.jacobs,frank.piessens}@cs.kuleuven.be

Abstract

This note describes a separation-logic-based approach for the specification and verification of safety properties of pointer-manipulating imperative programs. We describe the approach for the C language.

The safety properties to be verified are specified as annotations in the source code, in the form of function preconditions and post- conditions expressed as separation logic assertions. To enable rich specifications, the user may include additional annotations that de- fine inductive datatypes, primitive recursive pure functions over these datatypes, and abstract predicates (i.e. named, parameterized assertions). A restricted form of existential quantification is sup- ported in assertions in the form of pattern matching.

Verification is based on forward symbolic execution, where memory is represented as a separate conjunction of points-to assertions and abstract predicate assertions, and data values are represented as first-order logic terms with a set of constraints. Ab- stract predicates must be folded and unfolded explicitly using ghost statements. Rewritings of the abstract state that require induction, or derivations of facts over data values that require induction, can be done by defining lemma functions, which are like ordinary C functions except that it is checked that they terminate. Specifically, when a lemma function performs a recursive call, either the recursive call must apply to a strict subset of memory, or one of its parameters must be an inductive value whose size decreases at each recursive call.

Assertions over data values are delegated to an SMT solver, formulated as queries against an axiomatization of the inductive datatypes and recursive pure functions. Importantly, no exhaustive- ness axioms are included in this axiomatization; this prevents the SMT solver from performing case analysis on inductive values.

Combined with a measure to prevent infinite reductions due toself- feeding recursions, this ensures termination of the SMT solver.

The time complexity of verification is unbounded in theory.

Specifically, since recursive pure functions of arbitrary time complexity may be defined, there is no bound on the time complexity of SMT queries. Furthermore, the approach, as currently implemented, does not perform joining of symbolic execution paths after conditional constructs; therefore, the number of symbolic execution steps is exponential in the number of such constructs. However, since no significant search is performed implicitly by the verifier or the SMT solver, performance is very good in practice.

A prototype implementation and example annotated programs are available athttp://www.cs.kuleuven.be/˜bartj/verifast/.

1. Introduction

We introduce the approach through an example annotated C program that implements a linked list ADT. Successive figures show successive fragments of the example program.

∗Bart Jacobs is a Postdoctoral Fellow of the Research Foundation - Flan- ders (FWO)

structnode{ structnode∗next;

intvalue;

};

predicatenode(structnode∗n, structnode∗next,intvalue)

requiresn→next7→next∗n→value7→value

∗malloc block node(n);

structnode∗create node(structnode∗next,intvalue) requires emp;

ensuresnode(result,next,value);

{

structnode∗n:=malloc(sizeof(structnode));

n→next:=next; n→value:=value;

closenode(n,next,value);

returnn;

}

Figure 1. Example demonstrating abstract predicates and ghost statements (Note: annotations are shown on a gray background.

Also, for readability, we typeset some operators differently from the implementation.)

inductivelist=nil|cons(int,list);

predicatelseg(structnode∗n1,structnode∗n2,listv) requiresn1=n2?v=nil

: node(n1,?n,?h)∗lseg(n,n2,?t)∗v=cons(h, t);

Figure 2. Example demonstrating inductive datatype definitions, recursive abstract predicates, conditional assertions, and pattern matching

Figure 1 shows functioncreateNode. It uses an abstract predicate to hide the internal layout of a node. Thecloseghost statement removes the points-to assertions for the individual fields ofn from the abstract memory representation, and adds anodeabstract predicate assertion, as expected by the postcondition.

Figure 2 shows a way to denote a piece of memory contain- ing a set of consecutive nodes. Specifically, abstract predicate lseg(n1,n2,v)represents a set of consecutive nodes where the first node is atn1and the last node’s next pointer points ton2, and

(2)

the nodes store the list of integersv. As a special case, ifn1equals n2, the predicate denotes the empty piece of memory.

During symbolic execution of a function, assertions arepro- ducedandconsumed. Producing a points-to assertion or an abstract predicate assertion means adding it to the abstract memory, and consuming it means removing a matching assertion from the abstract memory. If no matching assertion is present in the abstract memory, an error is reported.¹ If the assertion being consumed contains patterns, the matching process binds the pattern variables;

their scope includes the rest of the assertion, or if the pattern occurs in a function body or precondition, its scope includes the rest of the function. Producing a pure assertion (i.e., a boolean expres- sion), means adding it to the set of constraints, and consuming it means asking the SMT solver to check that it follows from the current set of constraints. Producing or consuming a separate conjunction means first producing, resp. consuming the first operand, and then producing, resp. consuming the second operand. Producing or consumingempdoes nothing. If during execution of a function, a conditional construct is encountered, then the remainder of the execution is performed once for each branch of the construct, after adding the corresponding constraint to the constraint set. The conditional constructs include the if and switch statements, the if- then-else assertions, and the switch assertions.

Execution of a function starts with an empty memory and an empty set of data value constraints. Then, the precondition is produced. Then, each statement is executed. And finally, the postcondition is consumed. If subsequently, any assertions are left in the abstract memory, this is considered a potential memory leak and an error is reported. Execution of a function call statement proceeds by first consuming the call’s precondition and then producing its postcondition. Execution of an open ghost statement proceeds by first consuming the abstract predicate assertion and then producing its body. Execution of a close ghost statement proceeds by first consuming the predicate’s body and then producing the abstract predicate assertion. Patterns may be used as abstract predicate arguments in an open statement, but in the current implementation they cannot be used as arguments in a close statement.

Figure 3 shows the first part of the client-visible interface of the linked list ADT. The implementation keeps a sentinel node at the end of the list, and it keeps a pointer to the first node and to this sentinel node. The dummy patterns ( ) in the definition of the llistpredicate indicate that thenextandvaluefields of the sentinel node are insignificant.

Figure 4 shows alemma function, which is like a C function except that it is declared in an annotation and the verifier checks that it terminates and that has no effect on memory (i.e. it does not allo- cate, free, or write to memory). The only effect of calling a lemma function is that it rewrites the abstract memory representation into a semantically equivalent but syntactically different one, and/or that it adds constraints to the set of data value constraints.

In this example, there is no net change to the memory representation; all the lemma function does is add a constraint. Specifically, given two nodes for which there are separate abstract predicate assertions in the memory representation, the lemma produces a constraint that says that the nodes are distinct.

Such distinctness constraints are not produced automatically by the verifier for abstract predicate assertions, since the fact that two abstract predicate assertions calling the same abstract predicate appear in memory does not imply anything about distinctness of the arguments. However, the verifier produces them for points- to assertions. Specifically, when producing a points-to assertion t→f 7→ v, then for any existing points-to assertion t⁰→f 7→

1In the current implementation, if multiple matching assertions are present, an “ambiguous match” error is reported. This may change in the future.

structllist{ structnode∗first;

structnode∗last;

};

predicatellist(structllist∗l,listv)

requiresl→first7→?fn∗l→last7→?ln∗lseg(fn,ln, v)

∗node(ln, , )∗malloc block llist(l);

structllist∗create llist() requires emp;

ensuresllist(result,nil);

{

structllist∗l:=malloc(sizeof(structllist));

structnode∗n:=create node(0,0);

l→first:=n;

l→last:=n;

closelseg(n, n,nil);

closellist(l,nil);

returnl;

}

Figure 3. Example demonstrating dummy patterns

lemma voiddistinct nodes(

structnode∗n1,structnode∗n2)

requiresnode(n1,?n1n,?n1v)∗node(n2,?n2n,?n2v);

ensuresnode(n1,n1n,n1v)∗node(n2,n2n,n2v)

∗n16=n2;

{

opennode(n1, , );

opennode(n2, , );

closenode(n1,n1n,n1v);

}

Figure 4. Example demonstrating lemma functions, distinctness constraint production and patterns in open statements

v⁰ in the memory representation, a constraint t 6= t⁰ is added automatically. In the example, this occurs during execution of the second open statement.

Figure 5 shows the second client-visible list ADT function, functionadd. It adds a value to the end of the list. Its contract describes its effect on the ADT’s abstract value using thefixpoint functionadd. (Note that fixpoint function names and non-fixpoint (i.e., regular or lemma) function names are in separate namespaces;

the former may occur only in expressions in annotations, whereas the latter may occur only in call statements.)

A fixpoint function is not allowed to read or modify memory. Its body must be a switch statement over one of the function’s parameters. We call this parameter the function’sinductive parameter.

The body of each clause of the switch statement must be a return statement. A fixpoint function may call other fixpoint functions, but not regular functions or lemma functions. Furthermore, to en- sure termination, any call must either be a call of a fixpoint function declared earlier in the program, or it must be a direct recursive call where the argument for the inductive parameter is a variable bound by the switch statement.

(3)

fixpointlist add(listv,intx){ switch(v){

casenil: returncons(x,nil);

casecons(h, t) : return cons(h,add(t, x));

} }

lemmaadd lemma(structnode∗n1,structnode∗n2, structnode∗n3)

requireslseg(n1,n2,?v)∗node(n2,n3,?x)

∗node(n3, , );

ensureslseg(n1,n3,add(v, x))∗node(n3, , );

{

distinct nodes(n2,n3);

openlseg(n1, , );

if (n1 =n2){

closelseg(n3,n3,nil);

}else{

distinct nodes(n1,n3);

opennode(n1,?n1n,?n1v);

add lemma(n1n,n2,n3);

}

closelseg(n1,n3,add(v, x));

}

voidadd(structllist∗l,intx) requiresllist(l,?v);

ensuresllist(l,add(v, x));

{

openllist(l, v);

structnode∗n:=create node(0,0);

structnode∗nl :=l→last; opennode(nl, , );

nl→next:=n;

nl→value:=x;

closenode(nl, n, x);

l→last:=n;

structnode∗nf :=l→first;

add lemma(nf,nl,n);

closellist(l,add(v, x));

}

Figure 5. Example demonstrating fixpoint functions and recursive lemma functions

Regular functionadd creates a new node to serve as the new sentinel node, then updates the old sentinel node’s fields, and finally calls the lemma functionadd lemma to merge the old sentinel node into thelsegabstract predicate assertion. The lemma function does so using recursion. Lemma functions may perform recursive calls, but only direct recursive calls, and termination is ensured by checking that at each recursive call either the size of the piece of memory that the function operates on decreases (specifically, after consuming the precondition there must be a points-to assertion left in the memory representation), or, similar to a fixpoint function, the function’s body is a switch statement over one of its parameters, and the argument for the inductive parameter in the recursive call is bound by this switch statement. Note that a recursive lemma

intremoveFirst(structllist∗l) requiresllist(l,?v)∗v6=nil;

ensuresllist(l,?t)∗v=cons(result, t);

{

openllist(l, v);

structnode∗nf :=l→first; openlseg(nf,?nl, v);

opennode(nf, , );

structnode∗nfn:=nf→next;

intnfv :=nf→value;

free(nf);

l→first:=nfn;

openlseg(nfn,nl,?t);

closelseg(nfn,nl, t);

closellist(l, t);

returnnfv; }

Figure 6. Example demonstrating execution splits due to conditional constructs in the bodies of predicates being opened

voiddispose(structllist∗l) requiresllist(l, );

ensures emp;

{

openllist(l, );

structnode∗n:=l→first;

structnode∗nl :=l→last;

while(n6=nl)

invariantlseg(n,nl, );

{

openlseg(n,nl, );

opennode(n, , );

structnode∗next :=n→next; free(n);

n:=next;

}

openlseg(n, n, );

opennode(l, , );

free(nl);

free(l);

}

Figure 7. Example demonstrating loops

constitutes an inductive proof of the fact that the precondition implies the postcondition.

Figure 6 shows a function that removes the first element from a list. It requires that the list is non-empty. When thelseg start- ing atnf is opened, an execution split occurs because the body of this predicate is an if-then-else predicate. The verifier notices im- mediately that the then branch is infeasible and does not continue execution on this branch.

Notice also that the first node is freed. A statement free(p);

looks for an assertion of the formmalloc block T(p), and then for points-to assertions of the formp→f7→v, for each fieldfof T, and it consumes all of these assertions.

(4)

voidmain() requires emp;

ensures emp;

{

structllist∗l:=create llist();

add(l,10);

add(l,20);

add(l,30);

add(l,40);

intx0 :=removeFirst(l);

assert(x0 = 10);

intx1 :=removeFirst(l);

assert(x1 = 20);

dispose(l);

}

Figure 8. Example client program for the list ADT

Figure 7 shows functiondispose, which takes a list of arbitrary length. It first frees all proper nodes, then it removes the remain- ing emptylseg assertion, then it frees the sentinel node, and finally it frees thestructllistobject itself. A loop invariant must be provided for each loop. Execution of a loop proceeds by first consuming the loop invariant, and then proceeding execution along two branches: in one branch, fresh data value symbols are assigned to all locals modified by the loop, then the loop condition is produced, then the loop invariant is produced, then the loop body is executed, and finally the loop invariant is consumed. If any assertions remain, this is considered a leak error. In the other branch, first fresh data value symbols are assigned to all locals modified by the loop, then the negation of the loop condition is produced, then the loop invariant is produced, and then execution proceeds after the loop statement.

Figure 8 wraps up the example by showing an example client program for the list ADT. This program verifies; it follows that all assert statements succeed and no memory is leaked.

Notice that the SMT solver successfully evaluates theaddfix- point function applications.

2. Ensuring SMT solver termination

The verifier generates a set of axioms from the inductive datatype definitions and the fixpoint function definitions and uses an SMT solver to solve queries against this axiomatization. The axioms consist of the disjointness, injectiveness, and size axioms for the datatype constructors, and the reduction axioms for the fixpoint functions. Specifically, there is a reduction axiom for each fixpoint functionf and for each constructorcof the type of its inductive parameter.

For example, the axioms generated for the example are:

nil tag6=cons tag list tag(nil) =nil tag

∀ht.list tag(cons(h, t)) =cons tag

∀ht.cons proj0(cons(h, t)) =h

∀ht.cons proj1(cons(h, t)) =t size(nil) = 0

∀ht.size(cons(h, t)) = 1 +size(t)

∀x.add(nil, x) =cons(x,nil)

∀htx.add(cons(h, t), x) =cons(h,add(t, x)) Note: for each quantifier, the left-hand side (and only the left- hand side) of the equation is used as the trigger.

In general, when given such an axiomatization, an SMT algorithm that consists of instantiating quantifiers arbitrarily until all quantifiers have been applied to all matching E-graph-nodes or an inconsistency is found, does not always terminate for all queries.

For example, such an algorithm does not terminate for the below input file.

inductivenat=zero|succ(nat);

fixpointnat foo(natn,natk){ switch(n){

casezero:returnzero;

casesucc(n) :returnsucc(foo(n,succ(k)));

} }

fixpoint intnat2int(natn){ switch(n){

casezero:return0;

casesucc(n) :return1 +nat2int(n);

} }

lemma voidbar(natn)

requiresn=foo(succ(n),zero);

ensuresnat2int(n) = 0;

{ }

The cause of the non-termination in this example is that each reduction of afoonode produces a new reduciblefoonode. More- over, after the first reduction, each next inductive argument is a new node, generated by an earlier reduction. Therefore, the reduction chain in this example is aself-feedingreduction chain.

We propose a modification of the SMT algorithm which prevents self-feeding reduction chains, by keeping track of reduction chains and of the origin of each constructor node. A reduction is performed only if the constructor node that serves as the inductive argument was not generated by the reduction chain to which the fixpoint function node belongs. We believe this ensures termination, since a non-terminating search path must include an infinite number of reductions. Therefore, there must be an infinite reduction chain. This chain must be fed by an infinite path of constructor nodes. If we always exhaust the size axioms before applying reduction axioms, the path cannot be a cycle, since a cycle would be detected by the size axioms. Therefore, the path must include infinitely many distinct constructor nodes. These must have been generated by an infinite reduction chain; either the same one, or one whose fixpoint function was declared earlier in the program.

By induction on the number of preceding fixpoint functions, one can obtain a contradiction.

Further detailing the algorithm and the proof is future work.

Note: We have not yet implemented this algorithm. The current implementation uses the Z3 SMT solver, which does not imple- ment this algorithm. However, queries against our theory using Z3 are guaranteed to terminate, since Z3 limits the number of instan- tiations of any given quantifier. Since our theory does not contain nested quantifiers and therefore does not generate new quantifiers dynamically, this guarantees termination. However, this also limits the power of the algorithm. For example, whereas our proposed algorithm successfully fully reduces all ground terms, Z3 does not.

(5)

3. Performance

Since neither the verifier itself nor the SMT solver need to perform any significant amount of search, verification time is predictable and low. The table below shows indicative verification times for the two example programs that ship with the current release of the verifier.

Program Nb. C stmts Nb. ghost stmts Verif. time

linkedlist.c 90 113 0.25s

composite.c 38 287 1.5s

4. Grammar

The current implementation supports only a limited subset of C.

The grammar of the current implementation is as follows:

program::=decl^∗

decl::=structdecl|inddecl|fpdecl

|preddecl|fundecl structdecl::=structs{fielddecl^∗};

fielddecl::=t f;

t::=structs∗ |int|i inddecl::=inductivei=ctordecl^∗; ctordecl::=‘|’c(t^∗)

fpdecl::=fixpointt g(paramdecl^∗) {switch(x){fpclause^∗} } fpclause::=casec(x^∗) :returne; paramdecl::=t x

e::=x|n| ¬e|ebinope|g(e^∗)|e?e : e binop::= = | 6= | < | ≤ | + | − | ∧ | ∨ preddecl::=predicatep(paramdecl^∗)requiresa;

a::=emp|e|e→f7→pat|p(pat^∗)|a∗a

|e?a : a|switch(e){aclause^∗} pat::=e|?x|

aclause::=casec(x^∗) :returna; fundecl::=lemma^?rettypeh(paramdecl^∗)

requiresa; ensuresa;{s^∗} rettype::=void|t

s::=t x:=h(e^∗); |t x:=e→f;|t x:=e; |x:=e;

|e→f:=e; |if(e){s^∗}else{s^∗}

|switch(e){sclause^∗}

|while(e)invarianta;{s^∗} sclause::=casec(x^∗) :s^∗

5. Related work

Reynolds (2002) introduced separation logic. Smallfoot (Berdine et al. 2004, 2005, 2006) is a tool that performs symbolic execution using separation logic. This technique has been extended for greater automation (Yang et al. 2008), for termination proofs (Brotheston et al. 2008; Cook et al. 2007), for fine-grained concurrency (Calcagno et al. 2007), for lock-based concurreny (Gotsman et al. 2007), and for Java (Distefano and Parkinson 2008; Haack and Hurlin 2008). Unlike VeriFast, all of these tools attempt to infer loop invariants automatically. Abstract predicates were introduced by Parkinson and Bierman (2005). The details of applying separation logic to the C language were studied by Tuch et al. (2007).

Alternative specification and verification approaches, based on generation of verification conditions instead of symbolic execution, include Zee et al. (2008) and the approaches based on dynamic frames (Kassios 2006), in particular the work on automation of dynamic frames (Smans et al. 2008b,c,a), the work on regional logic (Banerjee et al. 2008), and the work on Dafny (Leino 2008).

Acknowledgments

The authors would like to thank Jan Smans for the many helpful discussions, and for co-authoring thecomposite.cexample.

References

Anindya Banerjee, David A. Naumann, and Stan Rosenberg. Regional logic for local reasoning about global invariants. InECOOP 2008, 2008.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. A decidable fragment of separation logic. In K. Lodaya and M. Mahajan, editors, FSTTCS 2004, number 3328 in LNCS, pages 97–109, 2004.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Symbolic execution with separation logic. In K. Yi, editor,APLAS 2005, number 3780 in LNCS, pages 52–68, 2005.

Josh Berdine, Cristiano Calcagno, and Peter W. O’Hearn. Smallfoot:

Modular automatic assertion checking with separation logic. InFMCO, 2006.

James Brotheston, Richard Bornat, and Cristiano Calcagno. Cyclic proofs of program termination in separation logic. InPOPL 2008, 2008.

Cristiano Calcagno, Matthew Parkinson, and Viktor Vafeiadis. Modular safety checking for fine-grained concurrency. InSAS 2007, 2007.

Byron Cook, Alexey Gotsman, Andreas Podelski, Andrey Rybalchenko, and Moshe Y. Vardi. Proving that programs eventually do something good. InPOPL 2007, 2007.

Dino Distefano and Matthew Parkinson. jStar: Towards practical verification for Java. InOOPSLA 2008, 2008.

Alexey Gotsman, Josh Berdine, Byron Cook, Noam Rinetzky, and Mooly Sagiv. Local reasoning for storable locks and threads. InAPLAS 2007, 2007.

Christian Haack and Cl´ement Hurlin. Separation logic contracts for a Java- like language with fork/join. InAMAST 2008, 2008.

Ioannis T. Kassios. Dynamic frames: support for framing, dependencies and sharing without restrictions. InFM 2006, 2006.

K. Rustan M. Leino. Specification and verification of object-oriented software: Marktoberdorf international summer school 2008 pre-lecture notes, 2008.

Matthew Parkinson and Gavin Bierman. Separation logic and abstraction.

InPOPL 2005, 2005.

J. C. Reynolds. Separation logic: a logic for shared mutable data structures.

InLICS 2002, 2002.

Jan Smans, Bart Jacobs, and Frank Piessens. An automatic verifier for Java- like programs based on dynamic frames. InFASE 2008, 2008a.

Jan Smans, Bart Jacobs, and Frank Piessens. Implicit dynamic frames. In Workshop on Formal Techniques for Java-like Programs, 2008b.

Jan Smans, Bart Jacobs, and Frank Piessens. VeriCool: an automatic verifier for a concurrent object-oriented language. InFMOODS 2008, 2008c.

Harvey Tuch, Gerwin Klein, and Michael Norrish. Types, bytes, and separation logic. InPOPL 2007, 2007.

Hongseok Yang, Oukseh Lee, Cristiano Calcagno, Dino Distefano, and Peter O’Hearn. Scalable shape analysis for systems code. InCAV 2008, 2008.

Karen Zee, Viktor Kuncak, and Martin Rinard. Full functional verification of linked data structures. InPLDI 2008, 2008.