Proof Assistants
Proving Imperative Programs
Guillaume Melquiond
(Christine Paulin & Jean-Christophe Filliâtre)
INRIA Saclay – Île-de-France Laboratoire de Recherche en Informatique
2011-02-01
Outline
1 Monads and Coq
2 Hoare Logic
Principles
Coq Implementation The Why Tool
3 High-Level Datatypes
Handling Arrays
Advanced Memory Models
Monads and Coq
Monads
Used in Haskell to simulate imperative features.
A monad is given by
M:Type→Type,M A represents computations of type A;
return:A→M A
(aka unit)
return v represents the value v seen as the result of a computation;
bind:MA→(A→M B)→M B,
bind passes the result of the first computation to the second one.
Haskell syntax: “do p <- e1; e2” means bind e
1 (λp.e
2).Axioms of Monads
Monads have to satisfy the following laws:
1
bind
(returnx
)f
=f x,
2
bind m return
=m,
3
bind m
(λx.bind
(f x)g
) =bind
(bindm f
)g .
Additional operators are defined to provide specific features to monads.
Monads and Coq
Combining State Monad and Error Monad
While it is generally impossible to merge two monads, there are two ways of combining the state and error monads:
1
M A
=S
→(Error |Value of
(S×A)),
2
M A
=S
→(S×(Error |Value of A)).
I n d u c t i v e res { A : T y p e } : T y p e :=
| E r r o r : res
| OK : A - > s t a t e - > res .
D e f i n i t i o n M ( A : T y p e ) : T y p e := s t a t e - > @ r e s A .
Error&State Monad in Coq
D e f i n i t i o n ret { A : T y p e } ( x : A ) : M A :=
fun s = > OK x s .
D e f i n i t i o n e r r o r { A : T y p e } : M A :=
fun s = > E r r o r .
D e f i n i t i o n b i n d { A B : T y p e } ( f : M A ) ( g : A- >M B ) : M B :=
fun s = >
m a t c h f s w i t h
| E r r o r = > E r r o r
| OK a s ’ = > g a s ’ end.
N o t a t i o n " ’ do ’ X <- A ; B " := ( b i n d A (fun X = > B ))
( at l e v e l 200 , X ident , A at l e v e l 100 , B at l e v e l 2 0 0 ) .
Monads and Coq
Example: Global Counter
Example (Global Counter Using a Reference)
let f r e s h = let r = ref 0 in fun () - > i n c r r ; ! r
R e c o r d s t a t e := { r : nat }.
D e f i n i t i o n f r e s h : M nat := fun s = >
let n := S ( r s ) in OK n {| r := n |}.
F i x p o i n t l i s t _ n ( n : nat ) : M ( l i st nat ) :=
m a t c h n w i t h
| O = > ret nil
| S n ’ = >
do k <- f r e s h ; do l <- l i s t _ n n ’;
ret ( k :: l ) end.
Outline
1 Monads and Coq
2 Hoare Logic
Principles
Coq Implementation The Why Tool
3 High-Level Datatypes Handling Arrays
Advanced Memory Models
Hoare Logic Principles
Hoare Logic
Definition (Small Imperative Language) e
::=n
|x
|e op e
op
::= +| − | × | /| = | 6= | < | > | ≤ | ≥ |and
|or i
::=skip
|x
:=e
|i; i
|if e then i else i
|while e do i done
Example (Program ISQRT)
c o u n t := 0; sum := 1;
w h i l e sum <= n do
c o u n t := c o u n t + 1; sum := sum + 2 * c o u n t + 1 d o n e
Specification: at the end of execution, count contains
b√nc.
Operational Semantic
State E: function assigning values to variables.
It is extended to any expression and logical formulas:
E(n)
=n
E(e
1op e
2) =E
(e1)op E
(e2)Definition (Operational Semantic)
E
−→x := e
E
[x ←E
(e)]E
1 −→i1;i2
E
3if E
1−→i1
E
2et E
2−→i2
E
3E
1 −→ife then i1 else i2
E
2if E
1(e) =true and E
1−→i1
E
2E
1 −→ife then i1 else i2
E
2if E
1(e) =false and E
1−→i2
E
2E
1 −→while e doi done
E
3if E
1(e) =true, E
1−→i
E
2and E
2 −→whilee doi done
E
3E
−→while e doi done
E if E
(e) =false
Hoare Logic Principles
Hoare Triple
Definition (Hoare Triple)
{P}
i
{Q}with P and Q first-order formulas
using the same expressions (and variables!) than programs Definition (Validity)
Triple
{P}i
{Q}is valid if, for any states E
1and E
2such that E
1(P)holds and E
1−→i
E
2, E
2(Q)holds.
Example (Specification of ISQRT)
{n≥
0} ISQRT
{count2≤n
<(count+1)
2}Rules of Hoare Logic
{P}
skip
{P}{P∧
e
=true} i
1 {Q} {P ∧e
=false} i
2 {Q}{P}
if e then i
1else i
2 {Q}{P[x ←
e
]}x
:=e
{P}{I ∧
e
=true} i
{I}{I}
while e do i done
{I ∧e
=false}
{P}
i
1 {Q} {Q}i
2 {R}{P}
i
1;i
2 {R}{P0}
i
{Q0}P
⇒P
0Q
0 ⇒Q
{P}i
{Q}Hoare Logic Principles
Soundness and Completeness
Soundness: every triple derived from the rules is valid.
Practical issue: one has to guess the intermediate annotations (e.g., for ISQRT, a loop invariant is needed).
Theoretical issue: is Hoare logic complete?
That is, can every valid triple be derived from the rules?
Completeness and Weakest Preconditions
Given i et Q,
{P
| {P}i
{Q}valid
}has some minimal elements P
0: for any P such that
{P}i
{Q}valid, P
⇒P
0.
The weakest precondition WP(i
,Q) can be computed by induction on i:
WP(x
:=e, Q)
=Q[x
←e]
WP(i
1;i
2,Q)
=WP(i
1,WP(i
2,Q
))WP(if e then i
1else i
2,Q)
= (e =true
⇒WP(i
1,Q))
∧ (e =false
⇒WP(i
2,Q)) WP(while e do i done, Q)
= _H
kwith H
0 = (e =false
∧Q)
and H
k+1 = (H0∨WP(i
,H
k))Formula for while is infinite hence illegal, but there exists an equivalent
yet finite formula as long as the underlying logic is sufficiently expressive.
Hoare Logic Coq Implementation
Coq Implementation
How to implement Hoare logic in Coq?
1
deep embedding
Formalization of Hoare logic: terms, code, rules are internalized.
⇒
useful to prove soundness or completeness but hardly usable in practice.
2
shallow embedding
Programs are interpreted by their semantic (Prop) and Hoare rules are just tactics/theorems.
⇒
usable on specific programs, but metatheory is now out of scope.
Imperative Features
Extensions to the basic language:
allow side effects in expressions,
access previous values of a variable in a logical formula, handle function/procedure calls (modularity),
express loop/program termination, allow control flow jumps and exceptions,
extend datatypes: arrays, records, pointers, objects, and so on.
Hoare Logic The Why Tool
The Why Tool
http://why.lri.fr/
Functional language with imperative features (references, exceptions).
Aliasing between references syntactically forbidden.
Effect analysis: accessed variables, raised exceptions.
Modularity in programs and specifications.
Weakest precondition computation.
Generate verification conditions for
proof assistants (Coq, Isabelle, PVS, HOL light) and automated provers (Alt-Ergo, Simplify, CVC3, Z3, Gappa).
Example: Square Root
let i s q r t ( n :int) = { n >= 0 }
let c o u n t = ref 0 in
let sum = ref 1 in
w h i l e ! sum <= n do
{ i n v a r i a n t c o u n t >= 0 and n >= c o u n t * c o u n t and sum = ( c o u n t + 1 ) * ( c o u n t +1)
v a r i a n t n - sum } c o u n t := ! c o u n t + 1;
sum := ! sum + 2 * ! c o u n t + 1 d o n e;
! c o u n t
{ r e s u l t >= 0 and
r e s u l t * r e s u l t <= n < (r e s u l t+ 1 ) * (r e s u l t+1) }
High-Level Datatypes
Outline
1 Monads and Coq
2 Hoare Logic Principles
Coq Implementation The Why Tool
3 High-Level Datatypes
Handling Arrays
Advanced Memory Models
High-Level Datatypes
Hoare’s assignment rule
{P[x ←
e]} x
:=e
{P}implicitly states that
expressions e are shared between programs and logical formulas, there is no aliasing between program variables.
Handling datatypes with Hoare logic requires some abstractions.
High-Level Datatypes Handling Arrays
Example: Arrays and the Dutch National Flag
Dijkstra’s problem: sort an array containing elements with only three possible values (blue, white, red).
Invariant for the algorithm:
0 b i r n
BLUE WHITE . . . to be done. . . RED
Sorting Colors
t y p e d e f e n u m { BLUE , WHITE , RED } c o l o r ;
v o i d s w a p (int t [] , int i , int j ) {
c o l o r c = t [ i ]; t [ i ] = t [ j ]; t [ j ] = c ; }
v o i d f l a g (int t [] , int n ) { int b = 0 , i = 0 , r = n ; w h i l e ( i < r ) {
s w i t c h ( t [ i ]) {
c a s e B L U E : s w a p ( t , b ++ , i + + ) ; b r e a k; c a s e W H I T E : i ++; b r e a k;
c a s e RED : s w a p ( t , - -r , i ); b r e a k; }
} }
High-Level Datatypes Handling Arrays
Abstracting the C Code
The C code itself will not be checked directly.
1
Colors are replaced by an abstract type.
2
Arrays are replaced by references pointing to applicative arrays.
t y p e c o l o r
l o g i c b l u e : c o l o r l o g i c w h i t e : c o l o r l o g i c red : c o l o r
p r e d i c a t e i s _ c o l o r ( c : c o l o r ) = c = b l u e or c = w h i t e or c = red
p a r a m e t e r e q _ c o l o r : c1 : c o l o r -> c2 : c o l o r - >
{} b o o l { if r e s u l t t h e n c1 = c2 e l s e c1 < > c2 }
Abstract Type for Applicative Arrays
t y p e c o l o r _ a r r a y
l o g i c acc : c o l o r _ a r r a y , int - > c o l o r
l o g i c upd : c o l o r _ a r r a y , int, c o l o r - > c o l o r _ a r r a y
a x i o m a c c _ u p d _ e q :
f o r a l l a : c o l o r _ a r r a y . f o r a l l i :int. f o r a l l c : c o l o r . acc ( upd ( a , i , c ) , i ) = c
a x i o m a c c _ u p d _ n e q :
f o r a l l a : c o l o r _ a r r a y . f o r a l l i , j :int. f o r a l l c : c o l o r . i < > j - > acc ( upd ( a , j , c ) , i ) = acc ( a , i )
High-Level Datatypes Handling Arrays
Bounded Accesses to Arrays
l o g i c l e n g t h : c o l o r _ a r r a y - > int
a x i o m l e n g t h _ u p d a t e :
f o r a l l a : c o l o r _ a r r a y . f o r a l l i :int. f o r a l l c : c o l o r . l e n g t h ( upd ( a , i , c )) = l e n g t h ( a )
p a r a m e t e r get :
t : c o l o r _ a r r a y ref - > i :int - >
{ 0 <= i < l e n g t h ( t ) } c o l o r r e a d s t { r e s u l t= acc ( t , i ) }
p a r a m e t e r set :
t : c o l o r _ a r r a y ref - > i :int - > c : c o l o r - >
{ 0 <= i < l e n g t h ( t ) } u n i t w r i t e s t { t = upd ( t@ , i , c ) }
The swap Function
let s w a p ( t : c o l o r _ a r r a y ref) ( i :int) ( j :int) = { 0 <= i < l e n g t h ( t ) and 0 <= j < l e n g t h ( t ) }
let u = get t i in
set t i ( get t j );
set t j u
{ t = upd ( upd ( t@ , i , acc ( t@ , j )) , j , acc ( t@ , i )) }
Verification conditions are automatically verified by Alt-Ergo.
High-Level Datatypes Handling Arrays
The Sorting Function
let d u t c h _ f l a g ( t : c o l o r _ a r r a y ref) ( n :int) =
let b = ref 0 in
let i = ref 0 in
let r = ref n in
w h i l e ! i < ! r do
if e q _ c o l o r ( get t ! i ) b l u e t h e n b e g i n s w a p t ! b ! i ;
b := ! b + 1;
i := ! i + 1
end e l s e if e q _ c o l o r ( get t ! i ) w h i t e t h e n i := ! i + 1
e l s e b e g i n r := ! r - 1;
s w a p t ! r ! i end
d o n e
Specifying the Sort Function
p r e d i c a t e m o n o c h r o m e ( t : c o l o r _ a r r a y , i :int, j :int, c : c o l o r ) = f o r a l l k :int. i <= k < j - > acc ( t , k ) = c
let d u t c h _ f l a g ( t : c o l o r _ a r r a y ref) ( n :int) = { 0 <= n and l e n g t h ( t ) = n and
f o r a l l k :int. 0 <= k < n - > i s _ c o l o r ( acc ( t , k )) }
.. .
{ e x i s t s b :int. e x i s t s r :int. m o n o c h r o m e ( t , 0 , b , b l u e ) and m o n o c h r o m e ( t , b , r , w h i t e ) and m o n o c h r o m e ( t , r , n , red ) }
High-Level Datatypes Handling Arrays
Loop Invariant
w h i l e ! i < ! r do
{ i n v a r i a n t 0 <= b <= i and i <= r <= n and m o n o c h r o m e ( t , 0 , b , b l u e ) and m o n o c h r o m e ( t , b , i , w h i t e ) and m o n o c h r o m e ( t , r , n , red ) and l e n g t h ( t ) = n and
f o r a l l k :int. 0 <= k < n - >
i s _ c o l o r ( acc ( t , k )) v a r i a n t r - i }
.. .
d o n eVerification Conditions
Verification conditions generated by WP:
1
The loop invariant is true before loop entry.
2
The loop invariant is preserved and the variant decreases.
3
Precondition of swap holds when it is called.
4
The array is not accessed out of bounds.
5
The postcondition holds at function exit.
All the conditions are automatically proved by Alt-Ergo!
Note: a better specification would state that the dutch_flag function
does not modify the multiset of array elements.
High-Level Datatypes Advanced Memory Models
Proving Java Programs: Global Memory Model
In Java, each object is a reference.
Two separate variables can point to the same memory location.
Simple memory model: memory is just a single huge array.
Attribute e
1.ais modeled by M
[e1+offset(a)] with M a global memory.
Verification conditions grow too complex and are hardly provable,
because all memory accesses have to be checked for pointer equality.
Proving Java Programs: Burstall-Bornat Memory Model
In Java, if a et b are distinct class attributes,
then e
1.aand e
2.bcannot alias the same memory location.
Burstall-Bornat model: each attribute gets its own memory.
Attribute e
1.ais modeled by A[e
1]rather than M[e
1+offset(a)].
Similarly, integer arrays and object arrays can be separated.
A global table keeps track of allocated memory and dynamic types.
High-Level Datatypes Advanced Memory Models
JML (Java Modeling Language)
c l a s s P u r s e {
// @ p u b l i c i n v a r i a n t b a l a n c e >= 0;
int b a l a n c e ;
/* @ p u b l i c n o r m a l _ b e h a v i o r
@ r e q u i r e s s >= 0;
@ m o d i f i a b l e b a l a n c e ;
@ e n s u r e s b a l a n c e == \ old ( b a l a n c e ) + s ;
@ */
p u b l i c vo i d c r e d i t (int s ) { b a l a n c e += s ;
} }
Exceptional Behaviors
/* @ p u b l i c b e h a v i o r
@ r e q u i r e s s >= 0;
@ m o d i f i a b l e b a l a n c e ;
@ e n s u r e s s <= \ old ( b a l a n c e ) &&
@ b a l a n c e == \ old ( b a l a n c e ) - s ;
@ s i g n a l s ( N o C r e d i t E x c e p t i o n )
@ s > b a l a n c e && b a l a n c e == \ old ( b a l a n c e );
@ */
p u b l i c v oi d w i t h d r a w (int s ) t h r o w s N o C r e d i t E x c e p t i o n { if ( b a l a n c e >= s ) { b a l a n c e -= s ; }
e l s e { t h r o w new N o C r e d i t E x c e p t i o n (); } }
High-Level Datatypes Advanced Memory Models
Translating Java to Why
Modeling program and specification:
declarations and axioms backed by a Coq theory,
primitives describing the basic constructions of the language.
Exceptions are used for describing control flow, e.g.
break,return.Coq Theory
I n d u c t i v e v a l u e : T y p e :=
| N u l l : v a l u e | Ref : a d r O b j e c t - > v a l u e .
I n d u c t i v e tag : Ty p e :=
| Obj : c l a s s I d - > tag
| Arr : Z - > A r r k i n d - > tag .
D e f i n i t i o n a l l o c _ t a b l e := p m a p . t a d r O b j e c t tag . D e f i n i t i o n m e m o r y A := map . t v a l u e A .
D e f i n i t i o n m o d _ l o c := v a l u e - > P r o p .
D e f i n i t i o n m o d i f i a b l e A ( h : s t o r e ) ( m m ’: m e m o r y A ) ( u n c h a n g e d : m o d _ l o c ) : P r o p :=
f o r a l l v : value , a l i v e h v - > u n c h a n g e d v - >
acc m v = acc m ’ v .
High-Level Datatypes Advanced Memory Models
Why Theory: Declarations
t y p e v a l u e t y p e tag t y p e c l a s s I d t y p e j a v a T y p e t y p e a l l o c _ t a b l e t y p e ’a m e m o r y
l o g i c acc : ’a memory , v a l u e - > ’a
l o g i c Obj : c l a s s I d - > tag
l o g i c t y p e o f : a l l o c _ t a b l e , value , j a v a T y p e - > p r o p l o g i c N u l l : - > v a l u e
Why Theory: Language Primitives
e x c e p t i o n E x c e p t i o n of v a l u e e x c e p t i o n R e t u r n _ v a l u e of v a l u e
p a r a m e t e r a l l o c : a l l o c _ t a b l e ref p a r a m e t e r i n t A : int m e m a r r a y ref p a r a m e t e r o b j A : v a l u e m e m a r r a y ref p a r a m e t e r n e w _ r e f : a l l o c _ t a b l e - > v a l u e p a r a m e t e r a l l o c a t e
: a l l o c _ t a b l e - > v a l u e - > tag - > a l l o c _ t a b l e
p a r a m e t e r a l l o c _ n e w _ o b j : c : c l a s s I d - >
{ } v a l u e w r i t e s a l l o c { r e s u l t < > N u l l
and f r e s h ( alloc@ , r e s u l t)
and t y p e o f ( alloc , result, C l a s s T y p e ( c )) and s t o r e _ e x t e n d s ( alloc@ , a l l o c ) }
High-Level Datatypes Advanced Memory Models
Why Architecture
Java+JML
Annotated Programs
Annotated C
Krakatoa Frama-C
Why
provers
Coq, PVS, Isabelle. . . Alt-Ergo, Gappa, Simplify, CVC3, haRVey, SMT provers (Yices, Z3. . . )
Modeling C Code
Structure fields act as Java attributes.
Less type constraints in the language (pointer arithmetic, casts, etc).
Either a lower-level memory model or improved static analyses.
ACSL specification language.
Handling
gotorequires assigning invariants to arbitrary program point.
Some verified examples:
Schorr-Waite marking (pointer manipulation inside GC), Safety of some Dassault code (70kloc, 116k conditions, 96% automatically proved),
Minix 3 string library, Verisec buffer overflow benchmark.