Credible Compilation
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
Abstract
This paper presents an approach to compiler correctness in which the compiler generates a proof that the transformed program correctly implements the input program. A simple proof checker can then verify that the program was compiled correctly. We call a compiler that produces such proofs a credible compiler, because it produces verifiable evidence that it is operating correctly.
1 Introduction
Today, compilers are black boxes. The programmer gives the compiler a program, and the compiler spits out a bunch of bits. Until he or she runs the program, the programmer has no idea if the compiler has compiled the program correctly. Even running the program offers no guarantees: compiler errors may show up only for certain inputs. So the programmer must simply trust the compiler.
We propose a fundamental shift in the relationship between the compiler and the programmer. Every time the compiler transforms the program, it generates a proof that the transformed program produces the same result as the original program. When the compiler finishes, the programmer can use a simple proof checker to verify that the program was compiled correctly. We call a compiler that generates these proofs a credible compiler, because it produces verifiable evidence that it is operating correctly.
Figure 1: Traditional Compilation
Figure 2: Credible Compilation

This is an abridged version of the technical report MIT-LCS-TR-776 with the same title, available at www.cag.lcs.mit.edu/rinard/techreport/credibleCompilation.ps.
Figures 1 and 2 graphically illustrate the difference between traditional compilation and credible compilation. A traditional compiler generates a compiled program and nothing else. A credible compiler, on the other hand, also generates a proof that the compiled program correctly implements the original program. A proof checker can then take the original program, the proof, and the compiled program, and check if the proof is correct. If so, the compilation is verified and the compiled program is guaranteed to correctly implement the original program. If the proof does not check, the compilation is not verified and all bets are off.
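The shape of this workflow can be sketched in a few lines. The `compile_pass` and `check_proof` functions below are hypothetical stand-ins for the paper's formal machinery, not its actual proof system; the point is only the division of trust between an untrusted compiler and a small trusted checker.

```python
# Hypothetical sketch of the credible-compilation workflow.
# `compile_pass` and `check_proof` are illustrative stand-ins,
# not the paper's formal proof system.

def compile_pass(program):
    """An (untrusted) compiler pass returns the transformed program
    plus a proof object claiming the transformation is correct."""
    transformed = program.replace("x + y", "3")   # stand-in "optimization"
    proof = ("claims-equivalent", program, transformed)
    return transformed, proof

def check_proof(original, transformed, proof):
    """The small, trusted proof checker: verify that the proof
    actually relates this original to this transformed program."""
    tag, p, t = proof
    return tag == "claims-equivalent" and p == original and t == transformed

original = "i := i + x + y"
transformed, proof = compile_pass(original)
if check_proof(original, transformed, proof):
    result = transformed      # compilation verified: use the output
else:
    result = original         # proof does not check: all bets are off
```

The key design point is that only `check_proof` must be trusted; the compiler itself can be arbitrarily complex and buggy.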
2 Example
In this section we present an example that explains how a credible compiler can prove that it performed a translation correctly. Figure 3 presents the example program represented as a control flow graph. The program contains several assignment nodes; for example, the node 5: i ← i + x + y at label 5 assigns the value of the expression i + x + y to the variable i. There is also a conditional branch node 4: br i < 24. Control flows from this node through its outgoing left edge to the assignment node at label 5 if i < 24; otherwise control flows through the right edge to the exit node at label 7.
Figure 3: Original Program
Figure 4: Program After Constant Propagation and Constant Folding
Figure 4 presents the program after constant propagation and constant folding. The compiler has replaced the node 5: i ← i + x + y at label 5 with the node 5: i ← i + 3. The goal is to prove that this particular transformation on this particular program preserves the semantics of the original program. The goal is not to prove that the compiler will always operate correctly on every program. The transformation has the standard two-step structure:

Analysis: The compiler discovered that x is always 1 and y is always 2 at the program point before node 5. So, x + y is always 3 at this program point.

Transformation: The compiler used the analysis information to transform the program so that it generates the same result while (hopefully) executing in less time or space or consuming less power. In our example, the compiler simplifies the expression x + y to 3.
Our approach to proving optimizations correct supports this basic two-step structure. The compiler first proves that the analysis is correct, then uses the analysis results to prove that the original and transformed programs generate the same result. Here is how this approach works in our example.
2.1 Proving Analysis Results Correct
Many years ago, Floyd came up with a technique for proving properties of programs [4]. This technique was generalized and extended, and eventually came to be understood as a logic whose proof rules are derived from the structure of the program [2]. The basic idea is to assert a set of properties about the relationships between variables at different points in the program, then use the logic to prove that the properties always hold. If so, each property is called an invariant, because it is always true when the flow of control reaches the corresponding point in the program.

In our example, the key invariant is that at the point just before the program executes node 5, it is always true that x = 1 and y = 2. We represent this invariant as ⟨x = 1 ∧ y = 2⟩5.
In our example, the simplest way for the compiler to generate a proof of ⟨x = 1 ∧ y = 2⟩5 is for it to generate a set of invariants that represent the analysis results, then use the logic to prove that all of the invariants hold. Here is the set of invariants in our example:

⟨x = 1⟩3
⟨x = 1 ∧ y = 2⟩4
⟨x = 1 ∧ y = 2⟩5
⟨x = 1 ∧ y = 2⟩6
Conceptually, the compiler proves this set of invariants by tracing execution paths. The proof is by induction on the structure of the partial executions of the program. For each invariant, the compiler first assumes that the invariants at all preceding nodes in the control flow graph are true. It then traces the execution through each preceding node to verify the invariant at the next node. We next present an outline of the proofs for several key invariants.
⟨x = 1⟩3 holds because the only preceding node, node 2, sets x to 1.

To prove ⟨x = 1 ∧ y = 2⟩4, first assume ⟨x = 1⟩3 and ⟨x = 1 ∧ y = 2⟩6. Then consider the two preceding nodes, nodes 3 and 6. Because ⟨x = 1⟩3 holds and node 3 sets y to 2, ⟨x = 1 ∧ y = 2⟩4 holds along the path through node 3. Because ⟨x = 1 ∧ y = 2⟩6 holds and node 6 does not affect the value of either x or y, ⟨x = 1 ∧ y = 2⟩4 also holds along the path through node 6.
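This inductive argument can be mechanized. The sketch below is a hypothetical constant-propagation invariant checker for the example graph (node shapes and edges reconstructed from the figures, not taken verbatim from the paper): it pushes each node's invariant through that node and checks that the result entails the invariant at every successor.

```python
# Minimal sketch of the inductive invariant check described above.
# Invariants map program points to known constant bindings.

NODES = {                       # node label -> (kind, payload)
    2: ("assign_const", ("x", 1)),
    3: ("assign_const", ("y", 2)),
    4: ("branch", None),        # br i < 24
    5: ("assign_expr", "i"),    # i := i + x + y (i no longer constant)
    6: ("assign_expr", "g"),    # node 6 assigns g (g no longer constant)
}
EDGES = [(2, 3), (3, 4), (6, 4), (4, 5), (5, 6)]

INVARIANTS = {                  # constants known just before each node
    3: {"x": 1},
    4: {"x": 1, "y": 2},
    5: {"x": 1, "y": 2},
    6: {"x": 1, "y": 2},
}

def post(node, env):
    """Abstractly execute `node` on constant environment `env`."""
    kind, payload = NODES[node]
    env = dict(env)
    if kind == "assign_const":
        var, val = payload
        env[var] = val
    elif kind == "assign_expr":
        env.pop(payload, None)  # assigned value not tracked as constant
    return env                  # branch nodes change no variables

def check():
    # Inductive step: the invariant at each predecessor, pushed through
    # that predecessor's node, must entail the invariant at the successor.
    for pred, succ in EDGES:
        env = post(pred, INVARIANTS.get(pred, {}))
        for var, val in INVARIANTS.get(succ, {}).items():
            if env.get(var) != val:
                return False
    return True
```

Running `check()` on these tables succeeds, mirroring the hand proof: every edge into node 4 (from node 3 and from node 6) re-establishes x = 1 ∧ y = 2.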
In this proof we have assumed that the compiler generates an invariant at almost all of the nodes in the program. More traditional approaches use fewer invariants, typically one invariant per loop, then produce proofs that trace the paths between invariants.

2.2 Proving Transformations Correct
When a compiler transforms a program, there are typically some externally observable effects that it must preserve. A standard requirement, for example, is that the compiler must preserve the input/output relation of the program. In our framework, we assume that the compiler is operating on a compilation unit such as a procedure or method, and that there are externally observable variables such as global variables or object instance variables. The compiler must preserve the final values of these variables. All other variables are either parameters or local variables, and the compiler is free to do whatever it wants to with these variables so long as it preserves the final values of the observable variables. The compiler may also assume that the initial values of the observable variables and the parameters are the same in both cases.
In our example, the only requirement is that the transformation must preserve the final value of the variable g. The compiler proves this property by proving a simulation correspondence between the original and transformed programs. To present the correspondence, we must be able to refer, in the same context, to variables and node labels from the two programs. We adopt the convention that all entities from the original program P will have a subscript of P, while all entities from the transformed program T will have a subscript of T. So i_P refers to the variable i in the original program, while i_T refers to the variable i in the transformed program.
In our example, the compiler proves that the transformed program simulates the original program in the following sense: for every execution of the original program P that reaches the final node 7_P, there exists an execution of the transformed program T that reaches the final node 7_T such that g_P at 7_P = g_T at 7_T. We call such a correspondence a simulation invariant, and write it as ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T.
The compiler typically generates a set of simulation invariants, then uses the logic to construct a proof of the correctness of all of the simulation invariants. The proof is by induction on the length of the partial executions of the original program. We next outline how the compiler can use this approach to prove ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T. First, the compiler is given that ⟨g_P⟩1_P ≃ ⟨g_T⟩1_T; in other words, the values of g_P and g_T are the same at the start of the two programs. The compiler then generates the following simulation invariants:

⟨(g_P, i_P)⟩2_P ≃ ⟨(g_T, i_T)⟩2_T
⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T
⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T
⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T
⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T
The key simulation invariants are ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T, ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T, and ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T. We next outline the proofs of these invariants.
To prove ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T, first assume ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T. For each path to 7_P in P, we must find a corresponding path in T to 7_T such that the values of g_P and g_T are the same in both paths. The only path to 7_P goes from 4_P to 7_P, when i_P ≥ 24. The assumed simulation invariant implies that i_P at 4_P = i_T at 4_T, so control flows from 4_T to 7_T whenever control flows from 4_P to 7_P. The simulation invariant ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T also implies that the values of g_P and g_T are the same in both cases.
To prove ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T, assume ⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T. The only path to 6_P goes from 5_P to 6_P, with i_P at 6_P = i_P at 5_P + x_P at 5_P + y_P at 5_P. The analysis proofs showed that x_P at 5_P + y_P at 5_P = 3, so i_P at 6_P = i_P at 5_P + 3. The corresponding path in T goes from 5_T to 6_T, with i_T at 6_T = i_T at 5_T + 3. The assumed simulation invariant ⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T allows us to verify a correspondence between the values of i_P at 6_P and i_T at 6_T; namely, that they are equal. Because 5_P does not change g_P and 5_T does not change g_T, g_P at 6_P and g_T at 6_T have the same value.
To prove ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T, first assume ⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T and ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T. There are two paths to 4_P:

– Control flows from 3_P to 4_P. The corresponding path in T is from 3_T to 4_T, so we can apply the assumed simulation invariant ⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T to derive g_P at 4_P = g_T at 4_T and i_P at 4_P = i_T at 4_T.

– Control flows from 6_P to 4_P, with g_P at 4_P = 2 · i_P at 6_P. The corresponding path in T is from 6_T to 4_T, with g_T at 4_T = 2 · i_T at 6_T. We can apply the assumed simulation invariant ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T to derive 2 · i_P at 6_P = 2 · i_T at 6_T. Since 6_P does not change i_P and 6_T does not change i_T, we can derive g_P at 4_P = g_T at 4_T and i_P at 4_P = i_T at 4_T.
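The simulation invariant ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T can also be checked concretely, though testing on particular inputs is of course far weaker than the path-by-path proof above. The sketch below runs both versions of the example loop and compares the final value of the observable variable g; the initial values of i and g are assumptions here, since the abridged paper does not show the loop's initialization.

```python
# Concrete (testing-style) check of <g_P>7_P ~ <g_T>7_T: run both
# programs and compare the final value of the observable variable g.
# Assumes i and g start at 0, which the abridged paper does not state.

def run_original(i=0):
    x, y, g = 1, 2, 0          # nodes 2 and 3
    while i < 24:              # node 4: br i < 24
        i = i + x + y          # node 5: i := i + x + y
        g = 2 * i              # node 6: g := 2 * i
    return g                   # node 7: exit

def run_transformed(i=0):
    x, y, g = 1, 2, 0
    while i < 24:
        i = i + 3              # node 5 after constant folding
        g = 2 * i
    return g
```

With i starting at 0, both versions step i through 3, 6, ..., 24 and return g = 48; the simulation invariant asserts this agreement for every execution, not just sampled ones.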
2.3 Formal Foundations
In this section we outline the formal foundations required to support credible compilation. The full paper presents the formal foundations in detail [11].

Credible compilation depends on machine-checkable proofs. Its use therefore requires the formalization of several components. First, we must formalize the program representation, namely control flow graphs, and define a formal operational semantics for that representation. To support proofs of standard invariants, we must formalize the Floyd-Hoare proof rules required to prove any program property. We must also formalize the proof rules for simulation invariants. Both of these formalizations depend on some facility for doing simple automated proofs of standard facts involving integer properties. It is also necessary to show that both sets of proof rules are sound.

We have successfully performed these tasks for a simple language based on control flow graphs; see the full report for the complete elaboration [11]. Given this formal foundation, the compiler can use the proof rules to produce machine-checkable proofs of the correctness of its analyses and transformations.
3 Optimization Schemas
We next present examples that illustrate how to prove the correctness of several standard optimizations. Each optimization comes with a proof schema; the compiler would then use the schema to produce a correctness proof that goes along with each optimization.
3.1 Dead Assignment Elimination
The compiler can eliminate an assignment to a local variable if that variable is not used after the assignment. The proof schema is relatively simple: the compiler simply generates simulation invariants that assert the equality of corresponding live variables at corresponding points in the program. Figures 5 and 6 present an example that we use to illustrate the schema. This example continues the example introduced in Section 2. Figure 7 presents the invariants that the compiler generates for this example.
Figure 5: Program P Before Dead Assignment Elimination
Figure 6: Program T After Dead Assignment Elimination

I = {⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T, ⟨i_P⟩5_P ≃ ⟨i_T⟩5_T,
⟨i_P⟩6_P ≃ ⟨i_T⟩6_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 7: Invariants for Dead Assignment Elimination
Note that the set I of invariants contains no standard invariants. In general, dead assignment elimination requires only simulation invariants. The proofs of these invariants are simple; the only complication is the need to skip over dead assignments.
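The transformation itself can be sketched as a backward liveness pass over a straight-line block. The `(target, uses)` representation below is our own simplification, not the paper's control flow graph form; an assignment whose target is dead at that point is exactly what the simulation proofs must skip over.

```python
# Minimal sketch of dead assignment elimination on a straight-line
# block. Each assignment is (target, list-of-variables-it-reads).

def eliminate_dead(block, live_out):
    """Remove assignments whose target is not live afterwards."""
    live = set(live_out)
    kept = []
    for target, uses in reversed(block):
        if target in live:
            kept.append((target, uses))
            live.discard(target)   # this assignment defines the target
            live.update(uses)      # ...and makes its operands live
        # otherwise the assignment is dead: drop it
    kept.reverse()
    return kept

block = [("t", ["x", "y"]),   # t := x + y   (t never used again: dead)
         ("g", ["i"])]        # g := 2 * i   (g is observable)
print(eliminate_dead(block, live_out={"g"}))   # -> [('g', ['i'])]
```

Only variables live at corresponding points need to agree between the two programs, which is why the invariant set I above mentions i and g but not the eliminated temporary.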
3.2 Branch Movement
Our next optimization moves a conditional branch from the top of a loop to the bottom. The optimization is legal if the loop always executes at least once. This optimization is different from all the other optimizations we have discussed so far in that it changes the control flow. Figure 8 presents the program before branch movement; Figure 9 presents the program after branch movement. Figure 10 presents the set of invariants that the compiler generates for this example.
One of the paths that the proof must consider is the path in the original program P from 1_P to 4_P to 7_P. No execution of P, of course, will take this path; the loop always executes at least once, and this path corresponds to the loop executing zero times. The fact that this path will never execute shows up as a false condition in the partial simulation invariant for P that is propagated from 7_P back to 1_P. The corresponding path in T that is used to prove I ⊢ ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T is the path from 1_T through 5_T, 6_T, and 4_T to 7_T.

Figure 8: Program Before Branch Movement
Figure 9: Program After Branch Movement

I = {⟨i_P⟩5_P ≃ ⟨i_T⟩5_T, ⟨i_P⟩6_P ≃ ⟨i_T⟩6_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 10: Invariants for Branch Movement
3.3 Induction Variable Elimination
Our next optimization eliminates the induction variable i from the loop, replacing it with g. The correctness of this transformation depends on the invariant ⟨g_P = 2 i_P⟩4_P. Figure 11 presents the program before induction variable elimination; Figure 12 presents the program after induction variable elimination. Figure 13 presents the set of invariants that the compiler generates for this example. These invariants characterize the relationship between the eliminated induction variable i_P from the original program and the variable g_T in the transformed program.

Figure 11: Program P Before Induction Variable Elimination
Figure 12: Program T After Induction Variable Elimination

I = {⟨g_P = 2 i_P⟩4_P, ⟨2 i_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨2 i_P⟩4_P ≃ ⟨g_T⟩4_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 13: Invariants for Induction Variable Elimination
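The following sketch illustrates why the invariant ⟨g_P = 2 i_P⟩4_P licenses the rewrite: since g always tracks 2·i at the branch, the loop test and update can be expressed in g alone. The loop bounds follow the running example, and the initial values i = 0 and g = 0 are assumptions, since the abridged paper does not show the initialization.

```python
# Illustration of induction variable elimination in the running
# example. Assumes i and g start at 0 (not stated in the abridged
# paper). `trace` records the invariant g = 2*i at each visit to
# the branch node 4_P.

def original():
    i, g = 0, 0
    trace = []
    while True:
        trace.append(g == 2 * i)   # the invariant <g_P = 2 i_P>4_P
        if not i < 24:
            break
        i = i + 3
        g = 2 * i
    return g, trace

def transformed():
    g = 0
    while g < 48:                  # test i < 24 rewritten via g = 2*i
        g = g + 6                  # update i += 3 becomes g += 6
    return g
```

Both versions finish with g = 48, and the trace confirms the standard invariant holds at every visit to the branch, which is what justifies replacing i by g in both the test and the update.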
3.4 Loop Unrolling
The next optimization unrolls the loop once. Figure 14 presents the program before loop unrolling; Figure 15 presents the program after unrolling the loop. Note that the loop unrolling transformation preserves the loop exit test; this test can be eliminated by the dead code elimination optimization discussed in Section 3.5.

Figure 14: Program P Before Loop Unrolling
Figure 15: Program T After Loop Unrolling
I = {⟨g_P%12 = 0 ∨ g_P%12 = 6⟩4_P,
⟨g_P%12 = 0, g_P⟩5_P ≃ ⟨g_T⟩2_T,
⟨g_P%12 = 6, g_P⟩4_P ≃ ⟨g_T⟩3_T,
⟨g_P%12 = 6, g_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨g_P%12 = 0, g_P⟩4_P ≃ ⟨g_T⟩4_T,
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 16: Invariants for Loop Unrolling
Figure 16 presents the set of invariants that the compiler generates for this example. Note that, unlike the simulation invariants in previous examples, these simulation invariants have conditions. The conditions are used to separate different executions of the same node in the original program. Some of the time, the execution at node 4_P corresponds to the execution at node 4_T, and other times to the execution at node 3_T. The conditions in the simulation invariants identify when, in the execution of the original program, each correspondence holds. For example, when g_P%12 = 0, the execution at 4_P corresponds to the execution at 4_T; when g_P%12 = 6, the execution at 4_P corresponds to the execution at 3_T.
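A quick concrete check of these conditions (again assuming the running example with i and g starting at 0, which the abridged paper does not state): successive executions of node 4_P see g_P alternate between residues 0 and 6 mod 12, which is exactly what lets each execution be matched to node 4_T or node 3_T in the unrolled program.

```python
# Record g % 12 at each visit to the branch node 4_P in the running
# example (i updated by 3, g = 2*i). Assumes i and g start at 0.

def residues_at_node4():
    i, g = 0, 0
    seen = []
    while True:
        seen.append(g % 12)    # residue observed at node 4_P
        if not i < 24:
            break
        i = i + 3
        g = 2 * i
    return seen

print(residues_at_node4())     # -> [0, 6, 0, 6, 0, 6, 0, 6, 0]
```

The strict alternation is why the disjunctive standard invariant ⟨g_P%12 = 0 ∨ g_P%12 = 6⟩4_P holds, and why the two conditional simulation invariants at node 4_P together cover every execution.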
3.5 Dead Code Elimination
Our next optimization is dead code elimination. We continue with our example by eliminating the branch in the middle of the loop at node 3. Figure 17 presents the program before the branch is eliminated. The key property that allows the compiler to remove the branch is that g%12 = 6 ∧ g ≤ 48 at node 3, which implies that g < 48 at node 3. In other words, the condition in the branch is always true. Figure 18 presents the program after the branch is eliminated. Figure 19 presents the set of invariants that the compiler generates for this example.
One of the paths that the proof must consider is the potential loop exit in the original program P from 3_P to 7_P. In fact, the loop always exits from 4_P, not 3_P. This fact shows up because the conjunction of the standard invariant ⟨g_P%12 = 6 ∧ g_P ≤ 48⟩3_P with the condition g_P ≥ 48 from the partial simulation invariant for P at 3_P is false. The corresponding path in T that is used to prove I ⊢ ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T is the path from 5_T to 4_T to 7_T.

4 Code Generation
Figure 17: Program P Before Dead Code Elimination
Figure 18: Program T After Dead Code Elimination

I = {⟨g_P%12 = 0 ∧ g_P < 48⟩2_P,
⟨g_P%12 = 6 ∧ g_P ≤ 48⟩3_P,
⟨g_P%12 = 6 ∧ g_P < 48⟩5_P,
⟨g_P%12 = 0 ∧ g_P ≤ 48⟩4_P,
⟨g_P⟩2_P ≃ ⟨g_T⟩2_T,
⟨g_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨g_P⟩3_P ≃ ⟨g_T⟩5_T,
⟨g_P⟩4_P ≃ ⟨g_T⟩4_T,
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 19: Invariants for Dead Code Elimination

In principle, we believe that it is possible to produce a proof for each step of the compilation process. We have designed our proof system to work with a standard intermediate format based on control flow graphs. The parser, which produces the initial control flow graph, and the code generator, which generates object code from the final control flow graph, are
therefore potential sources of uncaught errors. We believe it should be straightforward, for reasonable languages, to produce a standard parser that is not a serious source of errors. It is not so obvious how the code generator can be made simple enough to be reliable.
Our goal is to make the step from the final control flow graph to the generated code as small as possible. Ideally, each node in the control flow graph would correspond to a single instruction in the generated code. To achieve this goal, it must be possible to express the result of complicated, machine-specific code generation algorithms (such as register allocation and instruction selection) using control flow graphs. After the compiler applies these algorithms, the final control flow graph would be structured in a stylized way appropriate for the target architecture. The code generator for the target architecture would accept such a control flow graph as input and use a simple translation algorithm to produce the final object code.
With this approach, we anticipate that code generators can be made approximately as simple as proof checkers. We therefore anticipate that it will be possible to build standard code generators with an acceptable level of reliability for most users. However, we would once again like to emphasize that it should be possible to build a framework in which the compilation is checked from source code to object code.
In the following two sections, we first present an approach for a simple RISC instruction set, then discuss an approach for more complicated instruction sets.

4.1 A Simple RISC Instruction Set
For a simple RISC instruction set, the key idea is to introduce special variables that the code generator interprets as registers. The control flow graph is then transformed so that each node corresponds to a single instruction in the generated code. We first consider assignment nodes.
If the destination variable is a register variable, the source expression must be one of the following:

– A non-register variable. In this case the node corresponds to a load instruction.

– A constant. In this case the node corresponds to a load immediate instruction.

– A single arithmetic operation with register variable operands. In this case the node corresponds to an arithmetic instruction that operates on the two source registers to produce a value that is written into the destination register.

– A single arithmetic operation with one register variable operand and one constant operand. In this case the node corresponds to an arithmetic instruction that operates on one source register and an immediate constant to produce a value that is written into the destination register.

If the destination variable of an assignment node is a non-register variable, the source expression must consist of a register variable, and the node corresponds to a store instruction.
It is possible to convert assignment nodes with arbitrary expressions to this form. The first step is to flatten the expression by introducing temporary variables to hold the intermediate values computed by the expression. Additional assignment nodes transfer these values to the new temporary variables. The second step is to use a register allocation algorithm to transform the control flow graph to fit the form described above.
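The flattening step can be sketched as a small recursive pass. The tuple encoding of expressions and the temporary-naming scheme here are illustrative choices, not the paper's representation:

```python
# Flatten a nested expression into single-operation assignments,
# the form the simple RISC code generator expects. Expressions are
# nested tuples like ("+", ("+", "i", "x"), "y"); leaves are
# variable names or constants.

import itertools

def flatten(target, expr):
    fresh = (f"t{n}" for n in itertools.count())   # temporary names
    code = []

    def emit(e):
        if isinstance(e, tuple):          # (op, left, right)
            op, left, right = e
            lv, rv = emit(left), emit(right)
            tmp = next(fresh)
            code.append((tmp, op, lv, rv))  # one arithmetic op per node
            return tmp
        return e                            # variable or constant leaf

    code.append((target, "=", emit(expr), None))  # final copy to target
    return code

# i := i + x + y becomes two single-operation assignments plus a copy:
print(flatten("i", ("+", ("+", "i", "x"), "y")))
# -> [('t0', '+', 'i', 'x'), ('t1', '+', 't0', 'y'), ('i', '=', 't1', None)]
```

A register allocator would then map the temporaries and source variables onto the special register variables, after which each node matches one of the instruction shapes listed above.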
We next consider conditional branch nodes. If the condition is the constant true or false, the node corresponds to an unconditional branch instruction. Otherwise, the condition must compare a register variable with zero, so that the instruction corresponds either to a branch if zero instruction or a branch if not zero instruction.
4.2 More Complex Instruction Sets
Many processors offer more complex instructions that, in effect, do multiple things in a single cycle. In the ARM instruction set, for example, the execution of each instruction may be predicated on several condition codes. ARM instructions can therefore be modeled as consisting of a conditional branch plus the other operations in the instruction. The x86 instruction set has instructions that assign values to several registers.

We believe the correct approach for these more complex instruction sets is to let the compiler writer extend the possible types of nodes in the control flow graph. The semantics of each new type of node would be given in terms of the base nodes in standard control flow graphs. We illustrate this approach with an example.
For instruction sets with condition codes, the programmer would define a new variable for each condition code. The control flow graph for each new type of node would perform the assignment, test the appropriate conditions, and set the appropriate condition code variables. If the instruction set also has predicated execution, the control flow graph would use conditional branch nodes to check the appropriate condition codes before performing the instruction.
Each new type of node would come with proof rules automatically derived from its underlying control flow graph. The proof checker could therefore verify proofs on control flow graphs that include these types of nodes. The code generator would require the preceding phases of the compiler to produce a control flow graph that contained only those types of nodes that translate directly into a single instruction on the target architecture. With this approach, all complex code generation algorithms could operate on control flow graphs, with their results checked for correctness.
5 Related Work
Most existing research on compiler correctness has focused on techniques that deliver a compiler guaranteed to operate correctly on every input program [5]; we call such a compiler a totally correct compiler. A credible compiler, on the other hand, is not necessarily guaranteed to operate correctly on all programs; it merely produces a proof that it has operated correctly on the current program.

In the absence of other differences, one would clearly prefer a totally correct compiler to a credible compiler. After all, the credible compiler may fail to compile some programs correctly, while the totally correct compiler will always work. But the totally correct compiler approach imposes a significant pragmatic drawback: it requires the source code of the compiler, rather than its output, to be proved correct. So programmers must express the compiler in a way that is amenable to these correctness proofs. In practice this invasive constraint has restricted the compiler to a limited set of source languages and compiler algorithms. Although the concept of a totally correct compiler has been around for many years, there are, to our knowledge, no totally correct compilers that produce close to production-quality code for realistic programming languages. Credible compilation offers the compiler developer much more freedom. The compiler can be developed in any language using any methodology and perform arbitrary transformations. The only constraint is that the compiler produce a proof that its result is correct.
The concept of credible compilers has also arisen in the context of compiling synchronous languages [3, 8]. Our approach, while philosophically similar, is technically much different. It is designed for standard imperative languages and therefore uses drastically different techniques for deriving and expressing the correctness proofs.
We often are asked the question "How is your approach different from proof-carrying code [6]?" (Proof-carrying code is code augmented with a proof that the code satisfies safety properties such as type safety or the absence of array bounds violations.) In our view, credible compilers and proof-carrying code are orthogonal concepts. Proof-carrying code is used to prove properties of one program, typically the compiled program. Credible compilers establish a correspondence between two programs: an original program and a compiled program. Given a safe programming language, a credible compiler will produce guarantees that are stronger than those provided by typical applications of proof-carrying code. So, for example, if the original program is type safe and the credible compiler produces a proof that the compiled program correctly implements the original program, then the compiled program is also type safe.
But proof-carrying code can, in principle, be used to prove properties that are not visible in the semantics of the language. For example, one might use proof-carrying code to prove that a program does not execute a sequence of instructions that may damage the hardware. Because most languages simply do not deal with the kinds of concepts required to prove such a property as a correspondence between two programs, credible compilation is not particularly relevant to these kinds of problems.

Since we first wrote this paper, credible compilation has developed significantly. For example, it has been applied to programs with pointers [10] and to C programs [7].
6 Impact of Credible Compilation
We next discuss the potential impact of credible compilation. We consider five areas: debugging compilers, increasing the flexibility of compiler development, just-in-time compilers, the concept of an open compiler, and the relationship of credible compilation to building custom compilers.
6.1 Debugging Compilers
Compilers are notoriously difficult to build and debug. In a large compiler, a surprising part of the difficulty is simply recognizing incorrectly generated code. The current state of the art is to generate code after a set of passes, then test that the generated code produces the same result as the original code. Once a piece of incorrect code is found, the developer must spend time tracing the bug back through layers of passes to the original source.

Requiring the compiler to generate a proof for each transformation will dramatically simplify this process. As soon as a pass operates incorrectly, the developer will immediately be directed to the incorrect code. Bugs can be found and eliminated as soon as they occur.
6.2 Flexible Compiler Development
It is difficult, if not impossible, to eliminate all of the bugs in a large software system such as a compiler. Over time, the system tends to stabilize around a relatively reliable software base as it is incrementally debugged. The price of this stability is that people become extremely reluctant to change the software, either to add features or even to fix relatively minor bugs, for fear of inadvertently introducing new bugs. At some point the system becomes obsolete because the developers are unable to upgrade it quickly enough for it to stay relevant.

Credible compilation, combined with the standard organization of the compiler as a sequence of passes, promises to make it possible to continually introduce new, unreliable code into a mature compiler without compromising functionality or reliability. Consider the following scenario. Working under deadline pressure, a compiler developer has come up with a prototype implementation of a complex transformation. This transformation is of great interest because it dramatically improves the performance of several SPEC benchmarks. But because the developer cut corners to get the implementation out quickly, it is unreliable. With credible compilation, this unreliability is not a problem at all: the pass can go into the compiler immediately, with the compiler checking each correctness proof and discarding the results if it didn't work. The compiler operates as reliably as it did before the introduction of the new pass, but when the pass works, it generates much better code.
It is well known that the effort required to make a compiler work on all conceivable inputs is much greater than the effort required to make the compiler work on all likely inputs. Credible compilation makes it possible to build the entire compiler as a sequence of passes that work only for common or important cases. Because developers would be under no pressure to make passes work on all cases, each pass could be hacked together quickly with little testing and no complicated code to handle exceptional cases. The result is that the compiler would be much easier and cheaper to build and much easier to target for good performance on specific programs.
A final extrapolation is to build speculative transformations. The idea is that the compiler simply omits the analysis required to determine if the transformation is legal. It does the transformation anyway and generates a proof that the transformation is correct. This proof is valid, of course, only if the transformation is correct. The proof checker filters out invalid transformations and keeps the rest.

This approach shifts work from the developer to the proof checker. The proof checker does the analysis required to determine if the transformation is legal, and the developer can focus on the transformation and the proof generation, not on writing the analysis code.
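The filtering loop this describes can be sketched as follows; the toy `transform` and `check_proof` below are deliberately crude stand-ins for a real pass and a real proof checker:

```python
# Sketch of the speculative-transformation idea: apply a pass without
# the enabling analysis, then let the proof check filter out the cases
# where the speculation was invalid. The transform and checker here
# are illustrative stand-ins only.

def speculative_pass(program, transform, check_proof):
    candidate, proof = transform(program)
    if check_proof(program, candidate, proof):
        return candidate        # the proof validates: keep the result
    return program              # invalid speculation: discard it

def transform(prog):
    # Speculatively fold "x + y" to "3" without analyzing whether
    # x = 1 and y = 2 actually hold at that point.
    cand = [line.replace("x + y", "3") for line in prog]
    return cand, ("replaced", prog, cand)

def check_proof(orig, cand, proof):
    # Stand-in checker: accept only if the program binds x = 1, y = 2.
    return "x = 1" in orig and "y = 2" in orig

good = ["x = 1", "y = 2", "i = i + x + y"]
bad  = ["x = 4", "y = 2", "i = i + x + y"]
print(speculative_pass(good, transform, check_proof))  # folded version kept
print(speculative_pass(bad, transform, check_proof))   # speculation discarded
```

The developer writes only the transformation and its proof generator; the checker, not hand-written analysis code, decides which speculative results survive.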
6.3 Just-In-Time Compilers
The increased network interconnectivity resulting from the deployment of the Internet has enabled and promoted a new way to distribute software. Instead of compiling to native machine code that will run on only one machine, the source program is compiled to a portable byte code. An interpreter executes the byte code.

The problem is that the interpreted byte code runs much slower than native code. The proposed solution is to use a just-in-time compiler to generate native code either when the byte code arrives or dynamically as it runs. Dynamic compilation also has the advantage that it can use dynamically collected profiling information to drive the compilation process.
Note, however, that the just-in-time compiler is another complex, potentially erroneous software component that can affect the correct execution of the program. If a compiler generates native code, the only subsystems that can change the semantics of the native code binary during normal operation are the loader, dynamic linker, operating system, and hardware, all of which are relatively static systems. An organization that is shipping software can generate a binary and test it extensively on the kind of systems that its customers will use. If the customer finds an error, the organization can investigate the problem by running the program on a roughly equivalent system.

But with dynamic compilation, the compiled code constantly changes in a way that may be very difficult to reproduce. If the dynamic compiler incorrectly compiles the program, it may be extremely difficult to reproduce the conditions that caused it to fail. This additional complexity in the compilation approach makes it more difficult to build a reliable compiler. It also makes it difficult to assign blame for any failure. When an error shows up, it could be either the compiler or the application. The organizations that produced the compiler and the application can each blame the other, and neither one is motivated to work hard to find and fix the problem. The end result is that the total system stays broken.
Credible compilation can eliminate this problem. If the dynamic compiler emits a proof that it executed correctly, the run-time system can check the proof before accepting the generated code. All incorrect code would be filtered out before it caused a problem. This approach restores the reliability properties of distributing native code binaries while supporting the convenience and flexibility of dynamic compilation and the distribution of software in portable byte-code format.
6.4 An Open Compiler
We believe that credible compilers will change the social context in which compilers are built. Before a developer can safely integrate a pass into the compiler, there must be some evidence that the pass will work. But there is currently no way to verify the correctness of the pass. Developers are therefore typically reduced to relying on the reputation of the person that produced the pass, rather than on the trustworthiness of the code itself. In practice, this means that the entire compiler is typically built by a small, cohesive group of people in a single organization. The compiler is closed in the sense that these people must coordinate any contribution to the compiler.
Credible compilation eliminates the need for developers to trust each other. Anyone can take any pass, integrate it into their compiler, and use it. If a pass operates incorrectly, it is immediately apparent, and the compiler can discard the transformation. There is no need to trust anyone. The compiler is now open and anyone can contribute. Instead of relying on a small group of people in one organization, the effort, energy, and intelligence of every compiler developer in the world can be productively applied to the development of one compiler.
The keys to making this vision a reality are a standard intermediate representation, logics for expressing the proofs, and a verifier that checks the proofs. The representation must be expressive and support the range of program representations required for both high level and low level analyses and transformations. Ideally, the representation would be extensible, with developers able to augment the system with new constructs and new axioms that characterize these constructs. The verifier would be a standard piece of software. We expect several independent verifiers to emerge that would be used by most programmers; paranoid programmers can build their own verifier. It might even be possible to do a formal correctness proof of the verifier.
Once this standard infrastructure is in place, we can leverage the Internet to create a compiler development community. One could imagine, for example, a compiler development web portal with code transformation passes, front ends, and verifiers. Anyone can download a transformation; anyone can use any of the transformations without fear of obtaining an incorrect result. Each developer can construct his or her own custom compiler by stringing together a sequence of optimization passes from this web site. One could even imagine an intellectual property market emerging, as developers license passes or charge electronic cash for each use of a pass. In fact, future compilers may consist of a set of transformations distributed across multiple web sites, with
the program (and its correctness proofs) flowing through the network as it is compiled.
Compilers are traditionally thought of and built as general-purpose systems that should be able to compile any program given to them. As a consequence, they tend to contain analyses and transformations that are of general utility and almost always applicable. Any extra components would slow the compiler down and increase its complexity.
The problem with this situation is that general techniques tend to do relatively pedestrian things to the program. For specific classes of programs, more specialized analyses and transformations would make a huge difference [12, 9, 1]. But because they are not generally useful, they don't make it into widely used compilers.
We believe that credible compilation can make it possible to develop many different custom compilers that have been specialized for specific classes of applications. The idea is to make a set of credible passes available, then allow the compiler builder to combine them in arbitrary ways. Very specialized passes could be included without threatening the stability of the compiler. One could easily imagine a range of compilers quickly developed for each class of applications.
It would even be possible to extrapolate this idea to include optimistic transformations. In some cases, it is difficult to do the analysis required to perform a specific transformation. In this case, the compiler could simply omit the analysis, do the transformation, and generate a proof that would be correct if the analysis would have succeeded. If the transformation is incorrect, it will be filtered out by the compiler driver. Otherwise, the transformation goes through.
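A toy rendering of this optimistic strategy: the pass skips the analysis, applies the transformation anyway, and emits the fact the analysis *would* have established as an explicit obligation; the driver keeps the result only if the obligation holds. All names are hypothetical, and a real system would discharge obligations in a logic, not with an environment lookup.

```python
# Optimistic specialization sketch: transform first, check the analysis
# fact afterwards, and filter out the result if the fact fails to hold.
# All names are hypothetical illustrations, not an API from the paper.

from typing import Dict, Tuple

# A "program" here is a variable environment plus an expression over it.
Program = Tuple[Dict[str, int], str]

def optimistic_specialize(prog: Program, var: str, guess: int):
    """Specialize `var` to `guess` WITHOUT analyzing whether var == guess."""
    env, expr = prog
    specialized = (env, expr.replace(var, str(guess)))
    obligation = (var, guess)  # "correct if var really equals guess"
    return specialized, obligation

def discharge(prog: Program, obligation) -> bool:
    """Stand-in for the checker: does the assumed fact actually hold?"""
    var, guess = obligation
    return prog[0].get(var) == guess

def drive(prog: Program, var: str, guess: int) -> Program:
    specialized, obligation = optimistic_specialize(prog, var, guess)
    # Keep the optimistic result only when its obligation is discharged.
    return specialized if discharge(prog, obligation) else prog

p = ({"n": 10}, "n + n")
print(drive(p, "n", 10)[1])  # obligation holds: prints "10 + 10"
print(drive(p, "n", 7)[1])   # obligation fails, filtered: prints "n + n"
```

Either way the output is correct: a wrong guess costs only a missed optimization, never a miscompiled program.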
This example of optimistic transformations illustrates a somewhat paradoxical property of credible compilation. Even though credible compilation will make it much easier to develop correct compilers, it also makes it practical to release much buggier compilers. In fact, as described below, it may change the reliability expectations for compilers.
Programmers currently expect that the compiler will work correctly for every program that they give it. And you can see that something very close to this level of reliability is required if the compiler fails silently when it fails: it is very difficult for programmers to build a system if there is a reasonable probability that a given error can be caused by the compiler and not the application.
But credible compilation completely changes the situation. If the programmer can determine whether or not the compiler operated correctly before testing the program, the development process can tolerate a compiler that occasionally fails.
In this scenario, the task of the compiler developer changes completely. He or she is no longer responsible for delivering a program that works almost all of the time. It is enough to deliver a system whose failures do not significantly hamper the development of the system. There is little need to make very uncommon cases work correctly, especially if there are known work-arounds. The result is that compiler developers can be much more aggressive: the length of the development cycle will shrink and new techniques will be incorporated into production compilers much more quickly.
7 Conclusions
Most research on compiler correctness has focused on obtaining a compiler that is guaranteed to generate correct code for every input program. This paper presents a less ambitious, but hopefully much more practical approach: require the compiler to generate a proof that the generated code correctly implements the original program. Credible compilation, as we call this approach, gives the compiler developer maximum flexibility, helps developers find compiler bugs, and eliminates the need to trust the developers of compiler passes.
References
[1] S. Amarasinghe, J. Anderson, M. Lam, and C. Tseng. The SUIF compiler for scalable parallel machines. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
[2] K. Apt and E. Olderog. Verification of Sequential and Concurrent Programs. Springer-Verlag, 1997.
[3] A. Cimatti, F. Giunchiglia, P. Pecchiari, B. Pietra, J. Profeta, D. Romano, P. Traverso, and B. Yu. A provably correct embedded verifier for the certification of safety critical software. In Proceedings of the 9th International Conference on Computer Aided Verification, pages 202–213, Haifa, Israel, June 1997.
[4] R. Floyd. Assigning meanings to programs. In J. Schwartz, editor, Proceedings of the Symposium in Applied Mathematics, number 19, pages 19–32, 1967.
[5] J. Guttman, J. Ramsdell, and M. Wand. VLISP: a verified implementation of Scheme. Lisp and Symbolic Computing, 8(1–2):33–110, March 1995.
[6] G. Necula. Proof-carrying code. In Proceedings of the 24th Annual ACM Symposium on the Principles of Programming Languages, pages 106–119, Paris, France, January 1997.
[7] G. Necula. Translation validation for an optimizing compiler. In Proceedings of the SIGPLAN '00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000.
[8] A. Pnueli, M. Siegal, and E. Singerman. Translation validation. In Proceedings of the 4th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Lisbon, Portugal, March 1998.
[9] M. Rinard and P. Diniz. Commutativity analysis: A new framework for parallelizing compilers. In Proceedings of the SIGPLAN '96 Conference on Program Language Design and Implementation, pages 54–67, Philadelphia, PA, May 1996. ACM, New York.
[10] M. Rinard and D. Marinov. Credible compilation with pointers. In Proceedings of the Workshop on Run-Time Result Verification, Trento, Italy, July 1999.
[11] Martin Rinard. Credible compilation. Technical Report MIT-LCS-TR-776, Laboratory for Computer Science, Massachusetts Institute of Technology, March 1999.
[12] R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.