Credible Compilation
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
Abstract
This paper presents an approach to compiler correctness in which the compiler generates a proof that the transformed program correctly implements the input program. A simple proof checker can then verify that the program was compiled correctly. We call a compiler that produces such proofs a credible compiler, because it produces verifiable evidence that it is operating correctly.
1 Introduction
Today, compilers are black boxes. The programmer gives the compiler a program, and the compiler spits out a bunch of bits. Until he or she runs the program, the programmer has no idea if the compiler has compiled the program correctly. Even running the program offers no guarantees: compiler errors may show up only for certain inputs. So the programmer must simply trust the compiler.
We propose a fundamental shift in the relationship between the compiler and the programmer. Every time the compiler transforms the program, it generates a proof that the transformed program produces the same result as the original program. When the compiler finishes, the programmer can use a simple proof checker to verify that the program was compiled correctly. We call a compiler that generates these proofs a credible compiler, because it produces verifiable evidence that it is operating correctly.
Figure 1: Traditional Compilation
Figure 2: Credible Compilation

This is an abridged version of the technical report MIT-LCS-TR-776 with the same title, available at www.cag.lcs.mit.edu/rinard/techreport/credibleCompilation.ps.
Figures 1 and 2 graphically illustrate the difference between traditional compilation and credible compilation. A traditional compiler generates a compiled program and nothing else. A credible compiler, on the other hand, also generates a proof that the compiled program correctly implements the original program. A proof checker can then take the original program, the proof, and the compiled program, and check if the proof is correct. If so, the compilation is verified and the compiled program is guaranteed to correctly implement the original program. If the proof does not check, the compilation is not verified and all bets are off.
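The shape of this workflow can be sketched in a few lines. The `compile_pass` and `check_proof` functions below are hypothetical stand-ins for the paper's formal machinery, not its actual proof system; the point is only the division of trust between an untrusted compiler and a small trusted checker.

```python
# Hypothetical sketch of the credible-compilation workflow.
# `compile_pass` and `check_proof` are illustrative stand-ins,
# not the paper's formal proof system.

def compile_pass(program):
    """An (untrusted) compiler pass returns the transformed program
    plus a proof object claiming the transformation is correct."""
    transformed = program.replace("x + y", "3")   # stand-in "optimization"
    proof = ("claims-equivalent", program, transformed)
    return transformed, proof

def check_proof(original, transformed, proof):
    """The small, trusted proof checker: verify that the proof
    actually relates this original to this transformed program."""
    tag, p, t = proof
    return tag == "claims-equivalent" and p == original and t == transformed

original = "i := i + x + y"
transformed, proof = compile_pass(original)
if check_proof(original, transformed, proof):
    result = transformed      # compilation verified: use the output
else:
    result = original         # proof does not check: all bets are off
```

The key design point is that only `check_proof` must be trusted; the compiler itself can be arbitrarily complex and buggy.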
2 Example
In this section we present an example that explains how a credible compiler can prove that it performed a translation correctly. Figure 3 presents the example program represented as a control flow graph. The program contains several assignment nodes; for example, the node 5: i ← i + x + y at label 5 assigns the value of the expression i + x + y to the variable i. There is also a conditional branch node 4: br i < 24. Control flows from this node through its outgoing left edge to the assignment node at label 5 if i < 24; otherwise control flows through the right edge to the exit node at label 7.
Figure 3: Original Program
Figure 4: Program After Constant Propagation and Constant Folding
Figure 4 presents the program after constant propagation and constant folding. The compiler has replaced the node 5: i ← i + x + y at label 5 with the node 5: i ← i + 3. The goal is to prove that this particular transformation on this particular program preserves the semantics of the original program. The goal is not to prove that the compiler will always operate correctly on every program. The transformation has the standard two-step structure:

Analysis: The compiler discovered that x is always 1 and y is always 2 at the program point before node 5. So, x + y is always 3 at this program point.

Transformation: The compiler used the analysis information to transform the program so that it generates the same result while (hopefully) executing in less time or space or consuming less power. In our example, the compiler simplifies the expression x + y to 3.
Our approach to proving optimizations correct supports this basic two-step structure. The compiler first proves that the analysis is correct, then uses the analysis results to prove that the original and transformed programs generate the same result. Here is how this approach works in our example.
2.1 Proving Analysis Results Correct
Many years ago, Floyd came up with a technique for proving properties of programs [4]. This technique was generalized and extended, and eventually came to be understood as a logic whose proof rules are derived from the structure of the program [2]. The basic idea is to assert a set of properties about the relationships between variables at different points in the program, then use the logic to prove that the properties always hold. If so, each property is called an invariant, because it is always true when the flow of control reaches the corresponding point in the program.

In our example, the key invariant is that at the point just before the program executes node 5, it is always true that x = 1 and y = 2. We represent this invariant as ⟨x = 1 ∧ y = 2⟩5.
In our example, the simplest way for the compiler to generate a proof of ⟨x = 1 ∧ y = 2⟩5 is for it to generate a set of invariants that represent the analysis results, then use the logic to prove that all of the invariants hold. Here is the set of invariants in our example:

⟨x = 1⟩3
⟨x = 1 ∧ y = 2⟩4
⟨x = 1 ∧ y = 2⟩5
⟨x = 1 ∧ y = 2⟩6
Conceptually, the compiler proves this set of invariants by tracing execution paths. The proof is by induction on the structure of the partial executions of the program. For each invariant, the compiler first assumes that the invariants at all preceding nodes in the control flow graph are true. It then traces the execution through each preceding node to verify the invariant at the next node. We next present an outline of the proofs for several key invariants.
⟨x = 1⟩3 holds because the only preceding node, node 2, sets x to 1.

To prove ⟨x = 1 ∧ y = 2⟩4, first assume ⟨x = 1⟩3 and ⟨x = 1 ∧ y = 2⟩6. Then consider the two preceding nodes, nodes 3 and 6. Because ⟨x = 1⟩3 holds and node 3 sets y to 2, ⟨x = 1 ∧ y = 2⟩4 holds along the path through node 3. Because ⟨x = 1 ∧ y = 2⟩6 holds and node 6 does not affect the value of either x or y, ⟨x = 1 ∧ y = 2⟩4 also holds along the path through node 6.
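This inductive argument can be mechanized. The sketch below is a hypothetical constant-propagation invariant checker for the example graph (node shapes and edges reconstructed from the figures, not taken verbatim from the paper): it pushes each node's invariant through that node and checks that the result entails the invariant at every successor.

```python
# Minimal sketch of the inductive invariant check described above.
# Invariants map program points to known constant bindings.

NODES = {                       # node label -> (kind, payload)
    2: ("assign_const", ("x", 1)),
    3: ("assign_const", ("y", 2)),
    4: ("branch", None),        # br i < 24
    5: ("assign_expr", "i"),    # i := i + x + y (i no longer constant)
    6: ("assign_expr", "g"),    # node 6 assigns g (g no longer constant)
}
EDGES = [(2, 3), (3, 4), (6, 4), (4, 5), (5, 6)]

INVARIANTS = {                  # constants known just before each node
    3: {"x": 1},
    4: {"x": 1, "y": 2},
    5: {"x": 1, "y": 2},
    6: {"x": 1, "y": 2},
}

def post(node, env):
    """Abstractly execute `node` on constant environment `env`."""
    kind, payload = NODES[node]
    env = dict(env)
    if kind == "assign_const":
        var, val = payload
        env[var] = val
    elif kind == "assign_expr":
        env.pop(payload, None)  # assigned value not tracked as constant
    return env                  # branch nodes change no variables

def check():
    # Inductive step: the invariant at each predecessor, pushed through
    # that predecessor's node, must entail the invariant at the successor.
    for pred, succ in EDGES:
        env = post(pred, INVARIANTS.get(pred, {}))
        for var, val in INVARIANTS.get(succ, {}).items():
            if env.get(var) != val:
                return False
    return True
```

Running `check()` on these tables succeeds, mirroring the hand proof: every edge into node 4 (from node 3 and from node 6) re-establishes x = 1 ∧ y = 2.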
In this proof we have assumed that the compiler generates an invariant at almost all of the nodes in the program. More traditional approaches use fewer invariants, typically one invariant per loop, then produce proofs that trace the paths between invariants.

2.2 Proving Transformations Correct
When a compiler transforms a program, there are typically some externally observable effects that it must preserve. A standard requirement, for example, is that the compiler must preserve the input/output relation of the program. In our framework, we assume that the compiler is operating on a compilation unit such as a procedure or method, and that there are externally observable variables such as global variables or object instance variables. The compiler must preserve the final values of these variables. All other variables are either parameters or local variables, and the compiler is free to do whatever it wants to with these variables so long as it preserves the final values of the observable variables. The compiler may also assume that the initial values of the observable variables and the parameters are the same in both cases.
In our example, the only requirement is that the transformation must preserve the final value of the variable g. The compiler proves this property by proving a simulation correspondence between the original and transformed programs. To present the correspondence, we must be able to refer, in the same context, to variables and node labels from the two programs. We adopt the convention that all entities from the original program P will have a subscript of P, while all entities from the transformed program T will have a subscript of T. So i_P refers to the variable i in the original program, while i_T refers to the variable i in the transformed program.
In our example, the compiler proves that the transformed program simulates the original program in the following sense: for every execution of the original program P that reaches the final node 7_P, there exists an execution of the transformed program T that reaches the final node 7_T such that g_P at 7_P = g_T at 7_T. We call such a correspondence a simulation invariant, and write it as ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T.
The compiler typically generates a set of simulation invariants, then uses the logic to construct a proof of the correctness of all of the simulation invariants. The proof is by induction on the length of the partial executions of the original program. We next outline how the compiler can use this approach to prove ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T. First, the compiler is given that ⟨g_P⟩1_P ≃ ⟨g_T⟩1_T; in other words, the values of g_P and g_T are the same at the start of the two programs. The compiler then generates the following simulation invariants:

⟨(g_P, i_P)⟩2_P ≃ ⟨(g_T, i_T)⟩2_T
⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T
⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T
⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T
⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T
The key simulation invariants are ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T, ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T, and ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T. We next outline the proofs of these invariants.
To prove ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T, first assume ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T. For each path to 7_P in P, we must find a corresponding path in T to 7_T such that the values of g_P and g_T are the same in both paths. The only path to 7_P goes from 4_P to 7_P, when i_P ≥ 24. The assumed simulation invariant implies that i_P at 4_P = i_T at 4_T, so control flows from 4_T to 7_T whenever control flows from 4_P to 7_P. The simulation invariant ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T also implies that the values of g_P and g_T are the same in both cases.
To prove ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T, assume ⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T. The only path to 6_P goes from 5_P to 6_P, with i_P at 6_P = i_P at 5_P + x_P at 5_P + y_P at 5_P. The analysis proofs showed that x_P at 5_P + y_P at 5_P = 3, so i_P at 6_P = i_P at 5_P + 3. The corresponding path in T goes from 5_T to 6_T, with i_T at 6_T = i_T at 5_T + 3. The assumed simulation invariant ⟨(g_P, i_P)⟩5_P ≃ ⟨(g_T, i_T)⟩5_T allows us to verify a correspondence between the values of i_P at 6_P and i_T at 6_T; namely, that they are equal. Because 5_P does not change g_P and 5_T does not change g_T, g_P at 6_P and g_T at 6_T have the same value.
To prove ⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T, first assume ⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T and ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T. There are two paths to 4_P:

– Control flows from 3_P to 4_P. The corresponding path in T is from 3_T to 4_T, so we can apply the assumed simulation invariant ⟨(g_P, i_P)⟩3_P ≃ ⟨(g_T, i_T)⟩3_T to derive g_P at 4_P = g_T at 4_T and i_P at 4_P = i_T at 4_T.

– Control flows from 6_P to 4_P, with g_P at 4_P = 2 · i_P at 6_P. The corresponding path in T is from 6_T to 4_T, with g_T at 4_T = 2 · i_T at 6_T. We can apply the assumed simulation invariant ⟨(g_P, i_P)⟩6_P ≃ ⟨(g_T, i_T)⟩6_T to derive 2 · i_P at 6_P = 2 · i_T at 6_T. Since 6_P does not change i_P and 6_T does not change i_T, we can derive g_P at 4_P = g_T at 4_T and i_P at 4_P = i_T at 4_T.
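The simulation invariant ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T can also be checked concretely, though testing on particular inputs is of course far weaker than the path-by-path proof above. The sketch below runs both versions of the example loop and compares the final value of the observable variable g; the initial values of i and g are assumptions here, since the abridged paper does not show the loop's initialization.

```python
# Concrete (testing-style) check of <g_P>7_P ~ <g_T>7_T: run both
# programs and compare the final value of the observable variable g.
# Assumes i and g start at 0, which the abridged paper does not state.

def run_original(i=0):
    x, y, g = 1, 2, 0          # nodes 2 and 3
    while i < 24:              # node 4: br i < 24
        i = i + x + y          # node 5: i := i + x + y
        g = 2 * i              # node 6: g := 2 * i
    return g                   # node 7: exit

def run_transformed(i=0):
    x, y, g = 1, 2, 0
    while i < 24:
        i = i + 3              # node 5 after constant folding
        g = 2 * i
    return g
```

With i starting at 0, both versions step i through 3, 6, ..., 24 and return g = 48; the simulation invariant asserts this agreement for every execution, not just sampled ones.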
2.3 Formal Foundations
In this section we outline the formal foundations required to support credible compilation. The full paper presents the formal foundations in detail [11].

Credible compilation depends on machine-checkable proofs. Its use therefore requires the formalization of several components. First, we must formalize the program representation, namely control flow graphs, and define a formal operational semantics for that representation. To support proofs of standard invariants, we must formalize the Floyd-Hoare proof rules required to prove any program property. We must also formalize the proof rules for simulation invariants. Both of these formalizations depend on some facility for doing simple automated proofs of standard facts involving integer properties. It is also necessary to show that both sets of proof rules are sound.

We have successfully performed these tasks for a simple language based on control flow graphs; see the full report for the complete elaboration [11]. Given this formal foundation, the compiler can use the proof rules to produce machine-checkable proofs of the correctness of its analyses and transformations.
3 Optimization Schemas
We next present examples that illustrate how to prove the correctness of several standard optimizations. Each optimization comes with a proof schema; the compiler would then use the schema to produce a correctness proof that goes along with each optimization.
3.1 Dead Assignment Elimination
The compiler can eliminate an assignment to a local variable if that variable is not used after the assignment. The proof schema is relatively simple: the compiler simply generates simulation invariants that assert the equality of corresponding live variables at corresponding points in the program. Figures 5 and 6 present an example that we use to illustrate the schema. This example continues the example introduced in Section 2. Figure 7 presents the invariants that the compiler generates for this example.
Figure 5: Program P Before Dead Assignment Elimination
Figure 6: Program T After Dead Assignment Elimination

I = {⟨(g_P, i_P)⟩4_P ≃ ⟨(g_T, i_T)⟩4_T, ⟨i_P⟩5_P ≃ ⟨i_T⟩5_T,
⟨i_P⟩6_P ≃ ⟨i_T⟩6_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 7: Invariants for Dead Assignment Elimination
Note that the set I of invariants contains no standard invariants. In general, dead assignment elimination requires only simulation invariants. The proofs of these invariants are simple; the only complication is the need to skip over dead assignments.
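The transformation itself can be sketched as a backward liveness pass over a straight-line block. The `(target, uses)` representation below is our own simplification, not the paper's control flow graph form; an assignment whose target is dead at that point is exactly what the simulation proofs must skip over.

```python
# Minimal sketch of dead assignment elimination on a straight-line
# block. Each assignment is (target, list-of-variables-it-reads).

def eliminate_dead(block, live_out):
    """Remove assignments whose target is not live afterwards."""
    live = set(live_out)
    kept = []
    for target, uses in reversed(block):
        if target in live:
            kept.append((target, uses))
            live.discard(target)   # this assignment defines the target
            live.update(uses)      # ...and makes its operands live
        # otherwise the assignment is dead: drop it
    kept.reverse()
    return kept

block = [("t", ["x", "y"]),   # t := x + y   (t never used again: dead)
         ("g", ["i"])]        # g := 2 * i   (g is observable)
print(eliminate_dead(block, live_out={"g"}))   # -> [('g', ['i'])]
```

Only variables live at corresponding points need to agree between the two programs, which is why the invariant set I above mentions i and g but not the eliminated temporary.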
3.2 Branch Movement
Our next optimization moves a conditional branch from the top of a loop to the bottom. The optimization is legal if the loop always executes at least once. This optimization is different from all the other optimizations we have discussed so far in that it changes the control flow. Figure 8 presents the program before branch movement; Figure 9 presents the program after branch movement. Figure 10 presents the set of invariants that the compiler generates for this example.
One of the paths that the proof must consider is the path in the original program P from 1_P to 4_P to 7_P. No execution of P, of course, will take this path; the loop always executes at least once, and this path corresponds to the loop executing zero times. The fact that this path will never execute shows up as a false condition in the partial simulation invariant for P that is propagated from 7_P back to 1_P. The corresponding path in T that is used to prove I ⊢ ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T is the path from 1_T through 5_T, 6_T, and 4_T to 7_T.

Figure 8: Program Before Branch Movement
Figure 9: Program After Branch Movement

I = {⟨i_P⟩5_P ≃ ⟨i_T⟩5_T, ⟨i_P⟩6_P ≃ ⟨i_T⟩6_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 10: Invariants for Branch Movement
3.3 Induction Variable Elimination
Our next optimization eliminates the induction variable i from the loop, replacing it with g. The correctness of this transformation depends on the invariant ⟨g_P = 2 i_P⟩4_P. Figure 11 presents the program before induction variable elimination; Figure 12 presents the program after induction variable elimination. Figure 13 presents the set of invariants that the compiler generates for this example. These invariants characterize the relationship between the eliminated induction variable i_P from the original program and the variable g_T in the transformed program.

Figure 11: Program P Before Induction Variable Elimination
Figure 12: Program T After Induction Variable Elimination

I = {⟨g_P = 2 i_P⟩4_P, ⟨2 i_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨2 i_P⟩4_P ≃ ⟨g_T⟩4_T, ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 13: Invariants for Induction Variable Elimination
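The following sketch illustrates why the invariant ⟨g_P = 2 i_P⟩4_P licenses the rewrite: since g always tracks 2·i at the branch, the loop test and update can be expressed in g alone. The loop bounds follow the running example, and the initial values i = 0 and g = 0 are assumptions, since the abridged paper does not show the initialization.

```python
# Illustration of induction variable elimination in the running
# example. Assumes i and g start at 0 (not stated in the abridged
# paper). `trace` records the invariant g = 2*i at each visit to
# the branch node 4_P.

def original():
    i, g = 0, 0
    trace = []
    while True:
        trace.append(g == 2 * i)   # the invariant <g_P = 2 i_P>4_P
        if not i < 24:
            break
        i = i + 3
        g = 2 * i
    return g, trace

def transformed():
    g = 0
    while g < 48:                  # test i < 24 rewritten via g = 2*i
        g = g + 6                  # update i += 3 becomes g += 6
    return g
```

Both versions finish with g = 48, and the trace confirms the standard invariant holds at every visit to the branch, which is what justifies replacing i by g in both the test and the update.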
3.4 Loop Unrolling
The next optimization unrolls the loop once. Figure 14 presents the program before loop unrolling; Figure 15 presents the program after unrolling the loop. Note that the loop unrolling transformation preserves the loop exit test; this test can be eliminated by the dead code elimination optimization discussed in Section 3.5.

Figure 14: Program P Before Loop Unrolling
Figure 15: Program T After Loop Unrolling
I = {⟨g_P%12 = 0 ∨ g_P%12 = 6⟩4_P,
⟨g_P%12 = 0, g_P⟩5_P ≃ ⟨g_T⟩2_T,
⟨g_P%12 = 6, g_P⟩4_P ≃ ⟨g_T⟩3_T,
⟨g_P%12 = 6, g_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨g_P%12 = 0, g_P⟩4_P ≃ ⟨g_T⟩4_T,
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 16: Invariants for Loop Unrolling
Figure 16 presents the set of invariants that the compiler generates for this example. Note that, unlike the simulation invariants in previous examples, these simulation invariants have conditions. The conditions are used to separate different executions of the same node in the original program. Some of the time, the execution at node 4_P corresponds to the execution at node 4_T, and other times to the execution at node 3_T. The conditions in the simulation invariants identify when, in the execution of the original program, each correspondence holds. For example, when g_P%12 = 0, the execution at 4_P corresponds to the execution at 4_T; when g_P%12 = 6, the execution at 4_P corresponds to the execution at 3_T.
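A quick concrete check of these conditions (again assuming the running example with i and g starting at 0, which the abridged paper does not state): successive executions of node 4_P see g_P alternate between residues 0 and 6 mod 12, which is exactly what lets each execution be matched to node 4_T or node 3_T in the unrolled program.

```python
# Record g % 12 at each visit to the branch node 4_P in the running
# example (i updated by 3, g = 2*i). Assumes i and g start at 0.

def residues_at_node4():
    i, g = 0, 0
    seen = []
    while True:
        seen.append(g % 12)    # residue observed at node 4_P
        if not i < 24:
            break
        i = i + 3
        g = 2 * i
    return seen

print(residues_at_node4())     # -> [0, 6, 0, 6, 0, 6, 0, 6, 0]
```

The strict alternation is why the disjunctive standard invariant ⟨g_P%12 = 0 ∨ g_P%12 = 6⟩4_P holds, and why the two conditional simulation invariants at node 4_P together cover every execution.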
3.5 Dead Code Elimination
Our next optimization is dead code elimination. We continue with our example by eliminating the branch in the middle of the loop at node 3. Figure 17 presents the program before the branch is eliminated. The key property that allows the compiler to remove the branch is that g%12 = 6 ∧ g ≤ 48 at node 3, which implies that g < 48 at node 3. In other words, the condition in the branch is always true. Figure 18 presents the program after the branch is eliminated. Figure 19 presents the set of invariants that the compiler generates for this example.
One of the paths that the proof must consider is the potential loop exit in the original program P from 3_P to 7_P. In fact, the loop always exits from 4_P, not 3_P. This fact shows up because the conjunction of the standard invariant ⟨g_P%12 = 6 ∧ g_P ≤ 48⟩3_P with the condition g_P ≥ 48 from the partial simulation invariant for P at 3_P is false. The corresponding path in T that is used to prove I ⊢ ⟨g_P⟩7_P ≃ ⟨g_T⟩7_T is the path from 5_T to 4_T to 7_T.

4 Code Generation
Figure 17: Program P Before Dead Code Elimination
Figure 18: Program T After Dead Code Elimination

I = {⟨g_P%12 = 0 ∧ g_P < 48⟩2_P,
⟨g_P%12 = 6 ∧ g_P ≤ 48⟩3_P,
⟨g_P%12 = 6 ∧ g_P < 48⟩5_P,
⟨g_P%12 = 0 ∧ g_P ≤ 48⟩4_P,
⟨g_P⟩2_P ≃ ⟨g_T⟩2_T,
⟨g_P⟩5_P ≃ ⟨g_T⟩5_T,
⟨g_P⟩3_P ≃ ⟨g_T⟩5_T,
⟨g_P⟩4_P ≃ ⟨g_T⟩4_T,
⟨g_P⟩7_P ≃ ⟨g_T⟩7_T}

Figure 19: Invariants for Dead Code Elimination

In principle, we believe that it is possible to produce a proof for each step of the compilation process. We have designed our proof system to work with a standard intermediate format based on control flow graphs. The parser, which produces the initial control flow graph, and the code generator, which generates object code from the final control flow graph, are
therefore potential sources of uncaught errors. We believe it should be straightforward, for reasonable languages, to produce a standard parser that is not a serious source of errors. It is not so obvious how the code generator can be made simple enough to be reliable.
Our goal is to make the step from the final control flow graph to the generated code as small as possible. Ideally, each node in the control flow graph would correspond to a single instruction in the generated code. To achieve this goal, it must be possible to express the result of complicated, machine-specific code generation algorithms (such as register allocation and instruction selection) using control flow graphs. After the compiler applies these algorithms, the final control flow graph would be structured in a stylized way appropriate for the target architecture. The code generator for the target architecture would accept such a control flow graph as input and use a simple translation algorithm to produce the final object code.
With this approach, we anticipate that code generators can be made approximately as simple as proof checkers. We therefore anticipate that it will be possible to build standard code generators with an acceptable level of reliability for most users. However, we would once again like to emphasize that it should be possible to build a framework in which the compilation is checked from source code to object code.
In the following two sections, we first present an approach for a simple RISC instruction set, then discuss an approach for more complicated instruction sets.

4.1 A Simple RISC Instruction Set
For a simple RISC instruction set, the key idea is to introduce special variables that the code generator interprets as registers. The control flow graph is then transformed so that each node corresponds to a single instruction in the generated code. We first consider assignment nodes.
If the destination variable is a register variable, the source expression must be one of the following:

– A non-register variable. In this case the node corresponds to a load instruction.

– A constant. In this case the node corresponds to a load immediate instruction.

– A single arithmetic operation with register variable operands. In this case the node corresponds to an arithmetic instruction that operates on the two source registers to produce a value that is written into the destination register.

– A single arithmetic operation with one register variable operand and one constant operand. In this case the node corresponds to an arithmetic instruction that operates on one source register and an immediate constant to produce a value that is written into the destination register.

If the destination variable of an assignment node is a non-register variable, the source expression must consist of a register variable, and the node corresponds to a store instruction.
It is possible to convert assignment nodes with arbitrary expressions to this form. The first step is to flatten the expression by introducing temporary variables to hold the intermediate values computed by the expression. Additional assignment nodes transfer these values to the new temporary variables. The second step is to use a register allocation algorithm to transform the control flow graph to fit the form described above.
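The flattening step can be sketched as a small recursive pass. The tuple encoding of expressions and the temporary-naming scheme here are illustrative choices, not the paper's representation:

```python
# Flatten a nested expression into single-operation assignments,
# the form the simple RISC code generator expects. Expressions are
# nested tuples like ("+", ("+", "i", "x"), "y"); leaves are
# variable names or constants.

import itertools

def flatten(target, expr):
    fresh = (f"t{n}" for n in itertools.count())   # temporary names
    code = []

    def emit(e):
        if isinstance(e, tuple):          # (op, left, right)
            op, left, right = e
            lv, rv = emit(left), emit(right)
            tmp = next(fresh)
            code.append((tmp, op, lv, rv))  # one arithmetic op per node
            return tmp
        return e                            # variable or constant leaf

    code.append((target, "=", emit(expr), None))  # final copy to target
    return code

# i := i + x + y becomes two single-operation assignments plus a copy:
print(flatten("i", ("+", ("+", "i", "x"), "y")))
# -> [('t0', '+', 'i', 'x'), ('t1', '+', 't0', 'y'), ('i', '=', 't1', None)]
```

A register allocator would then map the temporaries and source variables onto the special register variables, after which each node matches one of the instruction shapes listed above.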
We next consider conditional branch nodes. If the condition is the constant true or false, the node corresponds to an unconditional branch instruction. Otherwise, the condition must compare a register variable with zero, so that the instruction corresponds either to a branch if zero instruction or a branch if not zero instruction.
4.2 More Complex Instruction Sets
Many processors offer more complex instructions that, in effect, do multiple things in a single cycle. In the ARM instruction set, for example, the execution of each instruction may be predicated on several condition codes. ARM instructions can therefore be modeled as consisting of a conditional branch plus the other operations in the instruction. The x86 instruction set has instructions that assign values to several registers.

We believe the correct approach for these more complex instruction sets is to let the compiler writer extend the possible types of nodes in the control flow graph. The semantics of each new type of node would be given in terms of the base nodes in standard control flow graphs. We illustrate this approach with an example.
For instruction sets with condition codes, the programmer would define a new variable for each condition code. The control flow graph for each new type of node would perform the assignment, test the appropriate conditions, and set the appropriate condition code variables. If the instruction set also has predicated execution, the control flow graph would use conditional branch nodes to check the appropriate condition codes before performing the instruction.
Each new type of node would come with proof rules automatically derived from its underlying control flow graph. The proof checker could therefore verify proofs on control flow graphs that include these types of nodes. The code generator would require the preceding phases of the compiler to produce a control flow graph that contained only those types of nodes that translate directly into a single instruction on the target architecture. With this approach, all complex code generation algorithms could operate on control flow graphs, with their results checked for correctness.
5 Related Work
Most existing research on compiler correctness has focused on techniques that deliver a compiler guaranteed to operate correctly on every input program [5]; we call such a compiler a totally correct compiler. A credible compiler, on the other hand, is not necessarily guaranteed to operate correctly on all programs; it merely produces a proof that it has operated correctly on the current program.

In the absence of other differences, one would clearly prefer a totally correct compiler to a credible compiler. After all, the credible compiler may fail to compile some programs correctly, while the totally correct compiler will always work. But the totally correct compiler approach imposes a significant pragmatic drawback: it requires the source code of the compiler, rather than its output, to be proved correct. So programmers must express the compiler in a way that is amenable to these correctness proofs. In practice this invasive constraint has restricted the compiler to a limited set of source languages and compiler algorithms. Although the concept of a totally correct compiler has been around for many years, there are, to our knowledge, no totally correct compilers that produce close to production-quality code for realistic programming languages. Credible compilation offers the compiler developer much more freedom. The compiler can be developed in any language using any methodology and perform arbitrary transformations. The only constraint is that the compiler produce a proof that its result is correct.
The concept of credible compilers has also arisen in the context of compiling synchronous languages [3, 8]. Our approach, while philosophically similar, is technically much different. It is designed for standard imperative languages and therefore uses drastically different techniques for deriving and expressing the correctness proofs.
We often are asked the question "How is your approach different from proof-carrying code [6]?" (Proof-carrying code is code augmented with a proof that the code satisfies safety properties such as type safety or the absence of array bounds violations.) In our view, credible compilers and proof-carrying code are orthogonal concepts. Proof-carrying code is used to prove properties of one program, typically the compiled program. Credible compilers establish a correspondence between two programs: an original program and a compiled program. Given a safe programming language, a credible compiler will produce guarantees that are stronger than those provided by typical applications of proof-carrying code. So, for example, if the original program is type safe and the credible compiler produces a proof that the compiled program correctly implements the original program, then the compiled program is also type safe.
But proof-carrying code can, in principle, be used to prove properties that are not visible in the semantics of the language. For example, one might use proof-carrying code to prove that a program does not execute a sequence of instructions that may damage the hardware. Because most languages simply do not deal with the kinds of concepts required to prove such a property as a correspondence between two programs, credible compilation is not particularly relevant to these kinds of problems.

Since we first wrote this paper, credible compilation has developed significantly. For example, it has been applied to programs with pointers [10] and to C programs [7].
6 Impact of Credible Compilation
We next discuss the potential impact of credible compilation. We consider five areas: debugging compilers, increasing the flexibility of compiler development, just-in-time compilers, the concept of an open compiler, and the relationship of credible compilation to building custom compilers.
6.1 Debugging Compilers
Compilers are notoriously difficult to build and debug. In a large compiler, a surprising part of the difficulty is simply recognizing incorrectly generated code. The current state of the art is to generate code after a set of passes, then test that the generated code produces the same result as the original code. Once a piece of incorrect code is found, the developer must spend time tracing the bug back through layers of passes to the original source.

Requiring the compiler to generate a proof for each transformation will dramatically simplify this process. As soon as a pass operates incorrectly, the developer will immediately be directed to the incorrect code. Bugs can be found and eliminated as soon as they occur.
6.2 Flexible Compiler Development
It is difficult, if not impossible, to eliminate all of the bugs in a large software system such as a compiler. Over time, the system tends to stabilize around a relatively reliable software base as it is incrementally debugged. The price of this stability is that people become extremely reluctant to change the software, either to add features or even to fix relatively minor bugs, for fear of inadvertently introducing new bugs. At some point the system becomes obsolete because the developers are unable to upgrade it quickly enough for it to stay relevant.

Credible compilation, combined with the standard organization of the compiler as a sequence of passes, promises to make it possible to continually introduce new, unreliable code into a mature compiler without compromising functionality or reliability. Consider the following scenario. Working under deadline pressure, a compiler developer has come up with a prototype implementation of a complex transformation. This transformation is of great interest because it dramatically improves the performance of several SPEC benchmarks. But because the developer cut corners to get the implementation out quickly, it is unreliable. With credible compilation, this unreliability is not a problem at all: the pass can go into the compiler immediately, with the compiler checking each correctness proof and discarding the results if it didn't work. The compiler operates as reliably as it did before the introduction of the new pass, but when the pass works, it generates much better code.
It is well known that the effort required to make a compiler work on all conceivable inputs is much greater than the effort required to make the compiler work on all likely inputs. Credible compilation makes it possible to build the entire compiler as a sequence of passes that work only for common or important cases. Because developers would be under no pressure to make passes work on all cases, each pass could be hacked together quickly with little testing and no complicated code to handle exceptional cases. The result is that the compiler would be much easier and cheaper to build and much easier to target for good performance on specific programs.
A final extrapolation is to build speculative transformations. The idea is that the compiler simply omits the analysis required to determine if the transformation is legal. It does the transformation anyway and generates a proof that the transformation is correct. This proof is valid, of course, only if the transformation is correct. The proof checker filters out invalid transformations and keeps the rest.

This approach shifts work from the developer to the proof checker. The proof checker does the analysis required to determine if the transformation is legal, and the developer can focus on the transformation and the proof generation, not on writing the analysis code.
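The filtering loop this describes can be sketched as follows; the toy `transform` and `check_proof` below are deliberately crude stand-ins for a real pass and a real proof checker:

```python
# Sketch of the speculative-transformation idea: apply a pass without
# the enabling analysis, then let the proof check filter out the cases
# where the speculation was invalid. The transform and checker here
# are illustrative stand-ins only.

def speculative_pass(program, transform, check_proof):
    candidate, proof = transform(program)
    if check_proof(program, candidate, proof):
        return candidate        # the proof validates: keep the result
    return program              # invalid speculation: discard it

def transform(prog):
    # Speculatively fold "x + y" to "3" without analyzing whether
    # x = 1 and y = 2 actually hold at that point.
    cand = [line.replace("x + y", "3") for line in prog]
    return cand, ("replaced", prog, cand)

def check_proof(orig, cand, proof):
    # Stand-in checker: accept only if the program binds x = 1, y = 2.
    return "x = 1" in orig and "y = 2" in orig

good = ["x = 1", "y = 2", "i = i + x + y"]
bad  = ["x = 4", "y = 2", "i = i + x + y"]
print(speculative_pass(good, transform, check_proof))  # folded version kept
print(speculative_pass(bad, transform, check_proof))   # speculation discarded
```

The developer writes only the transformation and its proof generator; the checker, not hand-written analysis code, decides which speculative results survive.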
6.3 Just-In-Time Compilers
The increased network interconnectivity resulting from the deployment of the Internet has enabled and promoted a new way to distribute software. Instead of compiling to native machine code that will run on only one machine, the source program is compiled to a portable byte code. An interpreter executes the byte code.

The problem is that the interpreted byte code runs much slower than native code. The proposed solution is to use a just-in-time compiler to generate native code either when the byte code arrives or dynamically as it runs. Dynamic compilation also has the advantage that it can use dynamically collected profiling information to drive the compilation process.
Note, however, that the just-in-time compiler is another complex, potentially erroneous software component that can affect the correct execution of the program. If a compiler generates native code, the only subsystems that can change the semantics of the native code binary during normal operation are the loader, dynamic linker, operating system, and hardware, all of which are relatively static systems. An organization that is shipping software can generate a binary and test it extensively on the kind of systems that its customers will use. If the customer finds an error, the organization can investigate the problem by running the program on a roughly equivalent system.

But with dynamic compilation, the compiled code constantly changes in a way that may be very difficult to reproduce. If the dynamic compiler incorrectly compiles the program, it may be extremely difficult to reproduce the conditions that caused it to fail. This additional complexity in the compilation approach makes it more difficult to build a reliable compiler. It also makes it difficult to assign blame for any failure. When an error shows up, it could be either the compiler or the application. The organizations that produced the compiler and the application can each blame the other, and neither one is motivated to work hard to find and fix the problem. The end result is that the total system stays broken.
Credible compilation can eliminate this problem. If the dynamic compiler emits a proof that it executed correctly, the run-time system can check the proof before accepting the generated code. All incorrect code would be filtered out before it caused a problem. This approach restores the reliability properties of distributing native code binaries while supporting the convenience and flexibility of dynamic compilation and the distribution of software in portable byte-code format.
6.4 An Open Compiler
We believe that credible compilers will change the social context in which compilers are built. Before a developer can safely integrate a pass into the compiler, there must be some evidence that the pass will work. But there is currently no way to verify the correctness of the pass. Developers are therefore typically reduced to relying on the reputation of the person that produced the pass, rather than on the trustworthiness of the code itself. In practice, this means that the entire compiler is typically built by a small, cohesive group of people in a single organization. The compiler is closed in the sense that these people must coordinate any contribution to the compiler.
Credible compilation eliminates the need for developers to trust each other. Anyone can take any pass, integrate it into their compiler, and use it. If a pass operates incorrectly, it is immediately apparent, and the compiler can discard the transformation. There is no need to trust anyone. The compiler is now open and anyone can contribute. Instead of relying on a small group of people in one organization, the effort, energy, and intelligence of every compiler developer in the world can be productively applied to the development of one compiler.
The keys to making this vision a reality are a standard intermediate representation, logics for expressing the proofs, and a verifier that checks the proofs. The representation must be expressive and support the range of program representations required for both high level and low level analyses and transformations. Ideally, the representation would be extensible, with developers able to augment the system with new constructs and new axioms that characterize these constructs. The verifier would be a standard piece of software. We expect several independent verifiers to emerge that would be used by most programmers; paranoid programmers can build their own verifier. It might even be possible to do a formal correctness proof of the verifier.
Once this standard infrastructure is in place, we can leverage the Internet to create a compiler development community. One could imagine, for example, a compiler development web portal with code transformation passes, front ends, and verifiers. Anyone can download a transformation; anyone can use any of the transformations without fear of obtaining an incorrect result. Each developer can construct his or her own custom compiler by stringing together a sequence of optimization passes from this web site. One could even imagine an intellectual property market emerging, as developers license passes or charge electronic cash for each use of a pass. In fact, future compilers may consist of a set of transformations distributed across multiple web sites, with
the program (and its correctness proofs) flowing through the network as it is compiled.
Compilers are traditionally thought of and built as general-purpose systems that should be able to compile any program given to them. As a consequence, they tend to contain analyses and transformations that are of general utility and almost always applicable. Any extra components would slow the compiler down and increase its complexity.
The problem with this situation is that general techniques tend to do relatively pedestrian things to the program. For specific classes of programs, more specialized analyses and transformations would make a huge difference [12, 9, 1]. But because they are not generally useful, they don't make it into widely used compilers.
We believe that credible compilation can make it possible to develop many different custom compilers that have been specialized for specific classes of applications. The idea is to make a set of credible passes available, then allow the compiler builder to combine them in arbitrary ways. Very specialized passes could be included without threatening the stability of the compiler. One could easily imagine a range of compilers quickly developed for each class of applications.
It would even be possible to extrapolate this idea to include optimistic transformations. In some cases, it is difficult to do the analysis required to perform a specific transformation. In this case, the compiler could simply omit the analysis, do the transformation, and generate a proof that would be correct if the analysis would have succeeded. If the transformation is incorrect, it will be filtered out by the compiler driver. Otherwise, the transformation goes through.
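A toy rendering of this optimistic strategy: the pass skips the analysis, applies the transformation anyway, and emits the fact the analysis *would* have established as an explicit obligation; the driver keeps the result only if the obligation holds. All names are hypothetical, and a real system would discharge obligations in a logic, not with an environment lookup.

```python
# Optimistic specialization sketch: transform first, check the analysis
# fact afterwards, and filter out the result if the fact fails to hold.
# All names are hypothetical illustrations, not an API from the paper.

from typing import Dict, Tuple

# A "program" here is a variable environment plus an expression over it.
Program = Tuple[Dict[str, int], str]

def optimistic_specialize(prog: Program, var: str, guess: int):
    """Specialize `var` to `guess` WITHOUT analyzing whether var == guess."""
    env, expr = prog
    specialized = (env, expr.replace(var, str(guess)))
    obligation = (var, guess)  # "correct if var really equals guess"
    return specialized, obligation

def discharge(prog: Program, obligation) -> bool:
    """Stand-in for the checker: does the assumed fact actually hold?"""
    var, guess = obligation
    return prog[0].get(var) == guess

def drive(prog: Program, var: str, guess: int) -> Program:
    specialized, obligation = optimistic_specialize(prog, var, guess)
    # Keep the optimistic result only when its obligation is discharged.
    return specialized if discharge(prog, obligation) else prog

p = ({"n": 10}, "n + n")
print(drive(p, "n", 10)[1])  # obligation holds: prints "10 + 10"
print(drive(p, "n", 7)[1])   # obligation fails, filtered: prints "n + n"
```

Either way the output is correct: a wrong guess costs only a missed optimization, never a miscompiled program.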
This example of optimistic transformations illustrates a somewhat paradoxical property of credible compilation. Even though credible compilation will make it much easier to develop correct compilers, it also makes it practical to release much buggier compilers. In fact, as described below, it may change the reliability expectations for compilers.
Programmers currently expect that the compiler will work correctly for every program that they give it. And you can see that something very close to this level of reliability is required if the compiler fails silently when it fails: it is very difficult for programmers to build a system if there is a reasonable probability that a given error can be caused by the compiler and not the application.
But credible compilation completely changes the situation. If the programmer can determine whether or not the compiler operated correctly before testing the program, the development process can tolerate a compiler that occasionally fails.
In this scenario, the task of the compiler developer changes completely. He or she is no longer responsible for delivering a program that works almost all of the time. It is enough to deliver a system whose failures do not significantly hamper the development of the system. There is little need to make very uncommon cases work correctly, especially if there are known work-arounds. The result is that compiler developers can be much more aggressive: the length of the development cycle will shrink and new techniques will be incorporated into production compilers much more quickly.
7 Conclusions
Most research on compiler correctness has focused on obtaining a compiler that is guaranteed to generate correct code for every input program. This paper presents a less ambitious, but hopefully much more practical approach: require the compiler to generate a proof that the generated code correctly implements the original program. Credible compilation, as we call this approach, gives the compiler developer maximum flexibility, helps developers find compiler bugs, and eliminates the need to trust the developers of compiler passes.
References
[1] S. Amarasinghe, J. Anderson, M. Lam, and C. Tseng. The SUIF compiler for scalable parallel machines. In Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
[2] K. Apt and E. Olderog. Verification of Sequential and Concurrent Programs. Springer-Verlag, 1997.
[3] A. Cimatti, F. Giunchiglia, P. Pecchiari, B. Pietra, J. Profeta, D. Romano, P. Traverso, and B. Yu. A provably correct embedded verifier for the certification of safety critical software. In Proceedings of the 9th International Conference on Computer Aided Verification, pages 202–213, Haifa, Israel, June 1997.
[4] R. Floyd. Assigning meanings to programs. In J. Schwartz, editor, Proceedings of the Symposium in Applied Mathematics, number 19, pages 19–32, 1967.
[5] J. Guttman, J. Ramsdell, and M. Wand. VLISP: a verified implementation of Scheme. Lisp and Symbolic Computing, 8(1–2):33–110, March 1995.
[6] G. Necula. Proof-carrying code. In Proceedings of the 24th Annual ACM Symposium on the Principles of Programming Languages, pages 106–119, Paris, France, January 1997.
[7] G. Necula. Translation validation for an optimizing compiler. In Proceedings of the SIGPLAN '00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000.
[8] A. Pnueli, M. Siegal, and E. Singerman. Translation validation. In Proceedings of the 4th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Lisbon, Portugal, March 1998.
[9] M. Rinard and P. Diniz. Commutativity analysis: A new framework for parallelizing compilers. In Proceedings of the SIGPLAN '96 Conference on Program Language Design and Implementation, pages 54–67, Philadelphia, PA, May 1996. ACM, New York.
[10] M. Rinard and D. Marinov. Credible compilation with pointers. In Proceedings of the Workshop on Run-Time Result Verification, Trento, Italy, July 1999.
[11] Martin Rinard. Credible compilation. Technical Report MIT-LCS-TR-776, Laboratory for Computer Science, Massachusetts Institute of Technology, March 1999.
[12] R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Atlanta, GA, May 1999.