Alexander Aiken
EECS Department
University of California, Berkeley
Berkeley, CA 94702-1776
aiken@cs.berkeley.edu
1 Introduction
Program analysis is concerned with automatically extracting information from programs. Program analysis is a large topic, with a long history and many applications, particularly in optimizing compilers and software engineering tools. As might be expected of any broad area, there are a number of distinct approaches to program analysis.
This paper provides an overview of constraint-based program analysis. While much has been written about constraint-based program analysis in recent years, there is relatively little material to assist outsiders who wish to learn something about the field. Two survey papers cover the computational complexity of various constraint problems that arise in program analysis [Aik94, PP97]. The purpose of the present work is to motivate the use of constraints for program analysis from the perspective of the applications of the theory.
Program analysis using constraints is divisible into constraint generation and constraint resolution. Constraint generation produces constraints from a program text that give a declarative specification of the desired information about the program. Constraint resolution (i.e., solving the constraints) then computes this desired information. In the author's view, the constraint-based analysis paradigm is appealing for three primary reasons:
Constraints separate specification from implementation. Constraint generation is the specification of the analysis; constraint resolution is the implementation. This division helps to organize and simplify understanding of program analyses. The soundness of an analysis can be proven solely on the basis of the constraint systems used; there is no need to resort to reasoning about a particular algorithm for solving the constraints. On the other hand, algorithms for solving classes of constraint problems can be presented and analyzed independent of any particular program analysis. General results on solving constraint problems provide "off-the-shelf" tools for program analysis designers.
Constraints yield natural specifications. Constraints are (usually) local; that is, each piece of program syntax contributes its own constraints in isolation from the rest of the program. The conjunction of all local constraints captures global properties of the program being analyzed.
This work was supported by NSF National Young Investigator award CCR-9457812. This version includes corrections suggested by Manuel Fähndrich.
analysis have a rich theory that can be exploited in implementations. We shall only touch on this subject in this paper.
We first briefly discuss the long history of the use of constraints in program analysis, which predates the current interest in the area by many years (Section 2). The overview proper begins with the introduction of set constraints, a widely used constraint formalism in program analysis and the one with which the author is best acquainted (Section 3).
The balance of the paper shows that three classical problems (standard dataflow equations, simple type inference, and monomorphic closure analysis) can be viewed as instances of set constraint problems (Section 4). Each of these three very basic analyses has been developed by different communities of people over extended periods of time, and to our knowledge no formal connection between the problems has been noted previously in the literature. Our main reason for choosing these problems, however, is that we assume most readers are familiar with at least one of them and thereby are afforded an easy path to appreciation of the constraint-based analysis perspective. We also present one simple variation of type inference suggestive of the expressive power provided by set constraints (see Section 4.3).
To give some insight into the algorithmic issues involved in a general constraint-based analysis system we give constraint resolution algorithms for the constraint systems arising from the three example analyses. It is important to realize that in different applications we are interested in different notions of constraint solvability. Depending on the application, we may be interested in only knowing a particular solution (e.g., the least solution) or in calculating all solutions.
Set constraints provide one of the most general decidable theories known for constraint-based program analysis, and the essential issues of constraint-based analysis can be illustrated easily using set constraints. However, we do not wish to give the impression that set constraints are the only useful constraint theory for program analysis. In addition, there are of course other approaches to program analysis not based on constraints. Other constraint formalisms, altogether different approaches, as well as the place of constraint-based program analysis in the general theory of abstract interpretation, are discussed in Section 6.
2 History
Using constraints in program analysis is not a new idea. The earliest example we are aware of is due to Reynolds, who proposed an analysis of Lisp programs based on the resolution of inclusion constraints in 1969 [Rey69]. Similar ideas (but based on grammars rather than constraints) were developed independently later by Jones and Muchnick [JM79]. Dataflow equations and type equations, two examples that we shall investigate in greater depth in Section 4, also have a long history. Dataflow equations form the basis of most classical algorithms for flow analysis used in compilers for procedural languages (most notably C and FORTRAN). Type equations are the basis of type inference for functional languages and for template-style polymorphism in object-oriented languages.
While the idea of program analysis using constraints is not new, there has been a dramatic shift in the research perspective in recent years. Formerly, each of the problem areas described above was viewed as a separate line of research, with its own techniques, problems, and terminology. Efforts to hybridize or extend these techniques met with considerable difficulty, at least in part because it was unknown whether the resulting constraint problems could be solved. Today it is understood that these problems are related, and that much can be gained by viewing the problems as instances of a more general setting. In fact, techniques from each of the classical algorithms may be combined quite freely to create new program analyses.
To make the advantages of the constraint perspective concrete, we use another classical problem for illustration. Most compilers perform register allocation to assign machine registers to program variables. Consider the following fragment of imperative code, where program variables are named a, b, c, and so forth:
a := c + d
e := a + b
f := e - 1
print(f)
A valid register assignment is a mapping from variable names to register names that preserves program semantics. If the register names are r1, r2, r3, ..., then the program under one valid register assignment may be:
r1 := r2 + r3
r4 := r1 + r5
r1 := r4 - 1
print(r1)
The difficulty in register allocation is that there are usually more program variables than there are registers to hold them. In the example above, six variables are mapped into five registers, with variables a and f sharing register r1. In general, a valid register allocation may not even exist for a given program. In this case, the number of variables in the program can be reduced by spilling some variables, that is, by inserting code to save and restore these variables to and from main memory.
The register allocation problem was already recognized in the FORTRAN I compiler in the 1950's, but the solution techniques were ad hoc and not entirely effective. By the 1970's it was realized that the weakness of contemporary register allocation was a limiting factor in the development of optimizing compilers. A breakthrough came in the late 1970's when Chaitin proposed a register allocation heuristic based on graph coloring [CAC+81]. The significance of the contribution can be judged by the fact that this technique was the subject of one of the first software patents. Chaitin's insight was to formulate register allocation as a constraint problem.
A variable x is said to be live at a program point p if x is referred to at some program point later in the execution ordering than p with no intervening assignment to x. Otherwise x is said to be dead. Consider an assignment statement y := .... A basic observation about register allocation is
    If variable x is live when variable y is assigned, then x and y cannot be held in the same register.
In the example above, we have implicitly assumed that a is dead at the point where f is assigned, allowing reuse of a's register to hold the value of f.
This observation suggests the following natural constraint problem. Let $\mathit{Reg} : \mathit{Variables} \rightarrow \mathit{Registers}$ be a register assignment. The constraints on $\mathit{Reg}$ are
\[ \mathit{Reg}(x) \neq \mathit{Reg}(y) \Leftrightarrow x \text{ is live where } y \text{ is assigned} \]
This formulation neatly captures the constraints under which a register assignment is valid. The next problem is to compute register assignments. The constraints naturally specify a graph with one node for each variable and an edge between x and y whenever $\mathit{Reg}(x) \neq \mathit{Reg}(y)$ is a constraint. A graph is k-colorable if each node of the graph can be assigned a color different from the color of all of its neighbors in such a way that no more than k colors are used. Finding a register assignment with k registers is equivalent to finding a k-coloring of the constraint graph.
By the time of Chaitin's work, it was already known that graph coloring is an NP-complete problem, and therefore that efficient exact solutions were very unlikely to be found. Chaitin proposed a simple heuristic for coloring the graph based on another observation:
    If a node x has fewer than k incident edges, then the graph is k-colorable if and only if the graph obtained by removing x and its edges is k-colorable.
That is, if x has fewer than k neighbors, then there is always a color for x, no matter how the rest of the graph is colored. In cases where the heuristic fails to color the entire graph (i.e., a point is reached where all nodes have k or more neighbors) it is necessary to choose a variable to spill. While subsequent work extends the heuristics for coloring and spilling, graph coloring remains the best framework known for register allocation after nearly 20 years.
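The removal observation above can be turned into a short coloring routine. The following is a minimal sketch, not Chaitin's allocator: the function name `color_graph`, the dictionary encoding of the graph, and the example graph are assumptions made for illustration, and spilling is not modeled (the function simply reports failure).

```python
def color_graph(graph, k):
    """Try to k-color an undirected graph with the removal heuristic.

    graph maps each node to the set of its neighbours (assumed symmetric).
    Returns a dict node -> color in range(k), or None when the heuristic
    gets stuck because every remaining node has k or more neighbours.
    """
    remaining = {n: set(nbrs) for n, nbrs in graph.items()}
    removed = []  # nodes in the order they were taken out of the graph
    while remaining:
        # Pick any node with fewer than k neighbours among the remaining nodes.
        node = next((n for n in sorted(remaining) if len(remaining[n]) < k), None)
        if node is None:
            return None  # a real allocator would choose a variable to spill here
        removed.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)
    coloring = {}
    # Re-insert nodes in reverse order; fewer than k neighbours are colored
    # at that moment, so a free color always exists.
    for node in reversed(removed):
        used = {coloring[m] for m in graph[node] if m in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring


# Hypothetical constraint graph: a triangle a-b-c with d attached to c.
example = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
three = color_graph(example, 3)
two = color_graph(example, 2)
```

With three colors the heuristic succeeds on this graph; with two it removes d and then gets stuck on the triangle, which is the situation in which a variable would be spilled.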
This rather old example illustrates all of the advantages of using constraint formulations in program analysis. The constraint formulation as inequalities separates the specification of the problem from its implementation, and most importantly gives a global characterization of the conditions to be satisfied. The abstract constraint problem, now free of the details of the particular program and programming language, can then be addressed by appropriate techniques, in this case graph coloring. Note that the constraint resolution algorithm proceeds in a manner that has no direct relationship to program structure, and that if one were to actually view the sequence of allocation decisions made by the greedy coloring heuristic it would jump around from point to point in the program with no apparent pattern. If we were to attempt formulating directly an algorithm that was defined, e.g., by induction on the program syntax, it is unlikely we would arrive at something as effective as converting the problem to a constraint representation.
The reader may find register allocation heuristics a peculiar choice for a historical example of program analysis. After all, graph coloring register allocation is not usually even regarded as a program analysis problem, let alone a constraint-based one. However, it is clear that the constraint formulation was central in developing the technique. Register allocation is interesting for another reason. To our knowledge, it is the only significant application of negative constraints (i.e., inequalities) to program analysis in the literature.
3 Set Constraints
This section gives a brief overview of set constraints and the state of knowledge on set constraint problems. In Section 4 we illustrate connections between disparate program analysis problems using the language of set constraints.
Set constraints describe relationships between sets of terms. A set constraint has the form $X \subseteq Y$, where X and Y are set expressions. Let C be a set of constructors and let V be a set of set-valued variables. Each $c \in C$ has a fixed arity $a(c)$; if $a(c) = 0$ then c is a constant. The set expressions are defined by the following grammar:
\[ E ::= \alpha \mid 0 \mid E_1 \cup E_2 \mid E_1 \cap E_2 \mid \neg E_1 \mid c(E_1, \ldots, E_{a(c)}) \mid c^{-i}(E_1) \]
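In this grammar, $\alpha$ is a variable (i.e., $\alpha \in V$) and c is a constructor (i.e., $c \in C$). In the standard interpretation, set expressions denote sets of terms. A term is $c(t_1, \ldots, t_{a(c)})$ where $c \in C$ and every $t_i$ is a term. The set of all terms is H. An assignment $\sigma$ is a mapping $V \rightarrow 2^H$ that assigns sets of terms to variables. The meaning of set expressions is given by extending assignments from variables to set expressions as follows:
\[
\begin{array}{rcl}
\sigma(0) &=& \emptyset \\
\sigma(E_1 \cup E_2) &=& \sigma(E_1) \cup \sigma(E_2) \\
\sigma(E_1 \cap E_2) &=& \sigma(E_1) \cap \sigma(E_2) \\
\sigma(\neg E_1) &=& H - \sigma(E_1) \\
\sigma(c(E_1, \ldots, E_n)) &=& \{ c(t_1, \ldots, t_n) \mid t_i \in \sigma(E_i) \} \\
\sigma(c^{-i}(E)) &=& \{ t_i \mid \exists\, c(t_1, \ldots, t_n) \in \sigma(E),\ 1 \leq i \leq n \}
\end{array}
\]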
A system of set constraints is a finite conjunction of constraints $\bigwedge_i X_i \subseteq Y_i$ where each of the $X_i$ and $Y_i$ is a set expression. A solution of a system of set constraints is an assignment $\sigma$ such that $\bigwedge_i \sigma(X_i) \subseteq \sigma(Y_i)$ is true. A system of set constraints is satisfiable if it has at least one solution.
The term "set constraints" was coined by Heintze and Jaffar [HJ90], who were the first to recognize and formalize set constraints in their full generality. It is a remarkable fact about many set constraint problems that not only is it decidable whether or not a system of constraints has a solution, but that all (potentially infinitely many) solutions can be given a finite representation. In their original paper, Heintze and Jaffar showed that a restricted class of set constraints could be solved and the solutions finitely presented.¹
A natural and interesting subclass of set constraints excludes projections but includes all other operations. An algorithm that exhibits all solutions of such constraints first appears in [AW92]. Subsequently, many alternative proofs of this result and connections to other disciplines were discovered, including tree automata [GTT92] and graph theory [AKVW93]. A particularly elegant result shows that set constraints without projections are equivalent to the monadic class of predicate logic [BGW93].
Including unrestricted projections in a complete theory turns out to be a difficult problem. A series of papers by a variety of authors show increasingly powerful systems of constraints to be decidable [GTT93, BGW93, CP94a, AKW95]. Charatonik and Pacholski finally show that the full set constraint language is decidable in [CP94b].
Showing decidability is, of course, a necessary first step in obtaining practical algorithms. Beyond decidability, we would like efficient algorithms and algorithms that compute finite representations of solutions. In these areas the state of knowledge is incomplete. Currently, the algorithms that compute finite representations of the solutions of set constraints cannot handle unrestricted projections. Furthermore, the complexity of solving general set constraints is high. Satisfiability of set constraints is NEXPTIME-complete; in fact, it remains NEXPTIME-complete even if projections are eliminated.
The complexity results strongly suggest that analyses based on solving set constraints in their full generality are infeasible. However, there are many very useful polynomial time fragments of the full theory, and it is these tractable sub-theories that are our focus in this paper.
3.1 Expressive Power
From the definition above, it is easy to see that the set expressions consist only of elementary set operations plus constructors; simply put, it is a set theory of terms. The constraint language is rich enough,
¹ It is also worth noting that for some variations of set constraints, in particular with the addition of function spaces, no complete resolution algorithm is known for the general case.
makes set constraints a useful tool for program analysis. For example, programming language data type facilities provide "sums of products" data types, which means simply unions of (usually distinct) data type constructors. All such data types can be expressed as set constraints.
Let $X = Y$ stand for the pair of constraints $X \subseteq Y$ and $Y \subseteq X$. Consider the constraint
\[ \beta = cons(\alpha, \beta) \cup nil \]
If cons and nil are interpreted in the usual way, then the solution of this constraint assigns to $\beta$ the set of all lists with elements drawn from $\alpha$. This example also shows that a special operation for recursion is not required in the set expression language; recursion is obtained naturally through recursive constraints.
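To make the recursive reading of the list constraint concrete, the sketch below tests whether a finite term belongs to the set of lists over a given element set. The tuple encoding of terms and the names `NIL`, `cons`, and `in_list_of` are assumptions introduced for illustration only.

```python
NIL = ("nil",)  # hypothetical encoding of the constant nil


def cons(head, tail):
    """Hypothetical encoding of the binary constructor cons as a tagged tuple."""
    return ("cons", head, tail)


def in_list_of(term, alpha):
    """Return True if term is nil, or cons(h, t) with h in alpha and t again a list."""
    while term != NIL:
        is_cons = isinstance(term, tuple) and len(term) == 3 and term[0] == "cons"
        if not is_cons or term[1] not in alpha:
            return False
        term = term[2]  # follow the recursive occurrence of the list variable
    return True


elements = {1, 2}
good = in_list_of(cons(1, cons(2, NIL)), elements)
bad_element = in_list_of(cons(3, NIL), elements)
bad_tail = in_list_of(cons(1, 2), elements)
```

The loop unrolls the constraint one constructor at a time, which is why no separate recursion operator is needed in the encoding.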
We have not said whether we mean our lists above to be strict (as in most languages) or non-strict (as in lazy functional languages). Set constraints can be used for either, although different models are required for strict and non-strict constructors. In this paper we wish to avoid most of the complexities of discussing models, so we simply observe that for a non-strict cons the following identity holds:
\[ cons(X, Y) \subseteq cons(X', Y') \Leftrightarrow X \subseteq X' \wedge Y \subseteq Y' \]
For a strict cons one must naturally account for strictness, namely that $cons(0, Y) = 0$ for all Y (and similarly for a 0 in the second position). Thus the identity for a strict cons is more complex:
\[ cons(X, Y) \subseteq cons(X', Y') \Leftrightarrow (X \subseteq X' \wedge Y \subseteq Y') \vee X = 0 \vee Y = 0 \]
It is by applying equivalences such as these that set constraint solvers solve set constraints (see Section 5). By choosing the appropriate resolution rules either strict or non-strict constructors can be modeled faithfully; in fact, it is possible to distinguish individual arguments of constructors as strict or non-strict, though we know of few applications for such generality. Because of the disjunction on the right-hand side of the $\Leftrightarrow$, it is in general more expensive to resolve constraints involving strict constructors than constraints using only non-strict constructors.
The set of non-nil lists (with elements drawn from $\alpha$) can be defined as $\gamma = \beta \cap \neg nil$, where $\beta$ is defined as above. The set $\gamma$ is useful because it describes the proper domain of the function that selects the first element of a list; such a function is undefined for empty lists. This example also illustrates that set constraints can describe proper subsets of standard sums of products data types.
A red-black tree is a binary search tree with the following properties:
1. Every node is either red or black.
2. Every leaf is black.
3. Every red node has two black children.
4. Every path from the root to a leaf has the same number of black nodes.
Together these properties imply that a red-black tree of n nodes has height at most $2\log(n+1)$, so red-black trees are well-balanced trees. Set constraints can describe properties (1)-(3) of red-black trees. In the following equations, the set $\beta$ describes subtrees rooted at black nodes and $\rho$ describes subtrees rooted at red nodes. Red and black are both binary constructors:
\[ \beta = black(\beta \cup \rho, \beta \cup \rho) \cup blackleaf \]
\[ \rho = red(\beta, \beta) \]
the solutions of set constraints are always describable by regular equations (see Section 5).
The final, admittedly contrived, example shows a non-trivial system of constraints where some work is required to derive the solutions. Consider the universe of the natural numbers with one unary constructor succ and one nullary constructor zero. Let the system of constraints be:
\[ succ(\alpha) \subseteq \neg\alpha \;\wedge\; succ(\neg\alpha) \subseteq \alpha \]
These constraints say that if $x \in \alpha$ (resp. $x \in \neg\alpha$) then $succ(x) \in \neg\alpha$ (resp. $succ(x) \in \alpha$). In other words, these constraints have two solutions, one where $\alpha$ is the set of even natural numbers and one where $\alpha$ is the set of odd natural numbers. The solutions are described by the following equations:
\[ \alpha = zero \cup succ(succ(\alpha)) \]
\[ \alpha = succ(zero) \cup succ(succ(\alpha)) \]
The two solutions are incomparable; in general, there is no least solution of a system of set constraints.
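The claim that exactly the even and the odd numbers solve this system can be checked by brute force on a truncated universe. The sketch below is only an illustration under stated assumptions: natural numbers are encoded as Python integers, the universe is cut off at a hypothetical bound n, and successors that fall outside the bound are ignored.

```python
from itertools import combinations


def succ(s):
    """Apply the constructor succ pointwise: { x + 1 | x in s }."""
    return {x + 1 for x in s}


def satisfies(alpha, n):
    """Check both inclusions of the system inside the universe {0, ..., n-1}."""
    universe = set(range(n))
    not_alpha = universe - alpha
    return ((succ(alpha) & universe) <= not_alpha
            and (succ(not_alpha) & universe) <= alpha)


def all_solutions(n):
    """Enumerate every subset of {0, ..., n-1} and keep the ones that satisfy the system."""
    subsets = (set(c) for r in range(n + 1) for c in combinations(range(n), r))
    return [s for s in subsets if satisfies(s, n)]


solutions = all_solutions(8)
```

On the truncated universe the enumeration finds two assignments, the evens and the odds below the bound, and neither contains the other.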
3.2 Extensions
There are extensions of set constraints that have proven useful in various applications. The most important extensions are surveyed here.
3.2.1 Function Space
Function spaces $X \rightarrow Y$ can be added to the set expressions. In an appropriate model, the meaning of $X \rightarrow Y$ is
\[ X \rightarrow Y = \{ f \mid x \in X \Rightarrow f(x) \in Y \} \]
Note that semantically $\rightarrow$ is not a labelled cross product of the domain and the range; thus the term semantics of set expressions given above are not adequate to model function spaces. A suitable domain can be constructed using standard techniques of denotational semantics and, given such a domain, set constraint resolution techniques still apply, although so far as is known additional restrictions are needed on union and intersection to guarantee that the constraints can be solved [AW93].
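The function space constructor is the first example we have seen of a constructor that is not monotonic.² Function space is anti-monotonic in its first argument and monotonic in its second argument. That is, the following hold:
\[ X \rightarrow Y \subseteq X \rightarrow (Y \cup Y') \qquad \text{(monotonic)} \]
\[ X \rightarrow Y \supseteq (X \cup X') \rightarrow Y \qquad \text{(anti-monotonic)} \]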
People unfamiliar with the type theory of functions often find the property of anti-monotonicity surprising. The explanation is in the definition of function space above. Note the implication in the set qualification "$x \in X \Rightarrow f(x) \in Y$". Increasing X makes the hypothesis hold for more values x, so the implication demands more of f; fewer functions f satisfy it and the resulting set is smaller. Increasing Y weakens the conclusion, so more functions f satisfy the implication and the resulting set is larger. Function spaces are used primarily in the analysis of functional programming languages [AW93, AWL94, AF95, FA97, MW97, FFK+96, FF97].³
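Both directions can be observed by enumerating every function over a tiny finite universe. This is a sketch under assumptions: the three-element `DOMAIN`, the tuple encoding of functions, and the name `arrow` are hypothetical, and a finite model of this kind only illustrates the definition; it is not the denotational model mentioned in the text.

```python
from itertools import product

DOMAIN = (0, 1, 2)  # hypothetical finite universe of values


def all_functions():
    """Every total function DOMAIN -> DOMAIN, encoded as a tuple f with f[x] = f(x)."""
    return set(product(DOMAIN, repeat=len(DOMAIN)))


def arrow(xs, ys):
    """X -> Y = { f | x in X implies f(x) in Y }."""
    return {f for f in all_functions() if all(f[x] in ys for x in xs)}


base = arrow({0}, {1})
bigger_range = arrow({0}, {1, 2})    # enlarging Y
bigger_domain = arrow({0, 1}, {1})   # enlarging X
```

Here `bigger_range` contains `base`, while `bigger_domain` is contained in `base`, matching the monotonic and anti-monotonic inclusions respectively.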
² A function f is monotonic if whenever $x \leq y$ then $f(x) \leq f(y)$.
³ It is also possible to define analyses involving functions that avoid anti-monotonic constructors altogether, although these techniques assume the entire program is available to be analyzed at once [Hei94, FF97].
Conditional expressions $Y \Rightarrow X$ are equal to X if Y is non-empty and equal to 0 otherwise:
\[ Y \Rightarrow X = \begin{cases} 0 & \text{if } Y = 0 \\ X & \text{if } Y \neq 0 \end{cases} \]
Conditional expressions are very useful for expressing constraints on flow of control in programs. For example, consider the following case statement on a boolean expression.
case x of
true: y;
false: z;
esac
We may wish to construct an analysis that captures the fact that the result of this expression can be y only if x evaluates to true and that the result can be z only if x evaluates to false. Let $[[\cdot]] : \mathit{Expressions} \rightarrow \mathit{SetVariables}$ be a function mapping a program phrase to a set variable corresponding to the analysis of that phrase in the solutions of the constraints (this notation is taken from [PS91]). Assuming that true and false are set constructor constants with the obvious interpretations, then the desired constraint for the case expression is
\[ (([[x]] \cap true) \Rightarrow [[y]]) \cup (([[x]] \cap false) \Rightarrow [[z]]) \subseteq [[\text{case x of true: y; false: z; esac}]] \]
It is worthwhile noting that from the point of view of decidability, conditional expressions add nothing to set constraints as they are a special case of projections. To see this, observe that
\[ Y \Rightarrow X \equiv c^{-1}(c(X, Y)) \]
Here we rely on the fact that the interpretation of constructors requires that if $Y = 0$, then $c(X, Y) = 0$ for any X. If one wishes to compute solutions (and not just know that solutions exist), then it turns out that for a language without explicit projections but with conditional expressions it is possible to finitely represent all solutions of the constraints [AWL94].
We shall sometimes find it convenient to allow conditional constraints in addition to conditional expressions. A conditional constraint has the form
\[ X \Rightarrow (Y \subseteq Z) \]
and has the meaning that if $X \neq 0$ then $Y \subseteq Z$ must hold and otherwise there is no constraint. Conditional expressions and conditional constraints are equivalent in the sense that
\[ X \Rightarrow (Y \subseteq Z) \equiv (X \Rightarrow Y) \subseteq Z \]
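The equivalence between conditional constraints and conditional expressions can be confirmed exhaustively on a small universe. The sketch below assumes sets of constants only (no constructors), and the helper names `cond`, `cond_constraint`, and `subsets` are hypothetical.

```python
from itertools import combinations


def cond(y, x):
    """The conditional expression Y => X: X when Y is non-empty, the empty set otherwise."""
    return set(x) if y else set()


def cond_constraint(x, y, z):
    """The conditional constraint X => (Y subset-of Z): nothing is required when X is empty."""
    return (not x) or set(y) <= set(z)


def subsets(universe):
    """All subsets of a finite universe."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# Compare both readings on every triple of subsets of a two-element universe.
agree = all(
    cond_constraint(x, y, z) == (cond(x, y) <= z)
    for x in subsets({1, 2})
    for y in subsets({1, 2})
    for z in subsets({1, 2})
)
```

When X is empty the left side imposes nothing and the right side reduces to the empty set being included in Z, which always holds; otherwise both sides reduce to Y being included in Z.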
4 Applications
This section presents applications of set constraints to three classical program analysis problems: dataflow analysis, type inference, and closure analysis. We expect that at least one of the chosen applications is familiar to any reader with a background in one of the major program analysis communities. We use set constraints as the common language in which the analysis problems are presented.
Classical dataflow computations for imperative languages include live variable analysis, reaching definitions, and constant propagation, among others [ASU86]. These algorithms are formalized as the solution of systems of constraints over expressions built from sets of constants, set variables, and the set operations:
\[ E ::= a_1 \mid \ldots \mid a_n \mid \alpha \mid E_1 \cap E_2 \mid E_1 \cup E_2 \mid \neg E_1 \]
In this grammar $a_1, \ldots, a_n$ are the constants (nullary constructors) and $\alpha$ stands for a family of set variables. The meaning of an expression is a set of constants. A system of constraints is a conjunction of equalities $\bigwedge_i \alpha_i = E_i$. We assume that each variable appears on the left-hand side of at most one equation.
For example, in a live variable analysis in a language such as FORTRAN there is one constant for each program variable. The problem is to compute, for each program statement S, the variables x that may be used after the execution of S without any intervening assignments to x. For brevity we consider only the case where S is an assignment statement; the formulation for other program constructs is also straightforward. For each assignment statement we need to know two constant sets:
$S_{def}$ is the set of variables defined (written) by S.
$S_{use}$ is the set of variables used (read) by S.
For example, in the statement x = x+y we have $S_{def} = x$ and $S_{use} = x \cup y$. For each statement S there are two set variables $[[S]]_{in}$ and $[[S]]_{out}$, corresponding to the set of variables live immediately before and after S respectively. Let succ(S) be the statements immediately after S in program execution. The system of constraints is then
\[ [[S]]_{in} = S_{use} \cup ([[S]]_{out} \cap \neg S_{def}) \]
\[ [[S]]_{out} = \bigcup_{X \in succ(S)} [[X]]_{in} \]
These constraints express how live variables are (or are not) propagated from one program statement to another. For example, for the statement x = x+y the first constraint is
\[ [[S]]_{in} = \{x, y\} \cup ([[S]]_{out} \cap \neg\{x\}) \]
which is equivalent to
\[ [[S]]_{in} = \{x, y\} \cup [[S]]_{out} \]
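The live variable equations can be solved by starting every set variable at the empty set and re-evaluating the right-hand sides until nothing changes, which yields the least solution when complement is applied only to the constant def sets. The sketch below is a minimal illustration, not the resolution algorithm of this paper; the statement names s1 to s4 and the dictionary encoding are assumptions, and the example program is the register allocation fragment from Section 2.

```python
def live_variables(stmts, succ):
    """Least solution of the live variable equations by fixed-point iteration.

    stmts maps a statement name to (def_set, use_set);
    succ maps a statement name to the list of its successor statements.
    """
    live_in = {s: set() for s in stmts}
    live_out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s, (defs, uses) in stmts.items():
            # [[S]]_out is the union of [[X]]_in over the successors X of S.
            new_out = set().union(*(live_in[t] for t in succ.get(s, [])))
            # [[S]]_in = S_use united with ([[S]]_out minus S_def).
            new_in = uses | (new_out - defs)
            if new_in != live_in[s] or new_out != live_out[s]:
                live_in[s], live_out[s] = new_in, new_out
                changed = True
    return live_in, live_out


stmts = {
    "s1": ({"a"}, {"c", "d"}),   # a := c + d
    "s2": ({"e"}, {"a", "b"}),   # e := a + b
    "s3": ({"f"}, {"e"}),        # f := e - 1
    "s4": (set(), {"f"}),        # print(f)
}
succ = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4"], "s4": []}
live_in, live_out = live_variables(stmts, succ)
```

In this run a is not live after the statement that assigns f, which is the fact that allowed a and f to share a register in the earlier example.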
There are a few subtleties in our formulation of live variable analysis worth discussing. First, note the optimization of the constraint representation in the immediately preceding lines (i.e., where an intersection is eliminated from the right-hand side of the equation). In the process of solving the equations it may be necessary to evaluate individual equations many times under different assignments to the variables. Thus, applying identities to simplify constraints can significantly improve the performance of constraint resolution implementations. This example merely hints at what transformations are possible, and there is a substantial literature on simplifying set constraints [Pot96, TS96, FA96, FF97, MW97].
dataflow theory. Note that the set expression grammar above allows negation of arbitrary expressions $\neg E$. The standard proof that dataflow equations have solutions requires that all operators be monotonic, which $\neg$ clearly is not. To achieve monotonicity, set complement is restricted to statically known sets (i.e., set expressions without variables), in which case the right-hand sides of equations are monotone in all variables. This restriction is not strictly required: the constraints presented (with $\neg$) can be solved as they are a special case of more general set constraints for which resolution algorithms are known [AW92].
There are reasons, however, to prefer restricted set complement in dataflow analysis. First, adding general complement raises the computational complexity significantly (see discussion at the end of this section). Second, in dataflow analysis we usually are interested in a best solution, either the least or the greatest. A unique best solution need not exist if set complement is unrestricted. For the purposes of dataflow analysis, we shall assume simply that negation is used in such a way that set expressions are monotone in all variables.
For live variable analysis it is the least solution that is desired. In this case, the following inclusion constraints are equivalent:
\[ [[S]]_{in} \supseteq S_{use} \cup ([[S]]_{out} \cap \neg S_{def}) \]
\[ [[S]]_{out} \supseteq \bigcup_{X \in succ(S)} [[X]]_{in} \]
As a useful exercise in manipulating constraints we now show that these inclusions have the same least solution as the equalities. (A solution $\sigma$ is least if for any other solution $\sigma'$, we have $\sigma(x) \subseteq \sigma'(x)$ for all x.) Because equality implies inclusion, it follows that every solution of the equalities is also a solution of the inclusions. Therefore, it suffices to show that the inclusions have a least solution that is also a solution of the equations.
As a first step, note that the constraints always have a solution $\alpha_i = \{a_1, \ldots, a_n\}$ (the set of all constants). Every inclusion constraint is satisfied because the left-hand side is the largest possible set.
Let $\sigma_1$ and $\sigma_2$ be any solutions of the inclusions and let $\sigma_3(\alpha) = \sigma_1(\alpha) \cap \sigma_2(\alpha)$. Now for every inclusion constraint $\alpha \supseteq E$ we have
\[ \sigma_1(\alpha) \supseteq \sigma_1(E) \supseteq \sigma_3(E) \]
\[ \sigma_2(\alpha) \supseteq \sigma_2(E) \supseteq \sigma_3(E) \]
where the last step of both lines follows by monotonicity. It follows that
\[ \sigma_1(\alpha) \cap \sigma_2(\alpha) = \sigma_3(\alpha) \supseteq \sigma_3(E) \]
so $\sigma_3$ is also a solution of the inclusions. Since there always exists a solution, solutions are closed under intersection, and there are only finitely many solutions (because the domain is finite and there are a finite number of variables), there must be a least solution.
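Let $\sigma$ be the least solution of the inclusions and assume for the sake of a contradiction that it is not a solution of the equalities. Then there is a constraint $\alpha \supseteq E$ such that $\sigma(\alpha) \supset \sigma(E)$. Let $\sigma' = \sigma[\alpha \leftarrow \sigma(E)]$. Now we have
\[ \sigma(\alpha) \supset \sigma'(\alpha) = \sigma(E) \supseteq \sigma'(E) \]
by monotonicity. For any other constraint $\alpha' \supseteq E'$ we know $\alpha \neq \alpha'$ (recall every variable appears in at most one left-hand side), and we have
\[ \sigma'(\alpha') = \sigma(\alpha') \supseteq \sigma(E') \supseteq \sigma'(E') \]
where the last step again follows by monotonicity. Thus, $\sigma'$ is a solution smaller than $\sigma$, a contradiction. We conclude that $\sigma$ is a solution of the equalities.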
Dataflow equations are a special case of set constraints where the only constructors are constants, the left-hand side of an equation is always a variable, and set complement is restricted. The decidability of these equality constraints follows immediately from the decidability of set constraints. More interestingly, though, the decidability of extensions also follows immediately. As noted above, unrestricted complement can be added and all solutions are still computable, although the computational complexity increases from polynomial time to NP-complete [AKVW93].
Two other set constraint extensions to dataflow analysis are particularly useful. The first is the addition of conditional expressions $X \Rightarrow Y$. As noted earlier, conditional expressions can be used to model control flow, which complements the emphasis on dataflow in (aptly named) dataflow analysis. A good example of the combination of these features is found in [Hei94, AFS98]. The second extension is the ability to perform dataflow analysis of data structures by including non-atomic constructors. Set-based analysis is a canonical example of a system that exploits this feature of set constraints [Hei92, Hei94].
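Finally, the algorithm given by the constraint resolution rules is unlikely to be as efficient as the standard algorithms for live variable analysis. The culprit is the rule for adding transitive constraints
\[ E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \equiv E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \wedge E_1 \subseteq E_2 \]
which adds new constraints between variables (from $\alpha \subseteq \beta$ and $\beta \subseteq \gamma$ it derives $\alpha \subseteq \gamma$), something that practical implementations for this problem do not do. To achieve an algorithm with efficiency akin to those used in practice, we can modify the rule for transitive constraints to propagate only constants in lower bounds to upper bounds:
\[ a \subseteq \alpha \wedge \alpha \subseteq E \equiv a \subseteq \alpha \wedge \alpha \subseteq E \wedge a \subseteq E \]
It is easy to show that this rule makes the least solution explicit; each variable is assigned the set of constants appearing in its lower bound.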
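A worklist version of the restricted rule can be sketched as follows. This is an illustration under simplifying assumptions, not the resolution algorithm of the paper: upper bounds are restricted to plain variables, constraints are given as pairs, and the function name `propagate_constants` is hypothetical.

```python
from collections import defaultdict, deque


def propagate_constants(const_lower, var_upper):
    """Propagate constants from lower bounds to upper bounds.

    const_lower: iterable of (a, alpha), read as "constant a is included in alpha".
    var_upper:   iterable of (alpha, beta), read as "alpha is included in beta".
    Returns a dict mapping each reached variable to its set of constants.
    """
    successors = defaultdict(set)
    for alpha, beta in var_upper:
        successors[alpha].add(beta)
    solution = defaultdict(set)
    work = deque(const_lower)
    while work:
        a, alpha = work.popleft()
        if a in solution[alpha]:
            continue  # this constant already reached alpha
        solution[alpha].add(a)
        for beta in successors[alpha]:
            # From a in alpha and alpha included in beta, conclude a in beta.
            work.append((a, beta))
    return dict(solution)


result = propagate_constants(
    const_lower=[("x", "A"), ("y", "B")],
    var_upper=[("A", "B"), ("B", "C"), ("D", "A")],
)
```

No constraint between two variables is ever created; only (constant, variable) facts are added, and each such fact is processed at most once.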
4.2 Simple Type Inference
Type inference is a central component of statically typed functional languages. The essence of the inference algorithm is to generate a system of type constraints from the program text. If the constraints are solvable then the program is typable and the types of program phrases are exhibited by the solutions of the constraints.
For our purposes the pure lambda calculus suffices as the programming language:
\[ e ::= x \mid \lambda x.e_1 \mid e_1\ e_2 \]
For simplicity, we assume that variables in an expression are renamed as necessary so that all lambda bound variables are distinct. For a simple (that is, not polymorphic) type system, the expressions of the constraint language are
\[ E ::= \alpha \mid E_1 \rightarrow E_2 \]
where $\rightarrow$ is an infix binary type constructor. Constraint systems are conjunctions of equations $\bigwedge_i E_{i1} = E_{i2}$. As discussed in Section 3.2.1, the term model presented in Section 3 is inadequate for function spaces, but adequate models do exist.
There are many equivalent ways to specify simple type inference. One which is close to actual implementations of type inference algorithms uses systems of type equations. As before, we use $[[e]]$ to stand for a type variable associated with e.
\[ [[\lambda x.e]] = [[x]] \rightarrow [[e]] \]
\[ [[e_1]] = [[e_2]] \rightarrow [[e_1\ e_2]] \]
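This formulation is equivalent to the standard one which uses inference rules and is well-known [Wan87]. Under these rules it is easy to verify the types of the following examples:
\[
\begin{array}{rcl}
\lambda x.x &:& \alpha_x \rightarrow \alpha_x \\
\lambda z.\lambda y.z &:& \alpha_z \rightarrow (\alpha_y \rightarrow \alpha_z) \\
(\lambda z.\lambda y.z)\ \lambda x.x &:& \alpha_y \rightarrow (\alpha_x \rightarrow \alpha_x) \\
\lambda f.\lambda x.f(f(x)) &:& (\alpha_x \rightarrow \alpha_x) \rightarrow \alpha_x \rightarrow \alpha_x
\end{array}
\]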
Depending on whether finite or infinite solutions are desired, the constraints are solved using respectively unification or circular unification. If circular unification is used, then every lambda expression has a type. (To see this, note that both equations can be solved by assigning every expression the recursive type $\alpha = \alpha \rightarrow \alpha$.) Not every expression has a type using ordinary unification. Of course, an alternative proof of decidability is to observe that these are set constraints. Note, however, that just as in the case of unification an occurs check is required if only finite solutions are desired.
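The two steps, generating type equations and solving them by ordinary unification with an occurs check, can be sketched as follows. This is a minimal illustration under assumptions: lambda terms are encoded as tagged tuples, type variables are strings such as "t_x" for $[[x]]$ and "t0", "t1", ... for compound expressions, and all function names are hypothetical.

```python
from itertools import count


def constraints(e, eqs, fresh):
    """Return the type variable [[e]], appending the type equations for e to eqs."""
    if e[0] == "var":
        return "t_" + e[1]
    if e[0] == "lam":                                   # ("lam", x, body)
        body = constraints(e[2], eqs, fresh)
        me = f"t{next(fresh)}"
        eqs.append((me, ("->", "t_" + e[1], body)))     # [[\x.e]] = [[x]] -> [[e]]
        return me
    fun = constraints(e[1], eqs, fresh)                 # ("app", e1, e2)
    arg = constraints(e[2], eqs, fresh)
    me = f"t{next(fresh)}"
    eqs.append((fun, ("->", arg, me)))                  # [[e1]] = [[e2]] -> [[e1 e2]]
    return me


def resolve(t, subst):
    """Follow variable bindings until an unbound variable or an arrow type is reached."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear inside type t?"""
    t = resolve(t, subst)
    if isinstance(t, str):
        return t == v
    return occurs(v, t[1], subst) or occurs(v, t[2], subst)


def unify(s, t, subst):
    """Ordinary unification; returns False when only an infinite type would fit."""
    s, t = resolve(s, subst), resolve(t, subst)
    if s == t:
        return True
    if isinstance(s, str):
        if occurs(s, t, subst):
            return False
        subst[s] = t
        return True
    if isinstance(t, str):
        return unify(t, s, subst)
    return unify(s[1], t[1], subst) and unify(s[2], t[2], subst)


def expand(t, subst):
    """Substitute all bindings to obtain the final type."""
    t = resolve(t, subst)
    if isinstance(t, str):
        return t
    return ("->", expand(t[1], subst), expand(t[2], subst))


def infer(e):
    """Return the type of e, or None if the equations have no finite solution."""
    eqs, subst = [], {}
    root = constraints(e, eqs, count())
    if all(unify(lhs, rhs, subst) for lhs, rhs in eqs):
        return expand(root, subst)
    return None


identity = ("lam", "x", ("var", "x"))
twice = ("lam", "f", ("lam", "x",
         ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x")))))
self_app = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
```

`infer(identity)` yields an arrow from a type variable to itself, `infer(twice)` yields a type of the shape (t -> t) -> (t -> t), and `infer(self_app)` returns None because the occurs check rejects the equation that would require a recursive type.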
4.3 A Variation
Once again we can obtain generalizations of the familiar theory. For example, by generalizing terms to sets we can define the following grammar for types:
\[ E ::= \alpha \mid E_1 \rightarrow E_2 \mid E_1 \cap E_2 \mid E_1 \cup E_2 \mid 0 \]
We recast the constraints to use inclusion instead of equality and allow solutions to be expressed in terms of the more expressive types:
\[ [[\lambda x.e]] \supseteq [[x]] \rightarrow [[e]] \]
\[ [[e_1]] \subseteq [[e_2]] \rightarrow [[e_1\ e_2]] \]
The first constraint says simply that the type of $\lambda x.e$ must include all the functions of type $[[x]] \rightarrow [[e]]$. To understand the second constraint, note that for the constraints to have any solutions $[[e_1]]$ must be a set of functions. Assume $[[e_1]] = X \rightarrow Y$ for some X and Y. We then have
\[ [[e_1]] = X \rightarrow Y \subseteq [[e_2]] \rightarrow [[e_1\ e_2]] \]
which implies, using the anti-monotonicity of the domain and monotonicity of the range, that
\[ [[e_2]] \subseteq X \wedge Y \subseteq [[e_1\ e_2]] \]
In other words, the domain X of $e_1$ must accept the type of the argument $[[e_2]]$, and the type of the result $[[e_1\ e_2]]$ must be at least the range Y of $e_1$.
Under these inclusion constraints many functions have substantially more precise types than under the original equality constraints. For example, the function that applies a function twice to its argument has the type:
\[ \lambda f.\lambda x.f(f(x)) : ((\alpha \rightarrow \beta) \cap (\beta \rightarrow \gamma)) \rightarrow (\alpha \rightarrow \gamma) \]
provided that f has signatures $\alpha \rightarrow \beta$ and $\beta \rightarrow \gamma$ that can be composed to produce a function of type $\alpha \rightarrow \gamma$.
The extended type system presented here is somewhat related to intersection type disciplines. The language of intersection types retains variables, function spaces, and intersections between types, but no 0 or type union. However, most intersection type disciplines have much more general rules for assigning types to expressions than the constraint generation rules we give above. As a result, even typechecking for the natural intersection type discipline is undecidable [CC90]. Restricted, decidable versions of intersection type systems have received considerable attention (see, e.g., [CG92]).
4.4 Closure Analysis
A standard program analysis for functional languages is closure analysis. Because closure analysis is not
as well-known as dataflow analysis and type inference, we first describe a simple closure analysis before
discussing constraints.
Intuitively, the closure analysis problem for the lambda calculus is to estimate the set of lambda
abstractions to which a program variable can be bound during reduction. For example, in the expression
$(\lambda x.x)\ \lambda y.y$, the variable $x$ will be bound to an expression beginning $\lambda y$, while $y$ will not be bound to any
expression. Closure analysis is used to derive an approximation of the control flow graph in a higher order
functional language. In a first order language (such as FORTRAN) the control flow graph is statically
known: the order in which expressions are evaluated is obvious from program syntax, and this order is
the structure from which dataflow analysis algorithms are built. In a higher order language, the order
in which expressions are evaluated must be inferred and, in general, approximated. Closure analysis is
a well-known algorithm for approximating the control-flow graph of a program and has been studied
extensively [Shi88, Ses91, PS91, Pal95, NN97].
Our development of closure analysis follows Palsberg's. Let $[\![e]\!]$ be a variable associated with expression
$e$; this variable ranges over sets of lambda bindings appearing in the complete expression. For example,
for the expression $\lambda x.\lambda y.x$ the set of lambdas is $\{\lambda_x, \lambda_y\}$. For a fixed lambda expression $e$, the closure
analysis is the least solution of a system of constraints derived from the sub-expressions of $e$:

Sub-Expression    Constraints
$\lambda x.e'$    $\lambda_x \subseteq [\![\lambda x.e']\!]$
$e_1\ e_2$        for every $\lambda x.e_3$ in $e$: $\lambda_x \subseteq [\![e_1]\!] \Rightarrow ([\![e_2]\!] \subseteq [\![x]\!] \wedge [\![e_3]\!] \subseteq [\![e_1\ e_2]\!])$
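For the expression $(\lambda x.x)\ \lambda y.y$, the constraints are
$$\begin{aligned}
&\{\lambda_x\} \subseteq [\![\lambda x.x]\!] \\
&\{\lambda_y\} \subseteq [\![\lambda y.y]\!] \\
&\lambda_x \subseteq [\![\lambda x.x]\!] \Rightarrow ([\![\lambda y.y]\!] \subseteq [\![x]\!] \wedge [\![x]\!] \subseteq [\![(\lambda x.x)\ \lambda y.y]\!]) \\
&\lambda_y \subseteq [\![\lambda x.x]\!] \Rightarrow ([\![\lambda y.y]\!] \subseteq [\![y]\!] \wedge [\![y]\!] \subseteq [\![(\lambda x.x)\ \lambda y.y]\!])
\end{aligned}$$
Solutions of the constraints are ordered pointwise; i.e., $\sigma \le \sigma'$ if and only if $\sigma(x) \subseteq \sigma'(x)$ for all $x$. It is
easy to verify that the least solution of the constraints is
$$\begin{aligned}
[\![x]\!] &= \{\lambda_y\} \\
[\![y]\!] &= \emptyset \\
[\![\lambda x.x]\!] &= \{\lambda_x\} \\
[\![\lambda y.y]\!] &= \{\lambda_y\} \\
[\![(\lambda x.x)\ \lambda y.y]\!] &= \{\lambda_y\}
\end{aligned}$$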
Our definition of closure analysis introduces two small extensions to the constraint notation we have
defined. Define $\lambda_x \subseteq X \Rightarrow P$ to mean $X \cap \lambda_x \Rightarrow P$, which is equivalent but stays within our syntax. Also,
define $X \Rightarrow (Y \wedge Z)$ to mean $(X \Rightarrow Y) \wedge (X \Rightarrow Z)$.
The fact that set constraints of this form can be solved for the least solution in time $O(n^3)$ follows
immediately from more general results on solving systems of set constraints [Hei94, AWL94] (see Section 5).
Historically, however, closure analysis has been investigated over a period of many years in isolation
from other techniques and, essentially, the fragment of set constraints needed for the problem has been
discovered from first principles [Shi88, PS91]. Set-based analysis can be viewed as a more general form
of closure analysis where, among other things, there is some ability to track the flow of control through
conditional tests [Hei94].
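To make the constraint system concrete, here is a small Python sketch that computes the least solution of the closure analysis constraints by naive iteration. It is not the resolution algorithm of Section 5; it simply re-checks every conditional constraint until nothing changes. The names `Var`, `Lam`, `App`, `subterms`, and `closure_analysis` are hypothetical, and the sketch assumes that bound variables have distinct names, so that the label $\lambda_x$ can be represented by the parameter name `x`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def subterms(e):
    """Yield e and all of its sub-expressions."""
    yield e
    if isinstance(e, Lam):
        yield from subterms(e.body)
    elif isinstance(e, App):
        yield from subterms(e.fn)
        yield from subterms(e.arg)


def closure_analysis(e):
    """Least solution of the closure analysis constraints for e (naive iteration)."""
    terms = list(subterms(e))
    lams = [t for t in terms if isinstance(t, Lam)]
    apps = [t for t in terms if isinstance(t, App)]
    # One set variable [[t]] per sub-expression; every occurrence of x shares Var(x).
    sol = {t: set() for t in terms}
    for lam in lams:
        sol.setdefault(Var(lam.param), set())
        sol[lam].add(lam.param)               # lambda_x is in [[\x.e']]
    changed = True
    while changed:
        changed = False
        for app in apps:
            for lam in lams:
                if lam.param in sol[app.fn]:  # hypothesis: lambda_x in [[e1]]
                    x = Var(lam.param)
                    # conclusions: [[e2]] <= [[x]] and [[e3]] <= [[e1 e2]]
                    for src, dst in ((app.arg, x), (lam.body, app)):
                        if not sol[src] <= sol[dst]:
                            sol[dst] |= sol[src]
                            changed = True
    return sol
```

For the running example, `closure_analysis(App(Lam("x", Var("x")), Lam("y", Var("y"))))` maps `Var("x")` to `{"y"}` and `Var("y")` to the empty set, in agreement with the least solution given above.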
5 Solving Constraints
So far we have worked at the level of specifying the constraints for particular program analysis
applications. In this section we discuss computing solutions of constraints. The general strategy in constraint
resolution algorithms is always the same: an initial system of constraints is repeatedly transformed using
simple rules until the system is in a "solved form." We illustrate this approach using the three analysis
problems presented in Section 4.
We begin by defining our notion of a solved form system of constraints. We show that any inductive
system of constraints has solutions, and that in fact all solutions are explicit in the form of the constraints
(Section 5.1). In the following subsections we give algorithms for transforming the constraint systems
developed in Section 4 into inductive form.
5.1 Indutive Systems
We shall limit our discussion to the following expression language, which excludes projections.
$$E ::= \alpha \mid 0 \mid E_1 \cup E_2 \mid E_1 \cap E_2 \mid \neg E_1 \mid c(E_1, \ldots, E_{a(c)})$$
Much of the development in this section follows [AW93].
We make use of two previous results in the proof that inductive systems have solutions. The first is a
technique for transforming inclusion constraints to an equivalent system of equations [AW92]. The second
is the fact that systems of contractive equations have unique solutions [MPS84]. The constraint-solving
algorithm presented in Section 5 reduces an initial system of constraints to a set of systems of inductive
constraints or reports that the initial system is inconsistent.
To discuss constraint solving it is necessary to be fairly specific about the semantic domain. We have
discussed two domains, a domain of terms and a domain that includes function spaces. For simplicity,
we shall prove our results only for the term domain. We need the following definition. Let $D_j$ be an
increasing sequence of sets that contain larger terms (terms of greater height) as $j$ increases:
$$\begin{aligned}
D_0 &= \emptyset \\
D_j &= \{ c(t_1, \ldots, t_{a(c)}) \mid t_i \in D_{j-1} \} \cup D_{j-1}
\end{aligned}$$
Consider the following inductive strategy for showing that an arbitrary system of inclusion constraints
over variables $\alpha_1, \ldots, \alpha_n$ has a solution.
Initially, let $\alpha_i = 0$ for $1 \le i \le n$. At step $j$ of the induction, assign some terms of $D_j$ to $\alpha_1$, then to $\alpha_2$,
and so on, up to $\alpha_n$. At each step $(j, i)$ of this double induction over the terms of $D_j$ and variables $\alpha_i$,
we must ensure that the constraints are satisfied for all elements in $D_j$. If this can be done for all pairs
$(j, i)$ then the system has a solution.
In such an inductive proof, we must distinguish between variables inside of constructors $c(\alpha)$, which
contribute terms from $D_{j-1}$, and variables outside of constructors $\alpha \cap c(\ldots)$, which contribute terms from
$D_j$.
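Definition 5.1 The top-level variables of $X$ (denoted $\mathrm{TLV}(X)$) are the variables in $X$ that appear outside
of a constructor. Formally,
$$\begin{aligned}
\mathrm{TLV}(\alpha_i) &= \{\alpha_i\} \\
\mathrm{TLV}(0) &= \emptyset \\
\mathrm{TLV}(c(\ldots)) &= \emptyset \\
\mathrm{TLV}(E_1 \cup E_2) &= \mathrm{TLV}(E_1) \cup \mathrm{TLV}(E_2) \\
\mathrm{TLV}(E_1 \cap E_2) &= \mathrm{TLV}(E_1) \cup \mathrm{TLV}(E_2) \\
\mathrm{TLV}(\neg E_1) &= \mathrm{TLV}(E_1)
\end{aligned}$$
Top-level variables are also called the non-expansive variables [MPS84].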
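Definition 5.1 translates directly into a recursive function. The following Python sketch is one possible encoding; the classes `SVar`, `Zero`, `Union`, `Inter`, `Neg`, and `Con` are hypothetical names for the six forms of set expression and are not taken from any existing implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVar:            # set variable alpha_i
    name: str


@dataclass(frozen=True)
class Zero:            # the empty set 0
    pass


@dataclass(frozen=True)
class Union:           # E1 union E2
    left: object
    right: object


@dataclass(frozen=True)
class Inter:           # E1 intersect E2
    left: object
    right: object


@dataclass(frozen=True)
class Neg:             # complement of E1
    arg: object


@dataclass(frozen=True)
class Con:             # c(E1, ..., En)
    name: str
    args: tuple


def tlv(e):
    """Top-level variables of a set expression, following Definition 5.1."""
    if isinstance(e, SVar):
        return {e.name}
    if isinstance(e, (Zero, Con)):
        return set()                      # variables under a constructor do not count
    if isinstance(e, (Union, Inter)):
        return tlv(e.left) | tlv(e.right)
    if isinstance(e, Neg):
        return tlv(e.arg)
    raise TypeError(f"not a set expression: {e!r}")
```

For instance, `tlv(Union(SVar("a1"), Con("c", (SVar("a2"),))))` returns `{"a1"}`, because `a2` occurs only inside the constructor `c`.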
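Definition 5.2 A system $S$ of constraints is inductive if the following three conditions hold:
1. $S = \bigwedge_{1 \le i \le n} L_i \subseteq \alpha_i \subseteq U_i$ (i.e., there is one lower bound $L_i$ and upper bound $U_i$ per variable $\alpha_i$)
2. $\mathrm{TLV}(L_i) \cup \mathrm{TLV}(U_i) \subseteq \{\alpha_1, \ldots, \alpha_{i-1}\}$ for $1 \le i \le n$
3. For all $i' = 1, \ldots, n$ and integers $j$, the following holds in all assignments:
$$\begin{aligned}
\big(&\forall i = 1, \ldots, i'-1\ (L_i \cap D_j \subseteq \alpha_i \cap D_j \subseteq U_i \cap D_j) \text{ and} \\
&\forall i = i', \ldots, n\ (L_i \cap D_{j-1} \subseteq \alpha_i \cap D_{j-1} \subseteq U_i \cap D_{j-1})\big) \\
&\Rightarrow L_{i'} \cap D_j \subseteq U_{i'} \cap D_j
\end{aligned}$$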
Parts 1 and 2 are simple syntactic properties. Part 3 is a more complex semantic condition. The
double induction outlined above for constructing solutions is expressed in part 3, which says that if the
constraints are satisfiable up to some level $j$ and variable $\alpha_{i'-1}$, then the constraints are satisfied for the
next lower and upper bound pair in the induction $L_{i'} \cap D_j \subseteq U_{i'} \cap D_j$.
Definition 5.2 makes it possible to build solutions inductively at level $D_j$ by assigning values in order
to $\alpha_1, \ldots, \alpha_n$, since part 2 ensures that variables are constrained only by lower-numbered variables at the
top level and part 3 ensures that $\alpha_{i'}$ can be given a value between $L_{i'}$ and $U_{i'}$. Systems that do not
satisfy part 3 may not have any solutions (consider, for example, the system $1 \subseteq \alpha_1 \subseteq 0$).
Inductive systems are the output of our constraint resolution procedures. That is, we will give
procedures (starting in Section 5.3) for transforming an initial constraint system into an equivalent
system in inductive form. For these resolution algorithms we can prove that if the output of the algorithm
contains no trivially inconsistent constraints (e.g., $1 \subseteq 0$ or $\mathit{int} \subseteq 0$) then the system is in inductive form
and therefore has solutions.
We show that inductive systems have solutions in two steps: first, we show that an inductive system
is equivalent to a system of equations; we then show that the equations always have solutions.
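Definition 5.3 A system of equations $\alpha_1 = E_1 \wedge \ldots \wedge \alpha_n = E_n$ (with each $\alpha_i$ on the left-hand
side) is cascading if $\mathrm{TLV}(E_i) \cap \{\alpha_i, \ldots, \alpha_n\} = \emptyset$.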
Theorem 5.4 Let $S = \bigwedge_i L_i \subseteq \alpha_i \subseteq U_i$ be an inductive system of constraints. Then $S$ is equivalent to
the cascading equations $\bigwedge_i \alpha_i = L_i \cup (\beta_i \cap U_i)$ where the $\beta_i$ are fresh variables.
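Proof: Assume that $L_i \subseteq \alpha_i \subseteq U_i$ and let $\beta_i = \alpha_i$. Then
$$\begin{aligned}
\alpha_i &= L_i \cup (\alpha_i \cap U_i) && \text{since } L_i \subseteq \alpha_i \subseteq U_i \\
&= L_i \cup (\beta_i \cap U_i) && \text{since } \beta_i = \alpha_i
\end{aligned}$$
Thus, every solution of the constraints induces a solution of the equations. For the other direction, assume
that $\alpha_i = L_i \cup (\beta_i \cap U_i)$ for some $\beta_i$. Clearly, $L_i \subseteq \alpha_i$. To show $\alpha_i \subseteq U_i$, we first show for all $i$ and $j$
that $\alpha_i \cap D_j \subseteq U_i \cap D_j$. For the sake of obtaining a contradiction, assume $\alpha_i \cap D_j \not\subseteq U_i \cap D_j$ for some
$i$ and $j$. Pick the smallest such pair $(j, i)$ ordered lexicographically. Note that $L_k \cap D_l \subseteq \alpha_k \cap D_l \subseteq U_k \cap D_l$
holds if $(l, k) < (j, i)$ by assumption and because $L_k \subseteq \alpha_k$. Since the system is inductive, it follows that
$L_i \cap D_j \subseteq U_i \cap D_j$. Therefore
$$\begin{aligned}
\alpha_i \cap D_j &= (L_i \cup (\beta_i \cap U_i)) \cap D_j \\
&= (L_i \cap D_j) \cup (\beta_i \cap U_i \cap D_j) \\
&\subseteq U_i \cap D_j
\end{aligned}$$
which contradicts the assumption. Thus for all $i$,
$$\begin{aligned}
&\alpha_i \cap D_j \subseteq U_i \cap D_j && \text{for all } j \\
\Rightarrow\ &\alpha_i \cap D_j \subseteq U_i && \text{for all } j \\
\Rightarrow\ &\alpha_i \subseteq U_i && \text{since } \textstyle\bigcup_j D_j = H
\end{aligned}$$
$\Box$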
Theorem 5.5 shows that every choice for the $\beta_i$ induces a unique solution to the cascading equations.
Theorem 5.5 Let $\alpha_1 = E_1 \wedge \ldots \wedge \alpha_n = E_n$ be a system of cascading equations and let $\sigma$ be any
assignment for the variables other than the $\{\alpha_1, \ldots, \alpha_n\}$. There is a unique extension $\sigma'$ of $\sigma$ that is a
solution of the equations.
Proof: Variable $\alpha_i$ can be eliminated from the top-level variables of every equation by substituting $E_i$
for $\alpha_i$ in $E_{i+1}$ through $E_n$. Let $\beta$ be any remaining top-level variable. Then $\beta$ does not appear on the
left-hand side of any equation; we call such variables free. For any fixed assignment $\sigma$ for the top-level
free variables, the equations become contractive (have no top-level variables). Contractive equations have
unique solutions [MPS84]. $\Box$
5.2 A Digression on Set Complement
Set complement is quite handy for expressing analyses, but in solutions of constraints we often wish
to eliminate complements so that we can see which terms may belong to an expression $E$ rather than
which terms may not. The following identities are used to eliminate complements from cascading equations:
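$$\begin{aligned}
\neg 0 &= 1 \quad \text{where } 1 = \textstyle\bigcup_{c \in C} c(1, \ldots, 1) \\
\neg(E_1 \cup E_2) &= \neg E_1 \cap \neg E_2 \\
\neg(E_1 \cap E_2) &= \neg E_1 \cup \neg E_2 \\
\neg\neg E &= E \\
\neg c(E_1, \ldots, E_{a(c)}) &= c(\neg E_1, 1, \ldots, 1) \cup \ldots \cup c(1, \ldots, 1, \neg E_{a(c)}) \cup \textstyle\bigcup_{d \in C - \{c\}} d(1, \ldots, 1)
\end{aligned}$$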
The equation in the first line defines 1 to be the Herbrand universe. For each equation $\alpha_i = E_i$ create
a new equation $\neg\alpha_i = \neg E_i$ and simplify the right-hand side.$^4$ Now replace $\neg\alpha_i$ everywhere by a fresh
variable $\bar{\alpha}_i$. The preceding rules and this technique for eliminating $\neg\alpha_i$ remove all negations except on a
free variable $\beta$. A negation $\neg\beta$ cannot be removed, as the $\beta$ are free variables in the constraints.
There is another important issue with set complement. We have assumed that the set of constructors
is finite, and therefore $\neg c(\ldots)$ can be written as above using an explicit union of all non-$c$ terms. However,
in many applications it is unreasonable to assume that we know all of the constructors. Typically the
set of constructors is determined by the program text. Because a constructor defined in one part of a
program potentially appears in the solutions of the constraints of any part of that program, assuming that
all constructors are known at the outset makes it impossible to analyze program components separately.
It is not difficult to remove the assumption that all constructors are known. Assume now that $C$ is
an infinite set of constructors. We add the following new set expression with the semantics:
$$\sigma(\mathrm{NOT}(\{c_1, \ldots, c_n\})) = \{ d(t_1, \ldots, t_{a(d)}) \mid t_i \in H \wedge d \in C - \{c_1, \ldots, c_n\} \}$$
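Intuitively NOT is the set of all terms with a head constructor not in the argument list. It is
straightforward to include NOT in the algebra of set expressions. For example:
$$\begin{aligned}
\neg\mathrm{NOT}(\{c_1, \ldots, c_n\}) &= c_1(1, \ldots, 1) \cup \ldots \cup c_n(1, \ldots, 1) \\
\neg c(E_1, \ldots, E_n) &= c(\neg E_1, 1, \ldots, 1) \cup \ldots \cup c(1, \ldots, 1, \neg E_n) \cup \mathrm{NOT}(\{c\}) \\
\mathrm{NOT}(\{c_1, \ldots, c_n\}) \cap \mathrm{NOT}(\{d_1, \ldots, d_m\}) &= \mathrm{NOT}(\{c_1, \ldots, c_n\} \cup \{d_1, \ldots, d_m\}) \\
1 &= \mathrm{NOT}(\emptyset)
\end{aligned}$$
Even in the case where all constructors are known, $\mathrm{NOT}(\{c\})$ is a more efficient representation than an
explicit union of all constructors except $c$.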
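The efficiency remark can be illustrated with a small Python sketch of a finite-or-cofinite representation of head constructors. This is a deliberate simplification and an assumption of the example: it models only sets of the form $c_1(1, \ldots, 1) \cup \ldots \cup c_n(1, \ldots, 1)$ and $\mathrm{NOT}(\{c_1, \ldots, c_n\})$, ignoring constructor arguments, and the class name `Heads` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heads:
    """All terms whose head constructor is in `names`, or is not in `names` if negated."""
    names: frozenset
    negated: bool = False                 # True encodes NOT(names)

    def complement(self):
        # complement of NOT({c1..cn}) is c1(1,...,1) u ... u cn(1,...,1), and vice versa
        return Heads(self.names, not self.negated)

    def intersect(self, other):
        if self.negated and other.negated:
            # NOT(A) n NOT(B) = NOT(A u B)
            return Heads(self.names | other.names, True)
        if self.negated:
            return Heads(other.names - self.names)
        if other.negated:
            return Heads(self.names - other.names)
        return Heads(self.names & other.names)

    def contains(self, head):
        return (head in self.names) != self.negated


ONE = Heads(frozenset(), True)            # 1 = NOT(empty set)
```

With this encoding the set of constructors never has to be enumerated: `Heads(frozenset({"c"}), True)` stands for every term not headed by `c`, however many other constructors the rest of the program defines.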
5.3 Closure Analysis
We now turn to algorithms for solving constraints. Constraint resolution is done by applying a set of
rewrite rules repeatedly until closure. For pedagogical reasons we present the rules a few at a time, as
needed for each application. However, we emphasize that in developing new applications it is usually
unnecessary to invent new rules. New analyses generally are expressed using the established machinery
(the complete set of rules), which means the analysis designer can simply write the necessary constraints
and be assured the constraints can be solved.
$^4$ This step only works because the cascading equations are already contractive in the $\alpha_i$. For example, starting with
$\alpha = \alpha$ and adding complements gives us an equation with exactly the same solutions: $\neg\alpha = \neg\alpha$.
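$$\begin{aligned}
S \wedge E_1 \cup E_2 \subseteq E_3 \ &\equiv\ S \wedge E_1 \subseteq E_3 \wedge E_2 \subseteq E_3 && (2) \\
S \wedge \alpha \subseteq \alpha \ &\equiv\ S && (3) \\
S \wedge E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \ &\equiv\ S \wedge E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \wedge E_1 \subseteq E_2 && (4) \\
S \wedge (\lambda_x \in \alpha \Rightarrow E_1 \subseteq E_2) \wedge \lambda_x \subseteq \alpha \ &\equiv\ S \wedge E_1 \subseteq E_2 \wedge \lambda_x \subseteq \alpha && (5)
\end{aligned}$$
Figure 1: Rules for simplifying constraints.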
We begin with closure analysis as it has the simplest resolution procedure. Expressions have the form
$$E ::= \lambda_x \mid \alpha \mid 0 \mid E_1 \cup E_2 \mid \lambda_x \subseteq \alpha \Rightarrow E_1$$
and a system $S$ of constraints has the form
$$S = \bigwedge_i E_i \subseteq \alpha_i$$
We say two systems are equivalent, $S_1 \equiv S_2$, if they have the same set of solutions. Figure 1 gives a number
of equivalences for closure analysis constraints. It is easy to verify that these are in fact equivalences.
A constraint $\alpha_i \subseteq U$ (respectively $L \subseteq \alpha_i$) is inductive if $\mathrm{TLV}(U)$ (respectively $\mathrm{TLV}(L)$) is a subset of
$\{\alpha_1, \ldots, \alpha_{i-1}\}$. The algorithm for solving the closure analysis constraints is as follows.

    Read the equivalences as rewrite rules going from left to right. The rules are applied to the
    constraint system repeatedly, in any order, until no new inductive constraints can be added.

Let $S'$ be the result of closing the system $S$ under the rewrite rules. The following statements are
easily verified:

• $S' \equiv S$, since $S'$ is obtained from $S$ by a sequence of $\equiv$-preserving steps.

• There are no constraints $\lambda_x \subseteq \lambda_y$, since no constant upper bounds appear in the initial constraints
and none are added by the rules.

• All constraints in $S'$ are of the form $\alpha \subseteq \beta$, $\lambda_x \subseteq \alpha$, or $\lambda_x \in \alpha \Rightarrow E_1 \subseteq E_2$. To see this, note the
previous point and that all other forms of left-hand sides are eliminated by the rules.

• The procedure terminates, because constraints on the right-hand sides of the rules involve only pairs
of subexpressions of the original system. There are only finitely many such pairs, so eventually no
new inductive constraints can be added. To help detect when all inductive constraints have been
added it is sufficient to apply the transitive rule (4) once only for each pair of inductive upper
and lower bounds on a variable. With that restriction the algorithm terminates exactly when no
rules apply. (Note that rules (3) and (4) cannot get into a loop because $\alpha \subseteq \alpha$ is not an inductive
constraint.)
The last point can be used to perform complexity analysis of the algorithm. If the size of the original
system of constraints printed as a string is $n$, then the size of the final system may be $O(n^2)$. The cost of
applying the rules other than Rules 4 and 5 is $O(n^2)$. For Rule 4, a variable $\alpha$ may have $O(n)$ upper and lower bounds.
Forming all pairs of upper and lower bounds for $\alpha$ takes $O(n^2)$ time. Since there may be $O(n)$ variables the total cost is
$O(n^3)$. The cost of Rule 5 can similarly be shown to be $O(n^3)$, so the total cost is $O(n^3)$.
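It remains to show that the rules actually solve the constraints. From the discussion above we know
that there can be no trivially inconsistent constraints of the form $\lambda_x \subseteq \lambda_y$ where $x \ne y$. Thus, when the
algorithm terminates successfully all constraints are inductive.
Index the variables $\alpha_1, \alpha_2, \ldots$. We say that a constraint $y \subseteq \alpha_j$ is a lower bound on $\alpha_j$ if $y = \lambda_x$ or
$y = \alpha_i$ and $i < j$. A constraint $\alpha_j \subseteq y$ is an upper bound on $\alpha_j$ if $y = \lambda_x$ or $y = \alpha_i$ and $i < j$. Now define
$$\begin{aligned}
L_i &= \bigcup \{ y \mid y \subseteq \alpha_i \in S' \text{ is a lower bound on } \alpha_i \} \\
U_i &= \bigcap \{ y \mid \alpha_i \subseteq y \in S' \text{ is an upper bound on } \alpha_i \}
\end{aligned}$$
The $L_i$ and the $U_i$ simply combine all upper and lower bounds on variables into a single upper and
lower bound per variable. Note that the $L_i$ and $U_i$ exclude any conditional constraints remaining in $S'$.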
Lemma 5.6 The system $\bigwedge_i L_i \subseteq \alpha_i \subseteq U_i$ is inductive.
Proof: Conditions (1) and (2) of Definition 5.2 are easily verified; for (2), simply note that each
constraint is inductive. For condition (3), because our domain is a set of constants $\lambda_x$ the hierarchy of
$D_i$'s collapses to $D_0 = \emptyset$ and $D_1 = \{\lambda_x \mid x \text{ is a program variable}\}$. The condition for inductiveness can
then be simplified:
$$\forall\, 1 \le i' \le n.\ (\forall\, 1 \le i < i'.\ L_i \subseteq \alpha_i \subseteq U_i) \Rightarrow L_{i'} \subseteq U_{i'}$$
The proof is by induction on $i'$. For the base case, there are no variables with index lower than $\alpha_1$, so
no variables can appear in $L_1$ or $U_1$. In addition $U_1$ contains no conditional constraints or constants (see
discussion above). It follows that $U_1 = \bigcap \emptyset$, which is the entire domain, so $L_1 \subseteq U_1$ in any assignment.
For the inductive case, let $\sigma$ be an assignment to the variables and assume that $\sigma(L_i) \subseteq \sigma(\alpha_i) \subseteq \sigma(U_i)$
for all $i < i'$. Let $l$ be a disjunct of $L_{i'}$ and let $u$ be any conjunct of $U_{i'}$. Then $l \subseteq u \in S'$ by Rule 4
or the constraint is a trivial one $\alpha \subseteq \alpha$ removed by Rule 3. Assume $l \subseteq u$ is a non-trivial constraint. If
either $l$ or $u$ is a variable its index is less than $i'$. Therefore, $\sigma(l) \subseteq \sigma(u)$ by the induction hypothesis.
Since $l$ and $u$ were chosen arbitrarily from $L_{i'}$ and $U_{i'}$, it follows that $L_{i'} \subseteq U_{i'}$.
$\Box$
Let $S''$ be $S'$ with remaining conditional constraints removed. Lemma 5.6 shows that $S''$ has solutions
given by the equations
$$\alpha_i = L_i \cup (\beta_i \cap U_i)$$
where the $\beta_i$ are fresh variables. Since all operations are monotonic,$^5$ the smallest of these solutions is
$\alpha_i = L_i$ where all $\beta_i = 0$. This solution is $\sigma$ where
$$\sigma(\alpha_i) = \{ \lambda_x \mid \lambda_x \text{ appears in } L_i \}$$
$^5$ All operations are monotonic because we designed the constraint language to avoid negations. However, note that this
is the only place monotonicity is used, and that it is used to show the existence of a least solution.
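We claim that $\sigma$ is a solution of $S'$ and therefore a solution of $S$. It suffices to show that
$$\sigma(\lambda_x \subseteq \alpha_i) \Rightarrow \sigma(E_1 \subseteq E_2)$$
is satisfied for the constraints $\lambda_x \subseteq \alpha_i \Rightarrow E_1 \subseteq E_2$ in $S'$ but not in $S''$. Assume for the sake of obtaining
a contradiction that $\lambda_x \subseteq \sigma(\alpha_i)$. Then $\lambda_x$ appears in $L_i$. But then the hypothesis of Rule 5 is satisfied,
contradicting the assumption that $S'$ is closed under the rewrite rules. We conclude that $\lambda_x \not\subseteq \sigma(\alpha_i)$, so
the constraint is satisfied.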
5.4 Dataow Analysis
The dataflow analysis discussed in Section 4.1 allows general set complement. Here we restrict our
attention to solving the specific form of constraints arising in the live variable analysis, which do not
make essential use of set complement and are therefore much easier to solve.
The universe $H$ is a finite set of constants $a_1, a_2, \ldots, a_n$. For any set of constants $A$, the set expression
$\neg(\bigcup A)$ can be written without a negation as $\bigcup (H - A)$. Recall the liveness constraints from Section 4.1.
$$\begin{aligned}
[\![S]\!]_{in} &\supseteq S_{use} \cup ([\![S]\!]_{out} \cap \neg S_{def}) \\
[\![S]\!]_{out} &\supseteq \bigcup_{X \in succ(S)} [\![X]\!]_{in}
\end{aligned}$$
The only expression not already treated in the resolution rules of Figure 1 is $\alpha \cap \neg A$, where $A$ is a union
of constants. To handle this case, we make use of the identity $X \subseteq Y \cup Z \Leftrightarrow X \cap \neg Z \subseteq Y$. Three cases
involving variables and constants on the left-hand side are treated separately:
$$\begin{aligned}
S \wedge \alpha_i \cap A \subseteq \alpha_j \ &\equiv\ S \wedge \alpha_i \subseteq \alpha_j \cup \neg A \qquad i \ne j \\
S \wedge \alpha_i \cap A \subseteq \alpha_i \ &\equiv\ S \\
S \wedge a \subseteq \alpha_i \cup A \ &\equiv\ S \wedge a \cap \neg A \subseteq \alpha_i
\end{aligned}$$
The first rule works either left-to-right or right-to-left. Only one direction, however, can result in a
constraint in inductive form (i.e., with the higher-numbered variable isolated). Thus, if $i > j$ the rule is
applied left-to-right and if $i < j$ the rule is applied right-to-left. If $i = j$ the constraint is eliminated (the
second rule). Finally, if the left-hand side is a constant $a$, then $a \cap \neg A$ is formed to isolate the variable
on the right-hand side (the third rule). The expression $a \cap \neg A$ is simplified to either $a$ if $a \not\subseteq A$ or 0 if
$a \subseteq A$.
Adding these rules to those of Figure 1 to handle the new expression $\alpha \cap A$ is all that is required to
obtain an effective algorithm. The proof of Lemma 5.6 can be applied to this extension by noting that
the new rules put constraints in a form satisfying condition (2) of Definition 5.2, and that the proof that
conditions (1) and (3) are satisfied is unchanged.
5.5 Simple Type Inferene
The constraints for simple type inference introduce one additional form of expression $E_1 \to E_2$. The
corresponding resolution rule is well-known:
$$E_1 \to E_2 \subseteq E_3 \to E_4 \ \equiv\ E_3 \subseteq E_1 \wedge E_2 \subseteq E_4 \qquad (6)$$
Note that the order of $E_1$ and $E_3$ is reversed
on the right-hand side (see the discussion in Section 3.2). This rule can be combined with the preceding
ones to give a method for solving the typing constraints. Resolution of the constraints is again in $O(n^3)$
time.
The justification for this rule is outlined in Section 3.2.1. A full formalization requires considerable
additional machinery from denotational semantics and is outside the scope of this paper.
6 Disussion
We now turn to the relationship of constraint-based analysis to other approaches to program analysis
and its place in the theory of abstract interpretation. The accepted intellectual framework for designing
and justifying program analysis algorithms is abstract interpretation, due to Cousot and Cousot [CC77].
Abstract interpretation treats a program analysis as a sound approximation to the exact meaning of a
program. More precisely, an abstract interpretation gives a non-standard interpretation of the program
that is consistent with the standard interpretation. Let $(D, \le_D)$ and $(A, \le_A)$ be partially ordered domains
and let $\alpha : D \to A$ and $\gamma : A \to D$ be functions that form a Galois connection:
$$\forall d \in D, a \in A.\quad \alpha(d) \le_A a \Leftrightarrow d \le_D \gamma(a)$$
Then $\alpha(d)$ is the abstraction of $d$ and $\gamma(a)$ is the concretization of $a$.
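By defining the abstract domain $A$ and explicit mappings $\alpha$ and $\gamma$ it becomes possible to state
precisely what it means for an abstraction of a program to be correct. For example, let $P$ be a program
with standard semantics $\mu : \mathit{Program} \to D \to D$. Let $\hat{\mu}$ be a program analysis (an abstract interpretation)
with functionality $\hat{\mu} : \mathit{Program} \to A \to A$. Then $\hat{\mu}$ is a sound abstraction if it satisfies:
$$\forall x \in D.\quad (\mu\ P\ x) \le_D \gamma(\hat{\mu}\ P\ \alpha(x))$$
Thus, the abstraction $\hat{\mu}(P)$ conservatively models the behavior of $P$.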
There is confusion in the literature over the meaning of the term "abstract interpretation," which is
used at least to mean either a semantic framework for reasoning about program analysis (sketched above)
or a particular set of techniques for constructing program analyses. The author prefers to use the term
to refer to the semantic framework only. Given that meaning, abstract interpretation provides a clear,
well-defined framework for proving that a program analysis is correct. We are unaware of any program
analysis that cannot be explained in this framework,$^6$ including constraints, although we have left the
abstraction and concretization functions implicit in our examples.
Program analysis is technically difficult and at the same time new problems typically bear some
resemblance to older, better understood problems. Hence, there is little enthusiasm for inventing program
analyses from first principles in every instance, and people have naturally developed sets of techniques
that can be reused. A few of these paradigms have developed large followings. We discuss three: finite
lattice methods, type inference, and constraints.
6.1 Finite Lattie Methods
One of the most popular paradigms appeared in the Cousots' seminal paper on abstract interpretation
[CC77]. Program analyses in this style are variations on a theme. A finite abstract domain $A$ is designed,
and the analysis is expressed as a system of equations of the following form
$$x_1 = \phi_1(X) \quad \ldots \quad x_n = \phi_n(X)$$
where $X = \{x_1, \ldots, x_n\}$ is a set of variables and each $\phi_i$ is a monotonic function with signature $A^{|X|} \to A$.
It is well-known that a generic iterative fixed point algorithm computes the least solution of such equations
[CC77].
$^6$ Widening/narrowing can be defined without reference to abstraction (see [CC92]). However, when used on an abstract
domain there are associated abstraction and concretization functions.
Given that one can design a correct analysis in this framework, the implementation is straightforward
and has two additional useful properties: first, the computed analysis is the best possible within the
chosen parameters (i.e., it is the least solution of the equations) and second, the analysis is guaranteed
to terminate. Analyses for C and FORTRAN programs based on dataflow equations are classic examples
of this program analysis paradigm.
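To see why the implementation is straightforward, consider the following Python sketch of a generic iterative fixed point computation, applied to live variable equations for a three-statement straight-line program. The function `least_fixed_point`, the equation names (`in1`, `out1`, and so on), and the example program are assumptions made for this illustration; the sketch is a naive round-robin iteration, not an optimized worklist algorithm.

```python
def least_fixed_point(functions, bottom):
    """Solve x_i = f_i(X) by iterating from `bottom` until no variable changes.

    `functions` maps each variable name to a monotonic function of the current
    assignment. Termination relies on the domain being finite.
    """
    xs = {name: bottom for name in functions}
    changed = True
    while changed:
        changed = False
        for name, f in functions.items():
            new = f(xs)
            if new != xs[name]:
                xs[name] = new
                changed = True
    return xs


# Hypothetical program:  s1: x = 1;  s2: y = x + 1;  s3: return y
# in_s = use_s | (out_s - def_s), and out_s is the union of in over successors.
liveness = {
    "in1": lambda X: X["out1"] - {"x"},
    "out1": lambda X: X["in2"],
    "in2": lambda X: frozenset({"x"}) | (X["out2"] - {"y"}),
    "out2": lambda X: X["in3"],
    "in3": lambda X: frozenset({"y"}) | X["out3"],
    "out3": lambda X: frozenset(),
}

solution = least_fixed_point(liveness, frozenset())
```

In this example the abstract domain is the finite lattice of subsets of `{"x", "y"}` ordered by inclusion, and the least solution makes `x` live only between the first and second statements.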
The cookbook recipe "finite domains plus monotonic functions equals program analysis" has proven
very popular, and there are an enormous number of applications of this excellent idea; representative
examples include [Myc80, JM86, Hud87, Wad87, HY88, PBJ+91]. The paradigm has become so popular
that the term abstract interpretation is often used to mean this specific technique for program analysis
rather than a general semantic framework. Pedagogically this is undesirable, as it implies that the
semantic framework of abstract interpretation cannot be applied to other paradigms.
6.2 Type Inferene
The Hindley/Milner type inference algorithm has recently become popular as a model for program
analyses of a different sort. In this approach, a program analysis is specified as a non-standard type inference
system. Typically, such systems are sets of deductive inference rules, with one rule for each syntactic
form in the programming language. It is worth noting that analyses in this style have been designed
that prove all sorts of facts about programs, many of which have little to do with types. Representative
examples include [Hen92, TT94].
Specifying a program analysis as a formal logic corresponds nicely with the intuition that the role of
program analysis is to prove facts about programs. However, the inference rules alone normally do not
specify an algorithm. If the logic can prove multiple facts about a program, it is necessary to specify
which fact should be computed by program analysis; that is, it is necessary to specify how the proof
search is conducted. In practice, designing the logic often is only the first step and much hard work
remains in coming up with an algorithm and analyzing its complexity. For example, implementations of
Milner's type system are based on solving systems of equality constraints using unification [Rob65].
6.3 Constraints
In 1987 Wand wrote a short paper on the Hindley-Milner type system in which he proposed to recast the
usual typing rules with explicit equality constraints as side conditions, which simplifies the understanding
of Hindley-Milner type inference algorithms [Wan87]. This paper is apparently the first to explicitly put
forth the constraint-based viewpoint (excepting Reynolds's much earlier paper [Rey69]). Further
development has continued to emphasize the problems of constraint resolution over the problems of deductive
inference. Note that the constraint-based analysis notation for traditional type inference problems deftly
avoids using inference rules at all (see Section 4.2)!
A thesis of this paper is that constraint-based analysis unifies much of the traditional dataflow views
and the type inference views of program analysis. To the degree that dataflow equations are a proxy for
more general abstract interpretations over finite lattices there is considerable evidence for this thesis. In
particular, the equations $x_1 = \phi_1(X), \ldots, x_n = \phi_n(X)$ are just
another system of constraints to be solved. However, this level of generality obscures several important
differences.
What we refer to as finite lattice methods generally exploit three assumptions: first, a particular
solution (the least or the greatest) to the equations is desired; second, the abstract functions can be
arbitrary monotonic functions; and third, that a finite domain of abstract values gives sufficient precision
for all programs.$^7$
With respect to the first point, in constraint-based analysis a common (but not universal) view is
to compute all solutions of the constraints. For example, the constraint resolution procedure for live
variable analysis in Section 5 does not resemble the one in textbooks precisely because it computes all,
rather than the least, solutions of the constraints. Computing all solutions becomes necessary for separate
analysis of programs split across multiple files (where the least solution of the constraints for a particular
file may have little to do with the least solution of the entire program) and when there is no least solution
(e.g., in the presence of anti-monotonic constructors like function space).
The second important difference lies in the nature of the abstractions chosen in finite lattice and
in constraint-based analyses. All commonly used, and very nearly all proposed, finite lattice methods
are either forwards (information flows from inputs to outputs) or backwards (information flows from
outputs back towards inputs; live variable analysis is an example). The dataflow analyses tend to use
abstract functions to represent function values. Thus, information can flow easily only in the direction of
the abstract function, which is either forwards or backwards. Constraint resolution, however, naturally
allows information to flow in either or both directions, allowing forwards and backwards information flow
to be used in the same analysis.
It is important to understand that allowing bidirectional information flow is not a unique property
of constraints. For example, the technique of chaotic iteration admits analyses that are neither forwards
nor backwards [CC78].
The third important difference is that constraints can easily work over infinite domains, while the
finite lattice methods work with a finite domain. Finite domains are a good fit for some problems (e.g.,
the two point domain commonly used in strictness analysis [Myc80]), but for others (e.g., particularly
problems involving recursive data structures) it is more natural to work directly with an infinite domain.
A problem with infinite domains, however, is that termination of the program analysis is not automatically
guaranteed. In the case of set constraints the termination of constraint resolution is guaranteed; resolution
computes a finite representation of the solutions of constraints over an infinite domain.
The distinction between infinite and finite domains is subtler than we have indicated. If an analysis
terminates for all programs, then clearly there is finite structure (i.e., the finite computation) regardless of
the choice of domain. Thus, even if the intended domain is infinite, for each program it should be possible
to substitute a finite domain that behaves indistinguishably from the infinite domain.$^8$ Essentially this
observation is used in [CC95] in showing the equivalence of several different approaches to formulating
program analyses over finite and infinite domains.
Even if infinite domains can be treated using finite equivalents (as they must be if we wish to have
terminating program analyses), that does not mean that infinite domains serve no useful role. In many
cases an infinite domain is simply the natural framework, while the equivalent finite domain may be
difficult to discover and justify. In the case of set constraints, the finite domain can be taken to be all
subsets of the constraints of the initial system plus those added by resolution rules. The full set
is only discovered by solving the constraints. A similar perspective is set forth in [CC92] in another
$^7$ Or that a suitable finite domain can be derived from each particular program.
$^8$ Note that there may be a different finite domain for each possible input program.