The purpose of the present work is to motivate the use of constraints for program analysis from the perspective of the applications of the theory.

Alexander Aiken
EECS Department
University of California, Berkeley
Berkeley, CA 94702-1776
aiken@cs.berkeley.edu

1 Introduction

Program analysis is concerned with automatically extracting information from programs. Program analysis is a large topic, with a long history and many applications, particularly in optimizing compilers and software engineering tools. As might be expected of any broad area, there are a number of distinct approaches to program analysis.

This paper provides an overview of constraint-based program analysis. While much has been written about constraint-based program analysis in recent years, there is relatively little material to assist outsiders who wish to learn something about the field. Two survey papers cover the computational complexity of various constraint problems that arise in program analysis [Aik94, PP97]. The purpose of the present work is to motivate the use of constraints for program analysis from the perspective of the applications of the theory.

Program analysis using constraints is divisible into constraint generation and constraint resolution. Constraint generation produces constraints from a program text that give a declarative specification of the desired information about the program. Constraint resolution (i.e., solving the constraints) then computes this desired information. In the author's view, the constraint-based analysis paradigm is appealing for three primary reasons:

• Constraints separate specification from implementation. Constraint generation is the specification of the analysis; constraint resolution is the implementation. This division helps to organize and simplify understanding of program analyses. The soundness of an analysis can be proven solely on the basis of the constraint systems used; there is no need to resort to reasoning about a particular algorithm for solving the constraints. On the other hand, algorithms for solving classes of constraint problems can be presented and analyzed independent of any particular program analysis. General results on solving constraint problems provide "off-the-shelf" tools for program analysis designers.

• Constraints yield natural specifications. Constraints are (usually) local; that is, each piece of program syntax contributes its own constraints in isolation from the rest of the program. The conjunction of all local constraints captures global properties of the program being analyzed.

This work was supported by NSF National Young Investigator award CCR-9457812. This version includes corrections suggested by Manuel Fähndrich.

analysis have a rich theory that can be exploited in implementations. We shall only touch on this subject in this paper.

We first briefly discuss the long history of the use of constraints in program analysis, which predates the current interest in the area by many years (Section 2). The overview proper begins with the introduction of set constraints, a widely used constraint formalism in program analysis and the one with which the author is best acquainted (Section 3).

The balance of the paper shows that three classical problems (standard dataflow equations, simple type inference, and monomorphic closure analysis) can be viewed as instances of set constraint problems (Section 4). Each of these three very basic analyses has been developed by different communities of people over extended periods of time, and to our knowledge no formal connection between the problems has been noted previously in the literature. Our main aim in choosing these problems, however, is that we assume most readers are familiar with at least one of them and thereby are afforded an easy path to appreciation of the constraint-based analysis perspective. We also present one simple variation of type inference suggestive of the expressive power provided by set constraints (see Section 4.3).

To give some insight into the algorithmic issues involved in a general constraint-based analysis system, we give constraint resolution algorithms for the constraint systems arising from the three example analyses. It is important to realize that in different applications we are interested in different notions of constraint solvability. Depending on the application, we may be interested in only knowing a particular solution (e.g., the least solution) or in calculating all solutions.

Set constraints provide one of the most general decidable theories known for constraint-based program analysis, and the essential issues of constraint-based analysis can be illustrated easily using set constraints. However, we do not wish to give the impression that set constraints are the only useful constraint theory for program analysis. In addition, there are of course other approaches to program analysis not based on constraints. Other constraint formalisms, altogether different approaches, as well as the place of constraint-based program analysis in the general theory of abstract interpretation, are discussed in Section 6.

2 History

Using constraints in program analysis is not a new idea. The earliest example we are aware of is due to Reynolds, who proposed an analysis of Lisp programs based on the resolution of inclusion constraints in 1969 [Rey69]. Similar ideas (but based on grammars rather than constraints) were developed independently later by Jones and Muchnick [JM79]. Dataflow equations and type equations, two examples that we shall investigate in greater depth in Section 4, also have a long history. Dataflow equations form the basis of most classical algorithms for flow analysis used in compilers for procedural languages (most notably C and FORTRAN). Type equations are the basis of type inference for functional languages and for template-style polymorphism in object-oriented languages.

While the idea of program analysis using constraints is not new, there has been a dramatic shift in the research perspective in recent years. Formerly, each of the problem areas described above was viewed as a separate line of research, with its own techniques, problems, and terminology. Efforts to hybridize or extend these techniques met with considerable difficulty, at least in part because it was unknown whether the resulting constraint problems could be solved. Today it is understood that these problems are related, and that much can be gained by viewing the problems as instances of a more general setting. In fact, techniques from each of the classical algorithms may be combined quite freely to create new program

To make the advantages of the constraint perspective concrete, we use another classical problem for illustration. Most compilers perform register allocation to assign machine registers to program variables. Consider the following fragment of imperative code, where program variables are named a, b, c, and so forth:

a := c + d

e := a + b

f := e - 1

print(f)

A valid register assignment is a mapping from variable names to register names that preserves program semantics. If the register names are r1, r2, r3, ..., then the program under one valid register assignment may be:

r1 := r2 + r3

r4 := r1 + r5

r1 := r4 - 1

print(r1)

The difficulty in register allocation is that there are usually more program variables than there are registers to hold them. In the example above, six variables are mapped into five registers, with variables a and f sharing register r1. In general, a valid register allocation may not even exist for a given program. In this case, the number of variables in the program can be reduced by spilling some variables, that is, by inserting code to save and restore these variables to and from main memory.

The register allocation problem was already recognized in the FORTRAN I compiler in the 1950's, but the solution techniques were ad hoc and not entirely effective. By the 1970's it was realized that the weakness of contemporary register allocation was a limiting factor in the development of optimizing compilers. A breakthrough came in the late 1970's when Chaitin proposed a register allocation heuristic based on graph coloring [CAC+81]. The significance of the contribution can be judged by the fact that this technique was the subject of one of the first software patents. Chaitin's insight was to formulate register allocation as a constraint problem.

A variable x is said to be live at a program point p if x is referred to at some program point later in the execution ordering than p with no intervening assignment to x. Otherwise x is said to be dead. Consider an assignment statement y := .... A basic observation about register allocation is

    If variable x is live when variable y is assigned, then x and y cannot be held in the same register.

In the example above, we have implicitly assumed that a is dead at the point where f is assigned, allowing reuse of a's register to hold the value of f.

This observation suggests the following natural constraint problem. Let $\mathit{Reg} : \mathit{Variables} \to \mathit{Registers}$ be a register assignment. The constraints on $\mathit{Reg}$ are

\[ \mathit{Reg}(x) \neq \mathit{Reg}(y) \iff x \text{ is live where } y \text{ is assigned} \]

This formulation neatly captures the constraints under which a register assignment is valid. The next problem is to compute register assignments. The constraints naturally specify a graph with one node for

each node of the graph can be assigned a color different from the color of all of its neighbors in such a way that no more than k colors are used. Finding a register assignment with k registers is equivalent to finding a k-coloring of the constraint graph.

By the time of Chaitin's work, it was already known that graph coloring is an NP-complete problem, and therefore that efficient exact solutions were very unlikely to be found. Chaitin proposed a simple heuristic for coloring the graph based on another observation:

    If a node x has fewer than k incident edges, then the graph is k-colorable if and only if the graph obtained by removing x and its edges is k-colorable.

That is, if x has fewer than k neighbors, then there is always a color for x, no matter how the rest of the graph is colored. In cases where the heuristic fails to color the entire graph (i.e., a point is reached where all nodes have k or more neighbors) it is necessary to choose a variable to spill. While subsequent work extends the heuristics for coloring and spilling, graph coloring remains the best framework known for register allocation after nearly 20 years.
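The removal heuristic just described can be sketched in a few lines of Python. This is an illustrative sketch only, not Chaitin's allocator: the name `color_graph`, the dictionary-of-sets graph encoding, and the decision to return `None` where a real allocator would select a spill candidate are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of neighbors (symmetric)


def color_graph(graph: Graph, k: int) -> Optional[Dict[Hashable, int]]:
    """Try to k-color `graph` by repeatedly removing low-degree nodes.

    Returns a node -> color map, or None when every remaining node has
    k or more neighbors (the point at which a variable would be spilled).
    """
    # Work on a copy so the caller's graph is left untouched.
    work = {n: set(adj) for n, adj in graph.items()}
    removed: List[Hashable] = []
    while work:
        # Any node with fewer than k remaining neighbors can be set aside.
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            return None  # heuristic is stuck (hypothetical spill point)
        removed.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)
    colors: Dict[Hashable, int] = {}
    # Put nodes back in reverse order. Each node had fewer than k neighbors
    # when it was removed, so at least one of the k colors is still free.
    while removed:
        node = removed.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors
```

For instance, a triangle of three mutually interfering variables is colored with k = 3, while with k = 2 every node has two neighbors and the sketch reports failure. The order in which nodes are removed depends only on the graph, not on program order, which is the point made in the discussion of the heuristic.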

This rather old example illustrates all of the advantages of using constraint formulations in program analysis. The constraint formulation as inequalities separates the specification of the problem from its implementation, and most importantly gives a global characterization of the conditions to be satisfied. The abstract constraint problem, now free of the details of the particular program and programming language, can then be addressed by appropriate techniques, in this case graph coloring. Note that the constraint resolution algorithm proceeds in a manner that has no direct relationship to program structure, and that if one were to actually view the sequence of allocation decisions made by the greedy coloring heuristic it would jump around from point to point in the program with no apparent pattern. If we were to attempt formulating directly an algorithm that was defined, e.g., by induction on the program syntax, it is unlikely we would arrive at something as effective as converting the problem to a constraint representation.

The reader may find register allocation heuristics a peculiar choice for a historical example of program analysis. After all, graph coloring register allocation is not usually even regarded as a program analysis problem, let alone a constraint-based one. However, it is clear that the constraint formulation was central in developing the technique. Register allocation is interesting for another reason. To our knowledge, it is the only significant application of negative constraints (i.e., inequalities) to program analysis in the literature.

3 Set Constraints

This section gives a brief overview of set constraints and the state of knowledge on set constraint problems. In Section 4 we illustrate connections between disparate program analysis problems using the language of set constraints.

Set constraints describe relationships between sets of terms. A set constraint has the form $X \subseteq Y$, where $X$ and $Y$ are set expressions. Let $C$ be a set of constructors and let $V$ be a set of set-valued variables. Each $c \in C$ has a fixed arity $a(c)$; if $a(c) = 0$ then $c$ is a constant. The set expressions are defined by the following grammar:

\[ E ::= \alpha \mid 0 \mid E_1 \cup E_2 \mid E_1 \cap E_2 \mid \neg E_1 \mid c(E_1, \ldots, E_{a(c)}) \mid c^{-i}(E_1) \]

In this grammar, $\alpha$ is a variable (i.e., $\alpha \in V$) and $c$ is a constructor (i.e., $c \in C$). In the standard interpretation, set expressions denote sets of terms. A term is $c(t_1, \ldots, t_{a(c)})$ where $c \in C$ and every $t_i$ is

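$H$. An assignment $\sigma$ is a mapping $V \to 2^H$ that assigns sets of terms to variables. The meaning of set expressions is given by extending assignments from variables to set expressions as follows:

\begin{align*}
\sigma(0) &= \emptyset \\
\sigma(E_1 \cup E_2) &= \sigma(E_1) \cup \sigma(E_2) \\
\sigma(E_1 \cap E_2) &= \sigma(E_1) \cap \sigma(E_2) \\
\sigma(\neg E_1) &= H - \sigma(E_1) \\
\sigma(c(E_1, \ldots, E_n)) &= \{ c(t_1, \ldots, t_n) \mid t_i \in \sigma(E_i) \} \\
\sigma(c^{-i}(E)) &= \{ t_i \mid \exists\, c(t_1, \ldots, t_n) \in \sigma(E),\ 1 \le i \le n \}
\end{align*}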

A system of set constraints is a finite conjunction of constraints $\bigwedge_i X_i \subseteq Y_i$ where each of the $X_i$ and $Y_i$ is a set expression. A solution of a system of set constraints is an assignment $\sigma$ such that $\bigwedge_i \sigma(X_i) \subseteq \sigma(Y_i)$ is true. A system of set constraints is satisfiable if it has at least one solution.

The term "set constraints" was coined by Heintze and Jaffar [HJ90], who were the first to recognize and formalize set constraints in their full generality. It is a remarkable fact about many set constraint problems that not only is it decidable whether or not a system of constraints has a solution, but that all (potentially infinitely many) solutions can be given a finite representation. In their original paper, Heintze and Jaffar showed that a restricted class of set constraints could be solved and the solutions finitely presented.¹

A natural and interesting subclass of set constraints excludes projections but includes all other operations. An algorithm that exhibits all solutions of such constraints first appears in [AW92]. Subsequently, many alternative proofs of this result and connections to other disciplines were discovered, including tree automata [GTT92] and graph theory [AKVW93]. A particularly elegant result shows that set constraints without projections are equivalent to the monadic class of predicate logic [BGW93].

Including unrestricted projections in a complete theory turns out to be a difficult problem. A series of papers by a variety of authors show increasingly powerful systems of constraints to be decidable [GTT93, BGW93, CP94a, AKW95]. Charatonik and Pacholski finally show that the full set constraint language is decidable in [CP94b].

Showing decidability is, of course, a necessary first step in obtaining practical algorithms. Beyond decidability, we would like efficient algorithms and algorithms that compute finite representations of solutions. In these areas the state of knowledge is incomplete. Currently, the algorithms that compute finite representations of the solutions of set constraints cannot handle unrestricted projections. Furthermore, the complexity of solving general set constraints is high. Satisfiability of set constraints is NEXPTIME-complete; in fact, it remains NEXPTIME-complete even if projections are eliminated.

The complexity results strongly suggest that analyses based on solving set constraints in their full generality are infeasible. However, there are many very useful polynomial time fragments of the full theory, and it is these tractable sub-theories that are our focus in this paper.

3.1 Expressive Power

From the definition above, it is easy to see that the set expressions consist only of elementary set operations plus constructors; simply put, it is a set theory of terms. The constraint language is rich enough,

¹ It is also worth noting that for some variations of set constraints, in particular with the addition of function spaces, no complete resolution algorithm is known for the general case.

makes set constraints a useful tool for program analysis. For example, programming language data type facilities provide "sums of products" data types, which means simply unions of (usually distinct) data type constructors. All such data types can be expressed as set constraints.

Let $X = Y$ stand for the pair of constraints $X \subseteq Y$ and $Y \subseteq X$. Consider the constraint

\[ \beta = \mathit{cons}(\alpha, \beta) \cup \mathit{nil} \]

If cons and nil are interpreted in the usual way, then the solution of this constraint assigns to $\beta$ the set of all lists with elements drawn from $\alpha$. This example also shows that a special operation for recursion is not required in the set expression language; recursion is obtained naturally through recursive constraints.
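As a rough illustration of how a recursive constraint such as the list constraint above determines a set of terms, the following Python sketch iterates the right-hand side starting from the empty set. The encoding of terms as nested tuples, the name `lists_up_to`, and the cut-off after a fixed number of rounds are assumptions of this example; the actual solution is an infinite set and is never enumerated by a constraint solver.

```python
def lists_up_to(alpha, rounds):
    """Approximate the solution of  beta = cons(alpha, beta) ∪ nil.

    Starting from the empty set, apply the right-hand side `rounds` times.
    After r rounds the result holds exactly the lists of length < r whose
    elements are drawn from `alpha`.
    """
    beta = set()
    for _ in range(rounds):
        # One application of the right-hand side: cons(alpha, beta) ∪ nil.
        beta = {("cons", a, tail) for a in alpha for tail in beta} | {"nil"}
    return beta


# With two possible elements and three rounds there are 1 + 2 + 4 lists.
approximation = lists_up_to({1, 2}, 3)
```

Each round only adds terms, so the approximations form an increasing chain whose limit is the set of all finite lists over the element set.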

We have not said whether we mean our lists above to be strict (as in most languages) or non-strict (as in lazy functional languages). Set constraints can be used for either, although different models are required for strict and non-strict constructors. In this paper we wish to avoid most of the complexities of discussing models, so we simply observe that for a non-strict cons the following identity holds:

\[ \mathit{cons}(X, Y) \subseteq \mathit{cons}(X', Y') \iff X \subseteq X' \wedge Y \subseteq Y' \]

For a strict cons one must naturally account for strictness, namely that $\mathit{cons}(0, Y) = 0$ for all $Y$ (and similarly for a 0 in the second position). Thus the identity for a strict cons is more complex:

\[ \mathit{cons}(X, Y) \subseteq \mathit{cons}(X', Y') \iff (X \subseteq X' \wedge Y \subseteq Y') \vee X = 0 \vee Y = 0 \]

It is by applying equivalences such as these that set constraint solvers solve set constraints (see Section 5). By choosing the appropriate resolution rules either strict or non-strict constructors can be modeled faithfully; in fact, it is possible to distinguish individual arguments of constructors as strict or non-strict, though we know of few applications for such generality. Because of the disjunction on the right-hand side of the $\iff$, it is in general more expensive to resolve constraints involving strict constructors than constraints using only non-strict constructors.

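The set of non-nil lists (with elements drawn from $\alpha$) can be defined as $\gamma = \beta \cap \neg\mathit{nil}$, where $\beta$ is the set of lists defined by the constraint $\beta = \mathit{cons}(\alpha, \beta) \cup \mathit{nil}$ above. The set $\gamma$ is useful because it describes the proper domain of the function that selects the first element of a list; such a function is undefined for empty lists. This example also illustrates that set constraints can describe proper subsets of standard sums of products data types.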

A red-black tree is a binary search tree with the following properties:

1. Every node is either red or black.
2. Every leaf is black.
3. Every red node has two black children.
4. Every path from the root to a leaf has the same number of black nodes.

Together these properties imply that a red-black tree of $n$ nodes has height at most $2\log(n+1)$, so red-black trees are well-balanced trees. Set constraints can describe properties (1)-(3) of red-black trees. In the following equations, the set $\beta$ describes subtrees rooted at black nodes and $\rho$ describes subtrees rooted at red nodes. Red and black are both binary constructors:

\begin{align*}
\beta &= \mathit{black}(\beta \cup \rho,\ \beta \cup \rho) \cup \mathit{blackleaf} \\
\rho &= \mathit{red}(\beta, \beta)
\end{align*}

the solutions of set constraints are always describable by regular equations (see Section 5).

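The final, admittedly contrived, example shows a non-trivial system of constraints where some work is required to derive the solutions. Consider the universe of the natural numbers with one unary constructor succ and one nullary constructor zero. Let the system of constraints be:

\[ \mathit{succ}(\beta) \subseteq \neg\beta \ \wedge\ \mathit{succ}(\neg\beta) \subseteq \beta \]

These constraints say that if $x \in \beta$ (resp. $x \in \neg\beta$) then $\mathit{succ}(x) \in \neg\beta$ (resp. $\mathit{succ}(x) \in \beta$). In other words, these constraints have two solutions, one where $\beta$ is the set of even natural numbers and one where $\beta$ is the set of odd natural numbers. The solutions are described by the following equations:

\begin{align*}
\beta &= \mathit{zero} \cup \mathit{succ}(\mathit{succ}(\beta)) \\
\beta &= \mathit{succ}(\mathit{zero}) \cup \mathit{succ}(\mathit{succ}(\beta))
\end{align*}

The two solutions are incomparable; in general, there is no least solution of a system of set constraints.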
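The claim about the two solutions can be spot-checked mechanically on an initial segment of the natural numbers. The sketch below is only a sanity check under assumptions of its own: natural numbers stand in for the terms built from zero and succ, a candidate solution is given as a membership predicate, and the names `satisfies`, `evens`, `odds`, and `multiples_of_three` are invented for the example. It checks finitely many instances and proves nothing about the infinite sets.

```python
def satisfies(member, bound=50):
    """Check both inclusions for every x in 0 .. bound-1.

    succ(S) ⊆ ¬S  means: x in S      implies  x + 1 not in S.
    succ(¬S) ⊆ S  means: x not in S  implies  x + 1 in S.
    """
    for x in range(bound):
        if member(x) and member(x + 1):
            return False  # violates the first inclusion
        if not member(x) and not member(x + 1):
            return False  # violates the second inclusion
    return True


def evens(n):
    return n % 2 == 0


def odds(n):
    return n % 2 == 1


def multiples_of_three(n):
    return n % 3 == 0
```

On this check `evens` and `odds` both pass, while `multiples_of_three` fails (neither 1 nor 2 is a multiple of three). Neither passing candidate is contained in the other, matching the observation that the two solutions are incomparable.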

3.2 Extensions

There are extensions of set constraints that have proven useful in various applications. The most important extensions are surveyed here.

3.2.1 Function Space

Function spaces $X \to Y$ can be added to the set expressions. In an appropriate model, the meaning of $X \to Y$ is

\[ X \to Y = \{ f \mid x \in X \Rightarrow f(x) \in Y \} \]

Note that semantically $\to$ is not a labelled cross product of the domain and the range; thus the term semantics of set expressions given above are not adequate to model function spaces. A suitable domain can be constructed using standard techniques of denotational semantics and, given such a domain, set constraint resolution techniques still apply, although so far as is known additional restrictions are needed on union and intersection to guarantee that the constraints can be solved [AW93].

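The function space constructor is the first example we have seen of a constructor that is not monotonic.² Function space is anti-monotonic in its first argument and monotonic in its second argument. That is, the following hold:

\begin{align*}
X \to Y &\subseteq X \to (Y \cup Y') && \text{(monotonic)} \\
X \to Y &\supseteq (X \cup X') \to Y && \text{(anti-monotonic)}
\end{align*}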

People unfamiliar with the type theory of functions often find the property of anti-monotonicity surprising. The explanation is in the definition of function space above. Note the implication in the set qualification "$x \in X \Rightarrow f(x) \in Y$". Increasing $X$ strengthens the hypothesis, so fewer functions $f$ satisfy the implication and the resulting set is smaller. Increasing $Y$ weakens the conclusion, so more functions $f$ satisfy the implication and the resulting set is larger. Function spaces are used primarily in the analysis of functional programming languages [AW93, AWL94, AF95, FA97, MW97, FFK+96, FF97].³

² A function $f$ is monotonic if whenever $x \le y$ then $f(x) \le f(y)$.

³ It is also possible to define analyses involving functions that avoid anti-monotonic constructors altogether, although these techniques assume the entire program is available to be analyzed at once [Hei94, FF97].
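The monotonicity behavior of the function space constructor can be observed directly on a small finite universe. The following Python sketch enumerates every total function from a three-element set to itself and filters by the defining implication. The finite universe, the dictionary encoding of functions, and the name `function_space` are assumptions made for the illustration; they are not the denotational model referred to in the text.

```python
from itertools import product


def function_space(universe, X, Y):
    """All total functions f on `universe` (as dicts) such that
    x in X implies f(x) in Y."""
    points = sorted(universe)
    members = []
    for values in product(points, repeat=len(points)):
        f = dict(zip(points, values))
        if all(f[x] in Y for x in X):
            members.append(f)
    return members


D = {0, 1, 2}
base = function_space(D, {0}, {0})            # X -> Y
wider_range = function_space(D, {0}, {0, 1})  # X -> (Y ∪ Y')
wider_domain = function_space(D, {0, 1}, {0}) # (X ∪ X') -> Y
```

Here `base` has 9 members, `wider_range` has 18, and `wider_domain` has only 3: enlarging the range admits more functions, while enlarging the domain constrains the functions at more points and admits fewer.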

Conditional expressions $Y \Rightarrow X$ are equal to $X$ if $Y$ is non-empty and equal to 0 otherwise:

\[ Y \Rightarrow X = \begin{cases} 0 & \text{if } Y = 0 \\ X & \text{if } Y \neq 0 \end{cases} \]

Conditional expressions are very useful for expressing constraints on flow of control in programs. For example, consider the following case statement on a boolean expression.

case x of
true: y;
false: z;
esac

We may wish to construct an analysis that captures the fact that the result of this expression can be y only if x evaluates to true and that the result can be z only if x evaluates to false. Let $[\![\cdot]\!] : \mathit{Expressions} \to \mathit{SetVariables}$ be a function mapping a program phrase to a set variable corresponding to the analysis of that phrase in the solutions of the constraints (this notation is taken from [PS91]). Assuming that true and false are set constructor constants with the obvious interpretations, then the desired constraint for the case expression is

\[ (([\![x]\!] \cap \mathit{true}) \Rightarrow [\![y]\!]) \cup (([\![x]\!] \cap \mathit{false}) \Rightarrow [\![z]\!]) \subseteq [\![\texttt{case x of true: y; false: z; esac}]\!] \]

It is worthwhile noting that from the point of view of decidability, conditional expressions add nothing to set constraints as they are a special case of projections. To see this, observe that

\[ Y \Rightarrow X \equiv c^{-1}(c(X, Y)) \]

Here we rely on the fact that the interpretation of constructors requires that if $Y = 0$, then $c(X, Y) = 0$ for any $X$. If one wishes to compute solutions (and not just know that solutions exist), then it turns out that for a language without explicit projections but with conditional expressions it is possible to finitely represent all solutions of the constraints [AWL94].

We shall sometimes find it convenient to allow conditional constraints in addition to conditional expressions. A conditional constraint has the form

\[ X \Rightarrow (Y \subseteq Z) \]

and has the meaning that if $X \neq 0$ then $Y \subseteq Z$ must hold and otherwise there is no constraint. Conditional expressions and conditional constraints are equivalent in the sense that

\[ X \Rightarrow (Y \subseteq Z) \equiv (X \Rightarrow Y) \subseteq Z \]
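The equivalence between conditional constraints and conditional expressions is easy to confirm by brute force on a small universe. In the sketch below, sets of atoms (Python `frozenset` values) stand in for sets of terms and the empty set plays the role of 0; the helper names `cond`, `cond_constraint`, `subsets`, and `equivalence_holds` are invented for this example.

```python
from itertools import combinations


def cond(guard, value):
    """Conditional expression  guard => value:
    the value if the guard is non-empty, otherwise the empty set."""
    return value if guard else frozenset()


def cond_constraint(guard, lower, upper):
    """Conditional constraint  guard => (lower ⊆ upper):
    no requirement at all when the guard is empty."""
    return (not guard) or lower <= upper


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def equivalence_holds(universe):
    """Compare  X => (Y ⊆ Z)  with  (X => Y) ⊆ Z  for all X, Y, Z."""
    sets = subsets(universe)
    return all(cond_constraint(x, y, z) == (cond(x, y) <= z)
               for x in sets for y in sets for z in sets)
```

When the guard is empty both sides hold trivially, since the empty set is contained in every set; when it is non-empty both sides reduce to the plain inclusion.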

4 Applications

This section presents applications of set constraints to three classical program analysis problems: dataflow analysis, type inference, and closure analysis. We expect that at least one of the chosen applications is familiar to any reader with a background in one of the major program analysis communities. We use set constraints as the common language in which the analysis problems are presented.

Classical dataflow computations for imperative languages include live variable analysis, reaching definitions, and constant propagation, among others [ASU86]. These algorithms are formalized as the solution of systems of constraints over expressions built from sets of constants, set variables, and the set operations:

\[ E ::= a_1 \mid \cdots \mid a_n \mid \alpha \mid E_1 \cap E_2 \mid E_1 \cup E_2 \mid \neg E_1 \]

In this grammar $a_1, \ldots, a_n$ are the constants (nullary constructors) and $\alpha$ stands for a family of set variables. The meaning of an expression is a set of constants. A system of constraints is a conjunction of equalities $\bigwedge_i \alpha_i = E_i$. We assume that each variable appears on the left-hand side of at most one equation.

For example, in a live variable analysis in a language such as FORTRAN there is one constant for each program variable. The problem is to compute, for each program statement $S$, the variables $x$ that may be used after the execution of $S$ without any intervening assignments to $x$. For brevity we consider only the case where $S$ is an assignment statement; the formulation for other program constructs is also straightforward. For each assignment statement we need to know two constant sets:

• $S_{def}$ is the set of variables defined (written) by $S$.

• $S_{use}$ is the set of variables used (read) by $S$.

For example, in the statement x = x+y we have $S_{def} = x$ and $S_{use} = x \cup y$. For each statement $S$ there are two set variables $[\![S]\!]_{in}$ and $[\![S]\!]_{out}$, corresponding to the set of variables live immediately before and after $S$ respectively. Let $\mathit{succ}(S)$ be the statements immediately after $S$ in program execution. The system of constraints is then

\begin{align*}
[\![S]\!]_{in} &= S_{use} \cup ([\![S]\!]_{out} \cap \neg S_{def}) \\
[\![S]\!]_{out} &= \bigcup_{X \in \mathit{succ}(S)} [\![X]\!]_{in}
\end{align*}

These constraints express how live variables are (or are not) propagated from one program statement to another. For example, for the statement x = x+y the first constraint is

\[ [\![S]\!]_{in} = \{x, y\} \cup ([\![S]\!]_{out} \cap \neg\{x\}) \]

which is equivalent to

\[ [\![S]\!]_{in} = \{x, y\} \cup [\![S]\!]_{out} \]

There are a few subtleties in our formulation of live variable analysis worth discussing. First, note the optimization of the constraint representation in the immediately preceding lines (i.e., where an intersection is eliminated from the right-hand side of the equation). In the process of solving the equations it may be necessary to evaluate individual equations many times under different assignments to the variables. Thus, applying identities to simplify constraints can significantly improve the performance of constraint resolution implementations. This example merely hints at what transformations are possible, and there is a substantial literature on simplifying set constraints [Pot96, TS96, FA96, FF97, MW97].

dataflow theory. Note that the set expression grammar above allows negation of arbitrary expressions $\neg E$. The standard proof that dataflow equations have solutions requires that all operators be monotonic, which $\neg$ clearly is not. To achieve monotonicity, set complement is restricted to statically known sets (i.e., set expressions without variables) in which case the right-hand sides of equations are monotone in all variables. This restriction is not strictly required; the constraints presented (with $\neg$) can be solved as they are a special case of more general set constraints for which resolution algorithms are known [AW92]. There are reasons, however, to prefer restricted set complement in dataflow analysis. First, adding general complement raises the computational complexity significantly (see discussion at the end of this section). Second, in dataflow analysis we usually are interested in a best solution, either the least or the greatest. A unique best solution need not exist if set complement is unrestricted. For the purposes of dataflow analysis, we shall assume simply that negation is used in such a way that set expressions are monotone in all variables.

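For live variable analysis it is the least solution that is desired. In this case, the following inclusion constraints are equivalent:

\begin{align*}
[\![S]\!]_{in} &\supseteq S_{use} \cup ([\![S]\!]_{out} \cap \neg S_{def}) \\
[\![S]\!]_{out} &\supseteq \bigcup_{X \in \mathit{succ}(S)} [\![X]\!]_{in}
\end{align*}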

As a useful exercise in manipulating constraints we now show that these inclusions have the same least solution as the equalities. (Solution $\sigma$ is least if for any other solution $\sigma'$, we have $\sigma(x) \subseteq \sigma'(x)$ for all $x$.) Because equality implies inclusion, it follows that every solution of the equalities is also a solution of the inclusions. Therefore, it suffices to show that the inclusions have a least solution that is also a solution of the equations.

As a first step, note that the constraints always have a solution $\alpha_i = \{a_1, \ldots, a_n\}$ (the set of all constants). Every inclusion constraint is satisfied because the left-hand side is the largest possible set.

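Let $\sigma_1$ and $\sigma_2$ be any solutions of the inclusions and let $\sigma_3(\alpha) = \sigma_1(\alpha) \cap \sigma_2(\alpha)$. Now for every inclusion constraint $\alpha \supseteq E$ we have

\begin{align*}
\sigma_1(\alpha) &\supseteq \sigma_1(E) \supseteq \sigma_3(E) \\
\sigma_2(\alpha) &\supseteq \sigma_2(E) \supseteq \sigma_3(E)
\end{align*}

where the last step of both lines follows by monotonicity. It follows that

\[ \sigma_1(\alpha) \cap \sigma_2(\alpha) = \sigma_3(\alpha) \supseteq \sigma_3(E) \]

so $\sigma_3$ is also a solution of the inclusions. Since there always exists a solution, solutions are closed under intersection, and there are only finitely many solutions (because the domain is finite and there are a finite number of variables), there must be a least solution.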

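Let $\sigma$ be the least solution of the inclusions and assume for the sake of a contradiction that it is not a solution of the equalities. Then there is a constraint $\alpha \supseteq E$ such that $\sigma(\alpha) \supsetneq \sigma(E)$. Let $\sigma' = \sigma[\alpha \leftarrow \sigma(E)]$. Now we have

\[ \sigma(\alpha) \supsetneq \sigma'(\alpha) = \sigma(E) \supseteq \sigma'(E) \]

by monotonicity. For any other constraint $\alpha' \supseteq E'$ we know $\alpha \neq \alpha'$ (recall every variable appears in at most one left-hand side), and we have

\[ \sigma'(\alpha') = \sigma(\alpha') \supseteq \sigma(E') \supseteq \sigma'(E') \]

where the last again follows by monotonicity. Thus, $\sigma'$ is a solution smaller than $\sigma$, a contradiction. We conclude that $\sigma$ is a solution of the equalities.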
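The least solution of the live variable constraints can be computed by simple iteration from empty sets, as in the following Python sketch. The function name `live_variables`, the statement labels, and the table-based program encoding are assumptions of the example; the round-robin loop is a textbook iteration scheme and is not taken from the resolution rules of this paper. The input is the four-statement register allocation fragment of Section 2.

```python
def live_variables(stmts, succ):
    """Least solution of
         in(S)  = use(S) ∪ (out(S) − def(S))
         out(S) = ∪ { in(X) | X in succ(S) }
    computed by iterating from empty sets until nothing changes.

    stmts: label -> (set of defined variables, set of used variables)
    succ:  label -> list of successor labels
    """
    live_in = {s: set() for s in stmts}
    live_out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s, (defs, uses) in stmts.items():
            new_out = set().union(*(live_in[t] for t in succ[s]))
            new_in = uses | (new_out - defs)
            if new_out != live_out[s] or new_in != live_in[s]:
                live_out[s], live_in[s] = new_out, new_in
                changed = True
    return live_in, live_out


# a := c + d;  e := a + b;  f := e - 1;  print(f)
stmts = {
    "S1": ({"a"}, {"c", "d"}),
    "S2": ({"e"}, {"a", "b"}),
    "S3": ({"f"}, {"e"}),
    "S4": (set(), {"f"}),
}
succ = {"S1": ["S2"], "S2": ["S3"], "S3": ["S4"], "S4": []}
live_in, live_out = live_variables(stmts, succ)
```

Because the sets only grow from the empty assignment and each right-hand side is monotone, the loop stops at the least solution, which by the argument above satisfies the equalities as well as the inclusions. In the result, a is not live after S3, consistent with the earlier assumption that a is dead where f is assigned.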

Dataflow equations are a special case of set constraints where the only constructors are constants, the left-hand side of an equation is always a variable, and set complement is restricted. The decidability of these equality constraints follows immediately from the decidability of set constraints. More interestingly, though, the decidability of extensions also follows immediately. As noted above, unrestricted complement can be added and all solutions are still computable, although the computational complexity increases from polynomial time to NP-complete [AKVW93].

Two other set constraint extensions to dataflow analysis are particularly useful. The first is the addition of conditional expressions $X \Rightarrow Y$. As noted earlier, conditional expressions can be used to model control flow, which complements the emphasis on dataflow in (aptly named) dataflow analysis. A good example of the combination of these features is found in [Hei94, AFS98]. The second extension is the ability to perform dataflow analysis of data structures by including non-atomic constructors. Set-based analysis is a canonical example of a system that exploits this feature of set constraints [Hei92, Hei94].

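Finally, the algorithm given by the constraint resolution rules is unlikely to be as efficient as the standard algorithms for live variable analysis. The culprit is the rule for adding transitive constraints

\[ E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \ \equiv\ E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \wedge E_1 \subseteq E_2 \]

which adds new constraints between variables (from $\alpha_1 \subseteq \alpha_2$ and $\alpha_2 \subseteq \alpha_3$ it yields $\alpha_1 \subseteq \alpha_3$), something that practical implementations for this problem do not do. To achieve an algorithm with efficiency akin to those used in practice, we can modify the rule for transitive constraints to propagate only constants in lower bounds to upper bounds:

\[ a \subseteq \alpha \wedge \alpha \subseteq E \ \equiv\ a \subseteq \alpha \wedge \alpha \subseteq E \wedge a \subseteq E \]

It is easy to show that this rule makes the least solution explicit; each variable is assigned the set of constants appearing in its lower bound.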

4.2 Simple Type Inference

Type inference is a central component of statically typed functional languages. The essence of the inference algorithm is to generate a system of type constraints from the program text. If the constraints are solvable then the program is typable and the types of program phrases are exhibited by the solutions of the constraints.

For our purposes the pure lambda calculus suffices as the programming language:

\[ e ::= x \mid \lambda x.e_1 \mid e_1\ e_2 \]

For simplicity, we assume that variables in an expression are renamed as necessary so that all lambda bound variables are distinct. For a simple (that is, not polymorphic) type system, the expressions of the constraint language are

\[ E ::= \alpha \mid E_1 \to E_2 \]

where $\to$ is an infix binary type constructor. Constraint systems are conjunctions of equations $\bigwedge_i E_{i1} = E_{i2}$. As discussed in Section 3.2.1, the term model presented in Section 3 is inadequate for function spaces, but adequate models do exist.

There are many equivalent ways to specify simple type inference. One which is close to actual implementations of type inference algorithms uses systems of type equations. As before, we use $[\![e]\!]$ to stand for a type variable associated with $e$.

\begin{align*}
[\![\lambda x.e]\!] &= [\![x]\!] \to [\![e]\!] \\
[\![e_1]\!] &= [\![e_2]\!] \to [\![e_1\ e_2]\!]
\end{align*}

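This formulation is equivalent to the standard one which uses inference rules and is well-known [Wan87]. Under these rules it is easy to verify the types of the following examples:

\begin{align*}
\lambda x.x &: \alpha_x \to \alpha_x \\
\lambda z.\lambda y.z &: \alpha_z \to (\alpha_y \to \alpha_z) \\
(\lambda z.\lambda y.z)\ \lambda x.x &: \alpha_y \to (\alpha_x \to \alpha_x) \\
\lambda f.\lambda x.f(f(x)) &: (\alpha_x \to \alpha_x) \to \alpha_x \to \alpha_x
\end{align*}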

Depending on whether finite or infinite solutions are desired, the constraints are solved using respectively unification or circular unification. If circular unification is used, then every lambda expression has a type. (To see this, note that both equations can be solved by assigning every expression the recursive type $\alpha = \alpha \to \alpha$.) Not every expression has a type using ordinary unification. Of course, an alternative proof of decidability is to observe that these are set constraints. Note, however, that just as in the case of unification an occurs check is required if only finite solutions are desired.
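The generate-and-solve structure of simple type inference can be sketched compactly in Python. The sketch assumes its own encodings: lambda terms are nested tuples `("var", x)`, `("lam", x, body)`, `("app", f, arg)`; type variables are integers; arrow types are tuples `("->", domain, range)`. The name `infer` and these encodings are illustrative choices, and terms are assumed to be closed. Equations are unified as soon as they are generated, with an occurs check so that only finite types are accepted.

```python
import itertools


def infer(expr):
    """Return the simple type of a closed lambda term, or None if the
    generated type equations have no finite solution."""
    fresh = itertools.count()
    subst = {}  # solved type variables: variable -> type

    def find(t):
        # Follow solved variables to an unsolved variable or an arrow.
        while isinstance(t, int) and t in subst:
            t = subst[t]
        return t

    def occurs(v, t):
        t = find(t)
        if isinstance(t, int):
            return t == v
        return occurs(v, t[1]) or occurs(v, t[2])

    def unify(s, t):
        s, t = find(s), find(t)
        if s == t:
            return True
        if isinstance(s, int):
            if occurs(s, t):
                return False  # occurs check: reject infinite types
            subst[s] = t
            return True
        if isinstance(t, int):
            return unify(t, s)
        return unify(s[1], t[1]) and unify(s[2], t[2])

    def gen(e, env):
        # Returns the type variable standing for [[e]], or None on failure.
        if e[0] == "var":
            return env[e[1]]
        result = next(fresh)
        if e[0] == "lam":
            x = next(fresh)
            body = gen(e[2], {**env, e[1]: x})
            # [[lam x. e]] = [[x]] -> [[e]]
            ok = body is not None and unify(result, ("->", x, body))
        else:
            f, a = gen(e[1], env), gen(e[2], env)
            # [[e1]] = [[e2]] -> [[e1 e2]]
            ok = (f is not None and a is not None
                  and unify(f, ("->", a, result)))
        return result if ok else None

    def resolve(t):
        t = find(t)
        if isinstance(t, int):
            return t
        return ("->", resolve(t[1]), resolve(t[2]))

    root = gen(expr, {})
    return None if root is None else resolve(root)
```

Applied to the identity function the sketch returns an arrow whose two sides are the same variable. Applied to the self-application `("lam", "x", ("app", ("var", "x"), ("var", "x")))` it returns `None`, because the occurs check rejects the equation that circular unification would solve with a recursive type.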

4.3 A Variation

Once again we can obtain generalizations of the familiar theory. For example, by generalizing terms to sets we can define the following grammar for types:

\[ E ::= \alpha \mid E_1 \to E_2 \mid E_1 \cap E_2 \mid E_1 \cup E_2 \mid 0 \]

We recast the constraints to use inclusion instead of equality and allow solutions to be expressed in terms of the more expressive types:

\begin{align*}
[\![\lambda x.e]\!] &\supseteq [\![x]\!] \to [\![e]\!] \\
[\![e_1]\!] &\subseteq [\![e_2]\!] \to [\![e_1\ e_2]\!]
\end{align*}

The first constraint says simply that the type of $\lambda x.e$ must include all the functions of type $[\![x]\!] \to [\![e]\!]$. To understand the second constraint, note that for the constraints to have any solutions $[\![e_1]\!]$ must be a set of functions. Assume $[\![e_1]\!] = X \to Y$ for some $X$ and $Y$. We then have

\[ [\![e_1]\!] = X \to Y \subseteq [\![e_2]\!] \to [\![e_1\ e_2]\!] \]

which implies, using the anti-monotonicity of the domain and monotonicity of the range, that

\[ [\![e_2]\!] \subseteq X \ \wedge\ Y \subseteq [\![e_1\ e_2]\!] \]

In other words, the domain $X$ of $e_1$ must accept the type of the argument $[\![e_2]\!]$, and the type of the result $[\![e_1\ e_2]\!]$ must be at least the range $Y$ of $e_1$.

Under these inclusion constraints many functions have substantially more precise types than under the original equality constraints. For example, the function that applies a function twice to its argument has the type:

\[ \lambda f.\lambda x.f(f(x)) : ((\alpha \to \beta) \cap (\beta \to \gamma)) \to (\alpha \to \gamma) \]

provided that $f$ has signatures $\alpha \to \beta$ and $\beta \to \gamma$ that can be composed to produce a function of type $\alpha \to \gamma$.

The extended type system presented here is somewhat related to intersection type disciplines. The language of intersection types retains variables, function spaces, and intersections between types, but no 0 or type union. However, most intersection type disciplines have much more general rules for assigning types to expressions than the constraint generation rules we give above. As a result, even typechecking for the natural intersection type discipline is undecidable [CC90]. Restricted, decidable versions of intersection type systems have received considerable attention (see, e.g., [CG92]).

4.4 Closure Analysis

A standard program analysis for functional languages is closure analysis. Because closure analysis is not as well-known as dataflow analysis and type inference, we first describe a simple closure analysis before discussing constraints.

Intuitively, the closure analysis problem for the lambda calculus is to estimate the set of lambda abstractions to which a program variable can be bound during reduction. For example, in the expression $(\lambda x.x)\ \lambda y.y$, the variable $x$ will be bound to an expression beginning $\lambda y$, while $y$ will not be bound to any expression. Closure analysis is used to derive an approximation of the control flow graph in a higher order functional language. In a first order language (such as FORTRAN) the control flow graph is statically known: the order in which expressions are evaluated is obvious from program syntax, and this order is the structure from which dataflow analysis algorithms are built. In a higher order language, the order in which expressions are evaluated must be inferred and, in general, approximated. Closure analysis is a well-known algorithm for approximating the control-flow graph of a program and has been studied extensively [Shi88, Ses91, PS91, Pal95, NN97].

Our development of closure analysis follows Palsberg's. Let $[\![e]\!]$ be a variable associated with expression $e$; this variable ranges over sets of lambda bindings appearing in the complete expression. For example, for the expression $\lambda x.\lambda y.x$ the set of lambdas is $\{\lambda_x, \lambda_y\}$. For a fixed lambda expression $e$, the closure analysis is the least solution of a system of constraints derived from the sub-expressions of $e$:

Sub-Expression      Constraints
$\lambda x.e'$      $\lambda_x \subseteq [\![\lambda x.e']\!]$
$e_1\ e_2$          for every $\lambda x.e_3$ in $e$: $\lambda_x \in [\![e_1]\!] \Rightarrow ([\![e_2]\!] \subseteq [\![x]\!] \wedge [\![e_3]\!] \subseteq [\![e_1\ e_2]\!])$

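For the expression $(\lambda x.x)\ \lambda y.y$, the constraints are

\[ \{\lambda_x\} \subseteq [\![\lambda x.x]\!] \]
\[ \{\lambda_y\} \subseteq [\![\lambda y.y]\!] \]
\[ \lambda_x \in [\![\lambda x.x]\!] \Rightarrow ([\![\lambda y.y]\!] \subseteq [\![x]\!] \wedge [\![x]\!] \subseteq [\![(\lambda x.x)\ \lambda y.y]\!]) \]
\[ \lambda_y \in [\![\lambda x.x]\!] \Rightarrow ([\![\lambda y.y]\!] \subseteq [\![y]\!] \wedge [\![y]\!] \subseteq [\![(\lambda x.x)\ \lambda y.y]\!]) \]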

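Solutions of the constraints are ordered pointwise; i.e., $\sigma \leq \sigma'$ if and only if $\sigma(x) \subseteq \sigma'(x)$ for all $x$. It is easy to verify that the least solution of the constraints is

\[ [\![x]\!] = \{\lambda_y\} \]
\[ [\![y]\!] = \emptyset \]
\[ [\![\lambda x.x]\!] = \{\lambda_x\} \]
\[ [\![\lambda y.y]\!] = \{\lambda_y\} \]
\[ [\![(\lambda x.x)\ \lambda y.y]\!] = \{\lambda_y\} \]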

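Our definition of closure analysis introduces two small extensions to the constraint notation we have defined. Define $\lambda_x \in X \Rightarrow P$ to mean $X \cap \lambda_x \Rightarrow P$, which is equivalent but stays within our syntax. Also, define $X \Rightarrow (Y \wedge Z)$ to mean $(X \Rightarrow Y) \wedge (X \Rightarrow Z)$.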
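The least solution of these closure analysis constraints can be computed by naive iteration from the empty assignment. The following Python sketch is only an illustration: the tuple encoding of lambda terms and the names `subterms` and `closure_analysis` are assumptions, each constant for a lambda abstraction is represented by the name of its bound variable, and bound variables are assumed distinct as in the text.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", e1, e2)

def subterms(e):
    """Yield e and all of its sub-expressions."""
    yield e
    for child in e[1:]:
        if isinstance(child, tuple):
            yield from subterms(child)

def closure_analysis(e):
    """Return a dict mapping each sub-expression t to the least value of [[t]]."""
    terms = list(subterms(e))
    lams = [t for t in terms if t[0] == "lam"]
    sol = {t: set() for t in terms}
    for _, x, _ in lams:
        sol.setdefault(("var", x), set())       # [[x]] exists even if x is unused
    changed = True
    while changed:
        changed = False
        for t in terms:
            updates = []
            if t[0] == "lam":
                updates.append((t, {t[1]}))     # lam_x is in [[lam x.e']]
            elif t[0] == "app":
                for _, x, e3 in lams:
                    if x in sol[t[1]]:          # lam_x in [[e1]] enables both inclusions
                        updates.append((("var", x), sol[t[2]]))   # [[e2]] <= [[x]]
                        updates.append((t, sol[e3]))              # [[e3]] <= [[e1 e2]]
            for key, values in updates:
                if not values <= sol[key]:
                    sol[key] |= values
                    changed = True
    return sol
```

Applied to the term for `(lam x.x) (lam y.y)`, the sketch binds `x` to the abstraction over `y`, leaves `y` unbound, and assigns the abstraction over `y` to the whole application, which matches the least solution stated in the text.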

The fact that set constraints of this form can be solved for the least solution in time $O(n^3)$ follows immediately from more general results on solving systems of set constraints [Hei94, AWL94] (see Section 5). Historically, however, closure analysis has been investigated over a period of many years in isolation from other techniques and, essentially, the fragment of set constraints needed for the problem has been discovered from first principles [Shi88, PS91]. Set-based analysis can be viewed as a more general form of closure analysis where, among other things, there is some ability to track the flow of control through conditional tests [Hei94].

5 Solving Constraints

So far we have worked at the level of specifying the constraints for particular program analysis applications. In this section we discuss computing solutions of constraints. The general strategy in constraint resolution algorithms is always the same: an initial system of constraints is repeatedly transformed using simple rules until the system is in a "solved form." We illustrate this approach using the three analysis problems presented in Section 4.

We begin by defining our notion of a solved form system of constraints. We show that any inductive system of constraints has solutions, and that in fact all solutions are explicit in the form of the constraints (Section 5.1). In the following subsections we give algorithms for transforming the constraint systems developed in Section 4 into inductive form.

5.1 Inductive Systems

We shall limit our discussion to the following expression language, which excludes projections.

\[ E ::= \alpha \mid 0 \mid E_1 \cup E_2 \mid E_1 \cap E_2 \mid \neg E_1 \mid c(E_1,\ldots,E_{a(c)}) \]

Much of the development in this section follows [AW93].

We make use of two previous results in the proof that inductive systems have solutions. The first is a technique for transforming inclusion constraints to an equivalent system of equations [AW92]. The second is the fact that systems of contractive equations have unique solutions [MPS84]. The constraint-solving algorithm presented in Section 5 reduces an initial system of constraints to a set of systems of inductive constraints or reports that the initial system is inconsistent.

To discuss constraint solving it is necessary to be fairly specific about the semantic domain. We have discussed two domains, a domain of terms and a domain that includes function spaces. For simplicity, we shall prove our results only for the term domain. We need the following definition. Let $D_j$ be an increasing sequence of sets that contain larger terms (terms of greater height) as $j$ increases:

\[ D_0 = \emptyset \]
\[ D_j = \{ c(t_1,\ldots,t_{a(c)}) \mid t_i \in D_{j-1} \} \cup D_{j-1} \]

for showing that an arbitrary system of inclusion constraints over variables $\alpha_1,\ldots,\alpha_n$ has a solution.

Initially, let $\alpha_i = 0$ for $1 \leq i \leq n$. At step $j$ of the induction, assign some terms of $D_j$ to $\alpha_1$, then to $\alpha_2$, and so on, up to $\alpha_n$. At each step $(j,i)$ of this double induction over the terms of $D_j$ and variables $\alpha_i$, we must ensure that the constraints are satisfied for all elements in $D_j$. If this can be done for all pairs $(j,i)$ then the system has a solution.

In such an inductive proof, we must distinguish between variables inside of constructors $c(\ldots\alpha\ldots)$, which contribute terms from $D_{j-1}$, and variables outside of constructors $\alpha \cap c(\ldots)$, which contribute terms from $D_j$.

Definition 5.1 The top-level variables of $X$ (denoted $TLV(X)$) are the variables in $X$ that appear outside of a constructor. Formally,

\[ TLV(\alpha_i) = \{\alpha_i\} \]
\[ TLV(0) = \emptyset \]
\[ TLV(c(\ldots)) = \emptyset \]
\[ TLV(E_1 \cup E_2) = TLV(E_1) \cup TLV(E_2) \]
\[ TLV(E_1 \cap E_2) = TLV(E_1) \cup TLV(E_2) \]
\[ TLV(\neg E_1) = TLV(E_1) \]

Top-level variables are also called the non-expansive variables [MPS84].
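The definition of top-level variables translates directly into a recursive function. The sketch below is illustrative only; the tuple encoding of set expressions and the name `tlv` are assumptions made for this example.

```python
# Set expressions (hypothetical encoding):
#   ("var", name) | ("zero",) | ("con", c, args) |
#   ("union", e1, e2) | ("inter", e1, e2) | ("neg", e1)

def tlv(e):
    """Return the set of variables of e that occur outside every constructor."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("zero", "con"):
        return set()                    # nothing under a constructor is top-level
    if tag in ("union", "inter"):
        return tlv(e[1]) | tlv(e[2])
    if tag == "neg":
        return tlv(e[1])
    raise ValueError(f"unknown expression tag: {tag!r}")
```

For example, in the expression encoding "a1 union c(a2)", only `a1` is top-level, since `a2` occurs under the constructor `c`.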

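Definition 5.2 A system $S$ of constraints is inductive if the following three conditions hold:

1. $S = \bigwedge_{1 \leq i \leq n} L_i \subseteq \alpha_i \subseteq U_i$ (i.e., there is one lower bound $L_i$ and upper bound $U_i$ per variable $\alpha_i$)

2. $TLV(L_i) \cup TLV(U_i) \subseteq \{\alpha_1,\ldots,\alpha_{i-1}\}$ for $1 \leq i \leq n$

3. For all $i' = 1,\ldots,n$ and integers $j$, the following holds in all assignments:

\[
\Bigl( \forall i = 1,\ldots,i'-1\; (L_i \cap D_j \subseteq \alpha_i \cap D_j \subseteq U_i \cap D_j) \ \text{and}\ 
\forall i = i',\ldots,n\; (L_i \cap D_{j-1} \subseteq \alpha_i \cap D_{j-1} \subseteq U_i \cap D_{j-1}) \Bigr)
\Rightarrow L_{i'} \cap D_j \subseteq U_{i'} \cap D_j
\]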

Parts 1 and 2 are simple syntactic properties. Part 3 is a more complex semantic condition. The double induction outlined above for constructing solutions is expressed in part 3, which says that if the constraints are satisfiable up to some level $i'$ and variable $\alpha_{j-1}$, then the constraints are satisfied for the next lower and upper bound pair in the induction $L_{i'} \cap D_j \subseteq U_{i'} \cap D_j$.

Definition 5.2 makes it possible to build solutions inductively at level $D_j$ by assigning values in order to $\alpha_1,\ldots,\alpha_n$ since part 2 ensures that variables are constrained only by lower-numbered variables at the top level and part 3 ensures that $\alpha_{i'}$ can be given a value between $L_{i'}$ and $U_{i'}$. Systems that do not satisfy part 3 may not have any solutions (consider, for example, the system $1 \subseteq \alpha_1 \subseteq 0$).

Inductive systems are the output of our constraint resolution procedures. That is, we will give procedures (starting in Section 5.3) for transforming an initial constraint system into an equivalent system in inductive form. For these resolution algorithms we can prove that if the output of the algorithm contains no trivially inconsistent constraints (e.g., $1 \subseteq 0$ or $\mathrm{int} \subseteq 0$) then the system is in inductive form and therefore has solutions.

We show that inductive systems have solutions in two steps: first, we show that an inductive system is equivalent to a system of equations; we then show that the equations always have solutions.

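Definition 5.3 A system of equations $\alpha_1 = E_1 \wedge \ldots \wedge \alpha_n = E_n$ (with $\alpha_i$ on the left-hand side) is cascading if $TLV(E_i) \cap \{\alpha_i,\ldots,\alpha_n\} = \emptyset$.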

Theorem 5.4 Let $S = \bigwedge_i L_i \subseteq \alpha_i \subseteq U_i$ be an inductive system of constraints. Then $S$ is equivalent to the cascading equations $\alpha_i = L_i \cup (\beta_i \cap U_i)$ where the $\beta_i$ are fresh variables.

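Proof: Assume that $L_i \subseteq \alpha_i \subseteq U_i$ and let $\beta_i = \alpha_i$. Then

\[
\begin{array}{rcll}
\alpha_i & = & L_i \cup (\alpha_i \cap U_i) & \text{since } L_i \subseteq \alpha_i \subseteq U_i \\
         & = & L_i \cup (\beta_i \cap U_i)  & \text{since } \beta_i = \alpha_i
\end{array}
\]

Thus, every solution of the constraints induces a solution of the equations. For the other direction, assume that $\alpha_i = L_i \cup (\beta_i \cap U_i)$ for some $\beta_i$. Clearly, $L_i \subseteq \alpha_i$. To show $\alpha_i \subseteq U_i$, we first show for all $i$ and $j$ that $\alpha_i \cap D_j \subseteq U_i \cap D_j$. For the sake of obtaining a contradiction, assume $\alpha_i \cap D_j \not\subseteq U_i \cap D_j$ for some $i$ and $j$. Pick the smallest such pair $(j,i)$ ordered lexicographically. Note $L_k \cap D_l \subseteq \alpha_k \cap D_l \subseteq U_k \cap D_l$ holds if $(l,k) < (j,i)$ by assumption and because $L_k \subseteq \alpha_k$. Since the system is inductive, it follows that $L_i \cap D_j \subseteq U_i \cap D_j$. Therefore

\[
\begin{array}{rcl}
\alpha_i \cap D_j & = & (L_i \cup (\beta_i \cap U_i)) \cap D_j \\
                  & = & (L_i \cap D_j) \cup (\beta_i \cap U_i \cap D_j) \\
                  & \subseteq & U_i \cap D_j
\end{array}
\]

which contradicts the assumption. Thus for all $i$,

\[
\begin{array}{ll}
             & \alpha_i \cap D_j \subseteq U_i \cap D_j \ \text{for all } j \\
\Rightarrow  & \alpha_i \cap D_j \subseteq U_i \ \text{for all } j \\
\Rightarrow  & \alpha_i \subseteq U_i \ \text{since } \bigcup_j D_j = H
\end{array}
\]

$\Box$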
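The equivalence in Theorem 5.4 can be checked by brute force in the simplest setting, where the bounds $L$ and $U$ are constant subsets of a small finite universe (so no constructors or variable ordering are involved). The Python sketch below is an illustration under that assumption only; the function names are hypothetical.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def interval_solutions(lower, upper, universe):
    """All values a with lower <= a <= upper (the inclusion constraints)."""
    return {a for a in subsets(universe) if lower <= a <= upper}

def equation_solutions(lower, upper, universe):
    """All values lower | (b & upper), one per choice of the fresh variable b."""
    return {lower | (b & upper) for b in subsets(universe)}
```

When the lower bound is contained in the upper bound the two sets of solutions coincide. When it is not (for example, lower bound {3} and upper bound {1}), the inclusion constraints have no solution although the equation still has values; this mirrors the role of part 3 of Definition 5.2 in the proof.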

Theorem 5.5 shows that every choice for the $\beta_i$ induces a unique solution to the cascading equations.

Theorem 5.5 Let $\alpha_1 = E_1 \wedge \ldots \wedge \alpha_n = E_n$ be a system of cascading equations and let $\sigma$ be any assignment for the variables other than the $\{\alpha_1,\ldots,\alpha_n\}$. There is a unique extension $\sigma'$ of $\sigma$ that is a solution of the equations.

Proof: Variable $\alpha_i$ can be eliminated from the top-level variables of every equation by substituting $E_i$ for $\alpha_i$ in $E_{i+1}$ through $E_n$. Let $\beta$ be any remaining top-level free variable. Then $\beta$ does not appear on the left-hand side of any equation; we call such variables free. For any fixed assignment $\sigma$ for the top-level free variables, the equations become contractive (have no top-level variables). Contractive equations have unique solutions [MPS84]. $\Box$

5.2 A Digression on Set Complement

Set complement is quite handy for expressing analyses, but in solutions of constraints we often wish to eliminate complements so that we can see which terms may belong to an expression $E$ rather than

cascading equations:

\[ \neg 0 = 1 \quad \text{where } 1 = \bigcup_{c \in C} c(1,\ldots,1) \]
\[ \neg(E_1 \cup E_2) = \neg E_1 \cap \neg E_2 \]
\[ \neg(E_1 \cap E_2) = \neg E_1 \cup \neg E_2 \]
\[ \neg\neg E = E \]
\[ \neg c(E_1,\ldots,E_{a(c)}) = c(\neg E_1,1,\ldots,1) \cup \ldots \cup c(1,\ldots,1,\neg E_{a(c)}) \cup \bigcup_{d \in C - \{c\}} d(1,\ldots,1) \]

The equation in the first line defines 1 to be the Herbrand universe. For each equation $\alpha_i = E_i$ create a new equation $\neg\alpha_i = \neg E_i$ and simplify the right-hand side.$^{4}$ Now replace $\neg\alpha_i$ everywhere by a fresh variable $\bar{\alpha}_i$. The preceding rules and this technique for eliminating $\neg\alpha_i$ remove all negations except on a free variable $\beta$. A negation $\neg\beta$ cannot be removed, as the $\beta$ are free variables in the constraints.

There is another important issue with set complement. We have assumed that the set of constructors is finite, and therefore $\neg c(\ldots)$ can be written as above using an explicit union of all non-$c$ terms. However, in many applications it is unreasonable to assume that we know all of the constructors. Typically the set of constructors is determined by the program text. Because a constructor defined in one part of a program potentially appears in the solutions of the constraints of any part of that program, assuming that all constructors are known at the outset makes it impossible to analyze program components separately.

It is not difficult to remove the assumption that all constructors are known. Assume now that $C$ is an infinite set of constructors. We add the following new set expression with the semantics:

\[ \sigma(\mathrm{NOT}(\{c_1,\ldots,c_n\})) = \{ d(t_1,\ldots,t_{a(d)}) \mid t_i \in H \wedge d \in C - \{c_1,\ldots,c_n\} \} \]

Intuitively NOT is the set of all terms with a head constructor not in the argument list. It is straightforward to include NOT in the algebra of set expressions. For example:

\[ \neg\mathrm{NOT}(\{c_1,\ldots,c_n\}) = c_1(1,\ldots,1) \cup \ldots \cup c_n(1,\ldots,1) \]
\[ \neg c(E_1,\ldots,E_n) = c(\neg E_1,1,\ldots,1) \cup \ldots \cup c(1,\ldots,1,\neg E_n) \cup \mathrm{NOT}(\{c\}) \]
\[ \mathrm{NOT}(\{c_1,\ldots,c_n\}) \cap \mathrm{NOT}(\{d_1,\ldots,d_m\}) = \mathrm{NOT}(\{c_1,\ldots,c_n\} \cup \{d_1,\ldots,d_m\}) \]
\[ 1 = \mathrm{NOT}(\emptyset) \]

Even in the case where all constructors are known, $\mathrm{NOT}(\{c\})$ is a more efficient representation than an explicit union of all constructors except $c$.
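The complement rules can be read as a procedure that pushes negations inward until they rest only on variables. The Python sketch below illustrates this using the NOT representation; the tuple encoding, the `arity` table, and the names `nnf`, `neg` and `big_union` are assumptions made for this example rather than part of the paper.

```python
# Expressions (hypothetical encoding):
#   ("var", a) | ("zero",) | ("union", e1, e2) | ("inter", e1, e2) |
#   ("neg", e) | ("con", c, args) | ("NOT", frozenset_of_constructor_names)

ONE = ("NOT", frozenset())                 # 1 = NOT(empty set)

def big_union(parts):
    """Left-nested union of a list of expressions; the empty union is 0."""
    parts = list(parts)
    if not parts:
        return ("zero",)
    out = parts[0]
    for p in parts[1:]:
        out = ("union", out, p)
    return out

def nnf(e, arity):
    """Rewrite e so that negation is applied only to variables."""
    tag = e[0]
    if tag in ("var", "zero", "NOT"):
        return e
    if tag in ("union", "inter"):
        return (tag, nnf(e[1], arity), nnf(e[2], arity))
    if tag == "con":
        return ("con", e[1], tuple(nnf(a, arity) for a in e[2]))
    return neg(e[1], arity)                # tag == "neg"

def neg(e, arity):
    """Return an expression equivalent to the complement of e."""
    tag = e[0]
    if tag == "var":
        return ("neg", e)                  # cannot be simplified further
    if tag == "zero":
        return ONE
    if tag == "neg":
        return nnf(e[1], arity)            # double negation
    if tag == "union":
        return ("inter", neg(e[1], arity), neg(e[2], arity))
    if tag == "inter":
        return ("union", neg(e[1], arity), neg(e[2], arity))
    if tag == "NOT":                       # union of c(1,...,1) for each listed c
        return big_union(("con", c, (ONE,) * arity[c]) for c in sorted(e[1]))
    c, args = e[1], e[2]                   # tag == "con"
    parts = [("con", c, tuple(neg(a, arity) if k == i else ONE
                              for k, a in enumerate(args)))
             for i in range(len(args))]
    parts.append(("NOT", frozenset([c])))
    return big_union(parts)
```

The `arity` table is only consulted when complementing a NOT expression, so complementing a constructor term never requires knowing the full set of constructors, which is the point of introducing NOT.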

5.3 Closure Analysis

We now turn to algorithms for solving constraints. Constraint resolution is done by applying a set of rewrite rules repeatedly until closure. For pedagogical reasons we present the rules a few at a time, as needed for each application. However, it is emphasized that in developing new applications it is usually unnecessary to invent new rules. New analyses generally are expressed using the established machinery (the complete set of rules), which means the analysis designer can simply write the necessary constraints and be assured the constraints can be solved.

$^{4}$ This step only works because the cascading equations are already contractive in the $\alpha_i$. For example, starting with $\alpha = \alpha$ and adding complements gives us an equation with exactly the same solutions: $\neg\alpha = \neg\alpha$.

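\[ S \wedge E_1 \cup E_2 \subseteq E_3 \ \equiv\ S \wedge E_1 \subseteq E_3 \wedge E_2 \subseteq E_3 \qquad (2) \]
\[ S \wedge \alpha \subseteq \alpha \ \equiv\ S \qquad (3) \]
\[ S \wedge E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \ \equiv\ S \wedge E_1 \subseteq \alpha \wedge \alpha \subseteq E_2 \wedge E_1 \subseteq E_2 \qquad (4) \]
\[ S \wedge (\lambda_x \in \alpha \Rightarrow E_1 \subseteq E_2) \wedge \lambda_x \subseteq \alpha \ \equiv\ S \wedge E_1 \subseteq E_2 \wedge \lambda_x \subseteq \alpha \qquad (5) \]

Figure 1: Rules for simplifying constraints.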

We begin with closure analysis as it has the simplest resolution procedure. Expressions have the form

\[ E ::= \lambda_x \mid \alpha \mid 0 \mid E_1 \cup E_2 \mid \lambda_x \Rightarrow E_1 \]

and a system $S$ of constraints has the form

\[ S = \bigwedge_i E_i \subseteq \alpha_i \]

We say two systems are equivalent $S_1 \equiv S_2$ if they have the same set of solutions. Figure 1 gives a number of equivalences for closure analysis constraints. It is easy to verify that these are in fact equivalences.

A constraint $\alpha_i \subseteq U$ (respectively $L \subseteq \alpha_i$) is inductive if $TLV(U)$ (respectively $TLV(L)$) is a subset of $\{\alpha_1,\ldots,\alpha_{i-1}\}$. The algorithm for solving the closure analysis constraints is as follows.

Read the equivalences as rewrite rules going from left to right. The rules are applied to the constraint system repeatedly, in any order, until no new inductive constraints can be added.

Let $S'$ be the result of closing the system $S$ under the rewrite rules. The following statements are easily verified:

• $S' \equiv S$, since $S'$ is obtained from $S$ by a sequence of $\equiv$-preserving steps.

• There are no constraints $\lambda_x \subseteq \lambda_y$, since no constant upper bounds appear in the initial constraints and none are added by the rules.

• All constraints in $S'$ are of the form $\alpha \subseteq \beta$, $\lambda_x \subseteq \alpha$, or $\lambda_x \in \alpha \Rightarrow E_1 \subseteq E_2$. To see this, note the previous point and that all other forms of left-hand sides are eliminated by the rules.

• The procedure terminates, because constraints on the right-hand sides of the rules involve only pairs of subexpressions of the original system. There are only finitely many such pairs, so eventually no new inductive constraints can be added. To help detect when all inductive constraints have been added it is sufficient to apply the transitive rule (4) once only for each pair of inductive upper and lower bounds on a variable. With that restriction the algorithm terminates exactly when no rules apply. (Note that rules (3) and (4) cannot get into a loop because $\alpha \subseteq \alpha$ is not an inductive constraint.)
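The closure step can be sketched as a simple saturation loop over atomic constraints. The Python below is a simplified illustration, not the algorithm as analyzed in the text: it applies only the transitive rule (4) and the conditional rule (5), it ignores the variable ordering used to restrict attention to inductive constraints, and it makes no attempt at the $O(n^3)$ bookkeeping. The encoding (strings for variables, `("lam", x)` for constants) and the names `close` and `least_solution` are assumptions.

```python
def close(subs, conds):
    """Saturate a constraint system under rules (4) and (5).

    subs  : set of pairs (l, r) meaning l <= r, where r is a variable (a string)
            and l is a variable or a constant ("lam", x)
    conds : set of tuples (x, a, l, r) meaning: lam_x in a  =>  l <= r
    Returns the closed set of inclusions and the conditionals never discharged.
    """
    subs, pending = set(subs), set(conds)
    changed = True
    while changed:
        changed = False
        # Rule (5): discharge a conditional once lam_x <= a is present.
        for cond in list(pending):
            x, a, l, r = cond
            if (("lam", x), a) in subs:
                pending.discard(cond)
                subs.add((l, r))
                changed = True
        # Rule (4): from l <= a and a <= r add l <= r; skip trivial a <= a (rule 3).
        for l, a in list(subs):
            for a2, r in list(subs):
                if a == a2 and l != r and (l, r) not in subs:
                    subs.add((l, r))
                    changed = True
    return subs, pending

def least_solution(subs, variables):
    """Read off, for each variable, the constants that are lower bounds on it."""
    return {v: {l[1] for l, r in subs if r == v and isinstance(l, tuple)}
            for v in variables}
```

On the constraints for `(lam x.x) (lam y.y)`, the two conditionals guarded by the abstraction over `x` are discharged, those guarded by the abstraction over `y` remain pending, and reading off the constant lower bounds reproduces the least solution given in Section 4.4.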

The last point can be used to perform complexity analysis of the algorithm. If the size of the original system of constraints printed as a string is $n$, then the size of the final system may be $O(n^2)$ with $O(n^2)$

rules is $O(n^2)$. For Rule 4, a variable $\alpha$ may have $O(n)$ upper and lower bounds. Forming all pairs of upper and lower bounds for $\alpha$ takes $O(n^2)$ time. Since there may be $O(n)$ variables the total cost is $O(n^3)$. The cost of Rule 5 can similarly be shown to be $O(n^3)$, so the total cost is $O(n^3)$.

It remains to show that the rules actually solve the constraints. From the discussion above we know that there can be no trivially inconsistent constraints of the form $\lambda_x \subseteq \lambda_y$ where $x \neq y$. Thus, when the algorithm terminates successfully all constraints are inductive.

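Index the variables $\alpha_1, \alpha_2, \ldots$. We say that a constraint $y \subseteq \alpha_j$ is a lower bound on $\alpha_j$ if $y = \lambda_x$ or $y = \alpha_i$ and $i < j$. A constraint $\alpha_j \subseteq y$ is an upper bound on $\alpha_j$ if $y = \lambda_x$ or $y = \alpha_i$ and $i < j$. Now define

\[ L_i = \bigcup \{ y \mid y \subseteq \alpha_i \in S' \text{ is a lower bound on } \alpha_i \} \]
\[ U_i = \bigcap \{ y \mid \alpha_i \subseteq y \in S' \text{ is an upper bound on } \alpha_i \} \]

The $L_i$ and the $U_i$ simply combine all upper and lower bounds on variables into a single upper and lower bound per variable. Note that the $L_i$ and $U_i$ exclude any conditional constraints remaining in $S'$.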

Lemma 5.6 The system $\bigwedge_i L_i \subseteq \alpha_i \subseteq U_i$ is inductive.

Proof: Conditions (1) and (2) of Definition 5.2 are easily verified; for (2), simply note that each constraint is inductive. For condition (3), because our domain is a set of constants $\lambda_x$ the hierarchy of $D_i$'s collapses to $D_0 = \emptyset$ and $D_1 = \{\lambda_x \mid x \text{ is a program variable}\}$. The condition for inductiveness can then be simplified:

\[ \forall\, 1 \leq i' \leq n.\ (\forall\, 1 \leq i < i'.\ L_i \subseteq \alpha_i \subseteq U_i) \Rightarrow L_{i'} \subseteq U_{i'} \]

The proof is by induction on $i'$. For the base case, there are no variables with index lower than 1, so no variables can appear in $L_1$ or $U_1$. In addition $U_1$ contains no conditional constraints or constants (see discussion above). It follows that $U_1 = \bigcap \emptyset$, which is the entire domain, so $L_1 \subseteq U_1$ in any assignment.

For the inductive case, let $\sigma$ be an assignment to the variables and assume that $\sigma(L_i) \subseteq \sigma(\alpha_i) \subseteq \sigma(U_i)$ for all $i < i'$. Let $l$ be a disjunct of $L_{i'}$ and let $u$ be any conjunct of $U_{i'}$. Then $l \subseteq u \in S'$ by Rule 4 or the constraint is a trivial one $\alpha \subseteq \alpha$ removed by Rule 3. Assume $l \subseteq u$ is a non-trivial constraint. If either $l$ or $u$ is a variable its index is less than $i'$. Therefore, $\sigma(l) \subseteq \sigma(u)$ by the induction hypothesis. Since $l$ and $u$ were chosen arbitrarily from $L_{i'}$ and $U_{i'}$, it follows that $\sigma(L_{i'}) \subseteq \sigma(U_{i'})$. $\Box$

Let $S''$ be $S'$ with remaining conditional constraints removed. Lemma 5.6 shows that $S''$ has solutions given by the equations

\[ \alpha_i = L_i \cup (\beta_i \cap U_i) \]

where the $\beta_i$ are fresh variables. Since all operations are monotonic,$^{5}$ the smallest of these solutions is $\alpha_i = L_i$ where all $\beta_i = 0$. This solution is $\sigma$ where

\[ \sigma(\alpha_i) = \{ \lambda_x \mid \lambda_x \text{ appears in } L_i \} \]

$^{5}$ All operations are monotonic because we designed the constraint language to avoid negations. However, note that this is the only place monotonicity is used, and that it is used to show the existence of a least solution.

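We claim that $\sigma$ is a solution of $S'$ and therefore a solution of $S$. It suffices to show that

\[ \sigma(\lambda_x \subseteq \alpha_i) \Rightarrow \sigma(E_1 \subseteq E_2) \]

is satisfied for the constraints $\lambda_x \subseteq \alpha_i \Rightarrow E_1 \subseteq E_2$ in $S'$ but not in $S''$. Assume for the sake of obtaining a contradiction that $\lambda_x \subseteq \sigma(\alpha_i)$. Then $\lambda_x$ appears in $L_i$. But then the hypothesis of Rule 5 is satisfied, contradicting the assumption that $S'$ is closed under the rewrite rules. We conclude that $\lambda_x \not\subseteq \sigma(\alpha_i)$, so the constraint is satisfied.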

5.4 Dataflow Analysis

The dataflow analysis discussed in Section 4.1 allows general set complement. Here we restrict our attention to solving the specific form of constraints arising in the live variable analysis, which do not make essential use of set complement and are therefore much easier to solve.

The universe $H$ is a finite set of constants $a_1, a_2, \ldots, a_n$. For any set of constants $A$, the set expression $\neg(\bigcup A)$ can be written without a negation as $\bigcup (H - A)$. Recall the liveness constraints from Section 4.1.

\[ [\![S]\!]_{in} \supseteq S_{use} \cup ([\![S]\!]_{out} \cap \neg S_{def}) \]
\[ [\![S]\!]_{out} \supseteq \bigcup_{X \in succ(S)} [\![X]\!]_{in} \]

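The only expression not already treated in the resolution rules of Figure 1 is $\alpha \cap \neg A$, where $A$ is a union of constants. To handle this case, we make use of the identity $X \subseteq Y \cup Z \Leftrightarrow X \cap \neg Z \subseteq Y$. Three cases involving variables and constants on the left-hand side are treated separately:

\[ S \wedge \alpha_i \cap A \subseteq \alpha_j \ \equiv\ S \wedge \alpha_i \subseteq \alpha_j \cup \neg A \qquad i \neq j \]
\[ S \wedge \alpha_i \cap A \subseteq \alpha_i \ \equiv\ S \]
\[ S \wedge a \subseteq \alpha_i \cup A \ \equiv\ S \wedge a \cap \neg A \subseteq \alpha_i \]

The first rule works either left-to-right or right-to-left. Only one direction, however, can result in a constraint in inductive form (i.e., with the higher-numbered variable isolated). Thus, if $i > j$ the rule is applied left-to-right and if $i < j$ the rule is applied right-to-left. If $i = j$ the constraint is eliminated (the second rule). Finally, if the left-hand side is a constant $a$, then $a \cap \neg A$ is formed to isolate the variable on the right-hand side (the third rule). The expression $a \cap \neg A$ is simplified to either $a$ if $a \not\subseteq A$ or 0 if $a \subseteq A$.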

Adding these rules to those of Figure 1 to handle the new expression $\alpha \cap \neg A$ is all that is required to obtain an effective algorithm. The proof of Lemma 5.6 can be applied to this extension by noting that the new rules put constraints in a form satisfying condition (2) of Definition 5.2, and that the proof that conditions (1) and (3) are satisfied is unchanged.
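For comparison with the resolution rules, the least solution of the liveness constraints can be computed by the familiar textbook iteration. The Python sketch below is exactly that textbook method, not the paper's resolution procedure (which represents all solutions); the statement table, the successor map, and the name `liveness` are assumptions made for this example.

```python
def liveness(stmts, succ):
    """Least solution of the liveness constraints by round-robin iteration.

    stmts : dict mapping a statement label to a pair (use, defs) of sets
    succ  : dict mapping a statement label to an iterable of successor labels
    Returns (live_in, live_out), each a dict from label to a set of variables.
    """
    live_in = {s: set() for s in stmts}
    live_out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s, (use, defs) in stmts.items():
            # out(S) is the union of in(X) over the successors X of S
            out = set().union(*(live_in[t] for t in succ.get(s, ())))
            # in(S) is use(S) together with out(S) minus def(S)
            inn = use | (out - defs)
            if out != live_out[s] or inn != live_in[s]:
                live_out[s], live_in[s] = out, inn
                changed = True
    return live_in, live_out
```

For a straight-line program such as `x = 1; y = x + 1; return y`, the sketch reports `x` live between the first two statements and `y` live between the last two. Note that `out - defs` is the negation-free form of intersecting with the complement of the definitions, in the spirit of rewriting complements over the finite universe.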

5.5 Simple Type Inference

The constraints for simple type inference introduce one additional form of expression $E_1 \to E_2$. The corresponding resolution rule is well-known:

\[ E_1 \to E_2 \subseteq E_3 \to E_4 \ \equiv\ E_3 \subseteq E_1 \wedge E_2 \subseteq E_4 \qquad (6) \]

on the right-hand side (see the discussion in Section 3.2). This rule can be combined with the preceding ones to give a method for solving the typing constraints. Resolution of the constraints is again in $O(n^3)$ time.

The justification for this rule is outlined in Section 3.2.1. A full formalization requires considerable additional machinery from denotational semantics and is outside the scope of this paper.

6 Discussion

We now turn to the relationship of constraint-based analysis to other approaches to program analysis and its place in the theory of abstract interpretation. The accepted intellectual framework for designing and justifying program analysis algorithms is abstract interpretation, due to Cousot and Cousot [CC77]. Abstract interpretation treats a program analysis as a sound approximation to the exact meaning of a program. More precisely, an abstract interpretation gives a non-standard interpretation of the program that is consistent with the standard interpretation. Let $(D, \leq_D)$ and $(A, \leq_A)$ be partially ordered domains and let $\alpha : D \to A$ and $\gamma : A \to D$ be functions that form a Galois connection:

\[ \forall d \in D, a \in A.\quad \alpha(d) \leq_A a \Leftrightarrow d \leq_D \gamma(a) \]

Then $\alpha(d)$ is the abstraction of $d$ and $\gamma(a)$ is the concretization of $a$.

By defining the abstract domain $A$ and explicit mappings $\alpha$ and $\gamma$ it becomes possible to state precisely what it means for an abstraction of a program to be correct. For example, let $P$ be a program with standard semantics $\mu : \mathrm{Program} \to D \to D$. Let $\psi$ be a program analysis (an abstract interpretation) with functionality $\psi : \mathrm{Program} \to A \to A$. Then $\psi$ is a sound abstraction if it satisfies:

\[ \forall x \in D.\quad (\mu\ P\ x) \leq_D \gamma(\psi\ P\ \alpha(x)) \]

Thus, the abstraction $(\psi\ P)$ conservatively models the behavior of $P$.
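Both conditions can be checked exhaustively on a toy example. The Python sketch below is an illustration only: it assumes a concrete domain of subsets of a small range of integers ordered by inclusion, a five-element sign lattice as the abstract domain, and integer negation as the program; none of these choices, nor the function names, come from the paper.

```python
from itertools import combinations

UNIVERSE = range(-2, 3)
SIGNS = ["bot", "neg", "zero", "pos", "top"]     # bot first, top last

def leq(a, b):
    """The ordering on signs: bot below everything, top above everything."""
    return a == b or a == "bot" or b == "top"

def gamma(a):
    """Concretization: the integers of the universe described by a sign."""
    return frozenset(n for n in UNIVERSE
                     if a == "top" or (a == "neg" and n < 0)
                     or (a == "zero" and n == 0) or (a == "pos" and n > 0))

def alpha(d):
    """Abstraction: the least sign whose concretization contains d."""
    for a in SIGNS:
        if d <= gamma(a):
            return a

def all_sets():
    items = list(UNIVERSE)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_galois_connection():
    return all(leq(alpha(d), a) == (d <= gamma(a))
               for d in all_sets() for a in SIGNS)

# A toy program (negation) with its concrete and abstract semantics.
def concrete_negate(d):
    return frozenset(-n for n in d)

ABSTRACT_NEGATE = {"bot": "bot", "neg": "pos", "zero": "zero",
                   "pos": "neg", "top": "top"}

def is_sound():
    return all(concrete_negate(d) <= gamma(ABSTRACT_NEGATE[alpha(d)])
               for d in all_sets())
```

Both `is_galois_connection()` and `is_sound()` hold for this example; replacing the abstract entry for `neg` by `zero` would break soundness while leaving the Galois connection intact.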

There is confusion in the literature over the meaning of the term "abstract interpretation," which is used at least to mean either a semantic framework for reasoning about program analysis (sketched above) or a particular set of techniques for constructing program analyses. The author prefers to use the term to refer to the semantic framework only. Given that meaning, abstract interpretation provides a clear, well-defined framework for proving that a program analysis is correct. We are unaware of any program analysis that cannot be explained in this framework,$^{6}$ including constraints, although we have left the abstraction and concretization functions implicit in our examples.

Program analysis is technically difficult and at the same time new problems typically bear some resemblance to older, better understood problems. Hence, there is little enthusiasm for inventing program analyses from first principles in every instance, and people have naturally developed sets of techniques that can be reused. A few of these paradigms have developed large followings. We discuss three: finite lattice methods, type inference, and constraints.

6.1 Finite Lattice Methods

One of the most popular paradigms appeared in the Cousots' seminal paper on abstract interpretation [CC77]. Program analyses in this style are variations on a theme. A finite abstract domain $A$ is designed

$^{6}$ Widening/narrowing can be defined without reference to abstraction (see [CC92]). However, when used on an abstract domain there are associated abstraction and concretization functions.

following form

\[ x_1 = \phi_1(X) \quad \ldots \quad x_n = \phi_n(X) \]

where $X = \{x_1,\ldots,x_n\}$ is a set of variables and each $\phi_i$ is a monotonic function with signature $A^{|X|} \to A$. It is well-known that a generic iterative fixed point algorithm computes the least solution of such equations [CC77].

Given that one can design a correct analysis in this framework, the implementation is straightforward and has two additional useful properties: first, the computed analysis is the best possible within the chosen parameters (i.e., it is the least solution of the equations) and second, the analysis is guaranteed to terminate. Analyses for C and FORTRAN programs based on dataflow equations are classic examples of this program analysis paradigm.
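A generic iterative fixed point computation for such equations fits in a few lines. The Python sketch below assumes that each right-hand side is supplied as a monotonic function of the current assignment and that the abstract domain is finite, so that iteration from the bottom element terminates; the name `least_fixed_point` and the example equations are hypothetical.

```python
def least_fixed_point(funcs, bottom):
    """Iterate x_i = f_i(X) from bottom until no variable changes.

    funcs  : dict mapping a variable name to a function of the whole assignment
    bottom : the least element of the (finite) abstract domain
    Monotonicity of every function is assumed, not checked.
    """
    x = {v: bottom for v in funcs}
    changed = True
    while changed:
        changed = False
        for v, f in funcs.items():
            new = f(x)
            if new != x[v]:
                x[v] = new
                changed = True
    return x
```

As a usage example over the powerset of `{"a", "b"}` ordered by inclusion, the equations `x1 = {a} | x2` and `x2 = x1 | {b}` have the least solution in which both variables equal `{a, b}`.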

The cookbook recipe "finite domains plus monotonic functions equals program analysis" has proven very popular, and there are an enormous number of applications of this excellent idea; representative examples include [My80, JM86, Hud87, Wad87, HY88, PBJ+91]. The paradigm has become so popular that the term abstract interpretation is often used to mean this specific technique for program analysis rather than a general semantic framework. Pedagogically this is undesirable, as it implies that the semantic framework of abstract interpretation cannot be applied to other paradigms.

6.2 Type Inference

The Hindley/Milner type inference algorithm has recently become popular as a model for program analyses of a different sort. In this approach, a program analysis is specified as a non-standard type inference system. Typically, such systems are sets of deductive inference rules, with one rule for each syntactic form in the programming language. It is worth noting that analyses in this style have been designed that prove all sorts of facts about programs, many of which have little to do with types. Representative examples include [Hen92, TT94].

Specifying a program analysis as a formal logic corresponds nicely with the intuition that the role of program analysis is to prove facts about programs. However, the inference rules alone normally do not specify an algorithm. If the logic can prove multiple facts about a program, it is necessary to specify which fact should be computed by program analysis; that is, it is necessary to specify how the proof search is conducted. In practice, designing the logic often is only the first step and much hard work remains in coming up with an algorithm and analyzing its complexity. For example, implementations of Milner's type system are based on solving systems of equality constraints using unification [Rob65].

6.3 Constraints

In 1987 Wand wrote a short paper on the Hindley-Milner type system in which he proposed to recast the usual typing rules with explicit equality constraints as side conditions, which simplifies the understanding of Hindley-Milner type inference algorithms [Wan87]. This paper is apparently the first to explicitly put forth the constraint-based viewpoint (excepting Reynolds's much earlier paper [Rey69]). Further development has continued to emphasize the problems of constraint resolution over the problems of deductive inference. Note that the constraint-based analysis notation for traditional type inference problems deftly avoids using inference rules at all (see Section 4.2)!

A thesis of this paper is that constraint-based analysis unifies much of the traditional dataflow views and the type inference views of program analysis. To the degree that dataflow equations are a proxy for more general abstract interpretations over finite lattices there is considerable evidence for this thesis. In fact, the equations $x_1 = \phi_1(X) \ldots x_n = \phi_n(X)$ are simply another system of constraints to be solved. However, this level of generality obscures several important differences.

What we refer to as finite lattice methods generally exploit three assumptions: first, a particular solution (the least or the greatest) to the equations is desired; second, the abstract functions can be arbitrary monotonic functions; and third, that a finite domain of abstract values gives sufficient precision for all programs.$^{7}$

With respect to the first point, in constraint-based analysis a common (but not universal) view is to compute all solutions of the constraints. For example, the constraint resolution procedure for live variable analysis in Section 5 does not resemble the one in textbooks precisely because it computes all solutions of the constraints, rather than only the least. Computing all solutions becomes necessary for separate analysis of programs split across multiple files (where the least solution of the constraints for a particular file may have little to do with the least solution of the entire program) and when there is no least solution (e.g., in the presence of anti-monotonic constructors like function space).

The second important difference lies in the nature of the abstractions chosen in finite lattice and in constraint-based analyses. All commonly used, and very nearly all proposed, finite lattice methods are either forwards (information flows from inputs to outputs) or backwards (information flows from outputs back towards inputs; live variable analysis is an example). The dataflow analyses tend to use abstract functions to represent function values. Thus, information can flow easily only in the direction of the abstract function, which is either forwards or backwards. Constraint resolution, however, naturally allows information to flow in either or both directions, allowing forwards and backwards information flow to be used in the same analysis.

It is important to understand that allowing bidirectional information flow is not a unique property of constraints. For example, the technique of chaotic iteration admits analyses that are neither forwards nor backwards [CC78].

The third important difference is that constraints can easily work over infinite domains, while the finite lattice methods work with a finite domain. Finite domains are a good fit for some problems (e.g., the two point domain commonly used in strictness analysis [My80]), but for others (e.g., particularly problems involving recursive data structures) it is more natural to work directly with an infinite domain. A problem with infinite domains, however, is that termination of the program analysis is not automatically guaranteed. In the case of set constraints the termination of constraint resolution is guaranteed; resolution computes a finite representation of the solutions of constraints over an infinite domain.

The distinction between infinite and finite domains is subtler than we have indicated. If an analysis terminates for all programs, then clearly there is finite structure (i.e., the finite computation) regardless of the choice of domain. Thus, even if the intended domain is infinite, for each program it should be possible to substitute a finite domain that behaves indistinguishably from the infinite domain.$^{8}$ Essentially this observation is used in [CC95] in showing the equivalence of several different approaches to formulating program analyses over finite and infinite domains.

Even if infinite domains can be treated using finite equivalents (as they must be if we wish to have terminating program analyses), that does not mean that infinite domains serve no useful role. In many cases an infinite domain is simply the natural framework, while the equivalent finite domain may be difficult to discover and justify. In the case of set constraints, the finite domain can be taken to be all subsets of the constraints of the initial system plus those added by resolution rules. The full set is only discovered by solving the constraints. A similar perspective is set forth in [CC92] in another

$^{7}$ Or that a suitable finite domain can be derived from each particular program.

$^{8}$ Note that there may be a different finite domain for each possible input program.