Franois Pottier
Franois.Pottierinria.f r
August 25,2000
Abstrat
This paper oers a theoretial study of onstraint simpliation, a fundamental
issue forthedesignerofapratialtypeinferenesystemwithsubtyping.
Inthesimplerasewhereonstraintsareequations,asimpleisomorphismbetween
onstrained typeshemesandnitestateautomatayieldsaompleteonstraintsim-
pliation method. Using it as aguide for the intuition, we move onto the ase of
subtyping, and desribe several simpliationalgorithms. Althoughno longerom-
plete,theyareoneptuallysimple,eÆient,andveryeetiveinpratie.
Overall,thispapergivesaonisetheoretialaountofthetehniquesfoundatthe
oreofourtypeinferenesystem. Ourstudyisrestritedtotheasewhereonstraints
areinterpretedinanon-struturallattieofregularterms. Nevertheless,wehighlight
asmallnumberofgeneralideas,whihexplainouralgorithmsatahighlevelandmay
beappliable toavarietyofothersystems.
1 Introdution
1.1 Subtyping and type inferene
Inatypedprogramminglanguage,afuntion appliation(e
1 e
2
)islegalifandonlyifthere
existsatype
2
whihisbothavalidtypefortheargumente
2
andavaliddomaintypefor
thefuntione
1 .
In thesimply-typed-alulus, theset of allvalid typesofagiven(un-annotated) ex-
pressione hasaveryregular struture: itis either empty,orexatlythe set ofall substi-
tution instanes of a most general type. Then, inferring the (most general)type of an
expression redues to solving aset of equations between types[Wan87℄. The addition of
let-polymorphism,asdoneinML[Mil78℄,essentiallypreservesthisfat.
These systems have type instantiation as their only notion of type ompatibility. In
partiular, they view any two ground types as inompatible unless they are equal. For
instane, assume mahine integers and oating-point numbers are desribed by two base
types, namely int and real. Then, the appliation (fatx) is illegal if fat and x have
(most general)typesint ! intand real, respetively. This is agood point, sine it is
ertainlyaprogrammingerror. Ontheotherhand,iflog andnhave(most general)types
real!realand int, respetively, thenthe appliation (logn) is deemedillegal aswell.
Yet,beauseintegersare mathematiallyasubset ofreals, onemayatually wishfor this
termtobeaepted.
Tooveromethislimitation,Mithell[Mit84℄suggestsenrihingthesetypesystemswith
subtyping. Thisinvolvesintroduingapartialorderontypes,togetherwithanewtyping
Address: INRIARoquenourt,BP105,78153LeChesnayCedex,Frane. Thispaperistoappearin
Information&Computation.
rule, stating that if holds (read: if is a subtype of ) then everyexpression of
type has type 0
aswell. For instane, hoosing thestrit ordering int< realauses
(fatx)to remainill-typed,while (logn)beomeswell-typed, beausen:intnowimplies
n:real. Subtypingisnot,ingeneral,limitedtobasetypes: Cardelli[Car88℄equipsreord
typeswithanaturalsubtypingrelation,allowinginformationaboutanynumberofeldsto
bedisarded. In additionto itsintrinsi interest, suh a systemprovides apossiblebasis
forthestudyofobjet-orientedlanguages.
Systemsequippedwithsubtypinghaveaombinationoftypeinstantiationandsubtyping
as their notion of type ompatibility. As a result, the type inferene problem no longer
redues to solving a set of equations. Instead, it requires solving a set of inequalities,
usuallyalled onstraints [Mit84, Pal95℄. This proessistheoretiallystraightforward,but
ostly,beausetheeÆientuniationalgorithms developedto solveequations[JK90℄an
nolongerbeused.
Why, then, should we wish to perform type inferene? Would it not be suÆient to
requirethe programmer to supply type annotations, and merely hek their onsisteny?
Letusgivetworeasonswhytypeinfereneisuseful. First,itfreestheprogrammerfromthe
burdenofdelaringthetypeofeveryprogramvariable|atedioustaskinmanywidespread
languages|and allows him to naturally write polymorphi ode. Seond, type inferene
maybeviewedasasimplewayofdesribingprogram analyses[PO95℄, whose resultsmay
beused, forinstane,todriveompileroptimizations.
1.2 Simpliation
Ouraim,then,is tostudythetypeinfereneprobleminthepreseneofsubtyping, andto
ompareitwiththeoriginalproblem, wheresubtypingisreduedtoequality.
The onstraint system to be solved is the samein both ases; its size is linear in the
program size. (Thoughlet-polymorphismmay, in fat, auseit to growexponentially, it
is an aepted fat that it \usually" does not.) However, while equations an be solved
in quasi-lineartime, solvinginequalitiesbetween(non-atomi)termstypiallyrequires (at
least)ubi-timealgorithms[AW93,Pal95,MR00℄. Thus,aneÆieny problemappears.
Every uniation problem admits a most general solution. Thus, in the absene of
subtyping,everyprogramhasamostgeneraltype. Itisoftenompatandeasilyintelligible.
Ontheotherhand,manylassesofsubtypingproblemsdonothavemostgeneralsolutions.
Then,desribingthesetofallvalidtypesofagivenprogramrequiresprintingtheonstraint
systemitself, whih ofteninvolvesmanyauxiliarytypevariables. Thus, areadability issue
alsoarises.
To address these problems, it seems neessary to simplify systems of subtyping on-
straints,i.e. toreduethemtosmaller,equivalentsystems. Thistopihasreeivedontinued
attentionin thepastfew years[Aik94,AF96, AWP96,FFSA98,AFFS98, Fah99,EST95a,
TS96,FF96,FF97, Fla97, Pot96,Pot98a, Pot98b,Pot98℄. Indeed,designingareasonable
simpliationalgorithm isnoteasy. Itmust beorretandeÆient. Ideally, itshould also
beomplete, i.e. produe optimalresults. Unfortunately, ahievingompleteness involves
solving the onstraint entailment problem, whih may be muh more omplex than on-
straintsolving. Inourframework,forinstane, entailmenthasbeenshownPSPACE-hard,
but its deidability is still unsettled [HR98, NP99℄. Forthis reason, pratial onstraint
simpliationalgorithmsareofteninomplete.
1.3 Choies
Deningatype systemwith subtyping involvestwomain hoies. First, onemust hoose
aonstraintlogi, i.e. dene aonstraintlanguageand itsinterpretation within amodel.
therulesreduethetypingproblemtoaseriesofassertionsexpressedwithintheonstraint
logi. Thesetwohoies aremostlyorthogonal,aspointedoutby[OSW99℄.
Asfaras thersthoie isonerned,the arrayofpossibilitiesisextremely wide. The
modelmayhaveovarianttypeonstrutorsonly,oritmayhaveontravariantonstrutors
aswell. (Intheformerase,onstraintsystemsmayhavesmallestsolutions.) Itmayormay
nothavereursivetypes. (If present, theymaygivesmoothermathematial properties to
themodel,leadingtosimpleralgorithms.) Themodel,equippedwiththesubtypeordering,
may or may not form a lattie. (If it does, then moreaggressive simpliations beome
valid.) Typesmaybeinterpretedasideals[MPS86℄orasterms. (Theformerinterpretation
assignsmorepreisemeaningsto unionand intersetiontypes. Ontheother hand,it may
bemoreomplex;axiomssuhas ?=(?), where is any unarystrit typeonstrutor,
makeonstraint solving morediÆult.) When types are interpreted as terms, subtyping
maybe atomi(i.e. onlyonstanttype onstrutors may beomparable), strutural (i.e.
onlytypeonstrutorsofidentialaritymaybeomparable),ornon-strutural(eventype
onstrutors with dierent arities may be omparable). Constraints may be interpreted
withinaxedmodel,orwithin afamilythereof. (Iftheformer,thendeepersimpliations
are usually possible. On the other hand, user-extensible subtype hierarhies require the
latter.)
Changesin the onstraintlogi greatlyaet theomplexityof theresolution and en-
tailmentproblems (as well asthe formulation of the orresponding algorithms). For this
reason, we will fous on a single ase, while hoping that (some of) our methods may be
appliableto(some)otherlogis. Morespeially,wehoosetointerprettypesinthexed
model ofallregulartermsgenerated by?,! and>,with arities0,2and 0,respetively.
Subtypingisinterpretedinthemodelbyorderingtheseonstrutorsasgivenandviewing!
asaontra/o-varianttypeonstrutor. Thisyieldsanon-struturalsubtypingrelationship,
whih forms a lattie. Although this ase may seemverysimple, generalizingit to more
elaboratenon-struturaltermlattiesisstraightforward(seee.g.[Pot00a℄)andrequiresno
fundamentalhangestothetheoryortothealgorithms.
Theseond hoiedenitely haslessimpatonthesystemasawhole. Althoughmany
variantshaveappearedin theliterature,mostof themareverylose inspirit. Theideais
to extend the Hindley-Milner type disipline [Mil78℄ with onstraints, while keeping let-
polymorphism. PerhapsthemostelegantformalexpositionofthisideaisthesystemHM(X)
byOdersky, Sulzmannet al.[OSW99,SMZ99℄. Here,however,wewill useaset oftyping
rulesinspiredbyTrifonovandSmith[TS96℄, withafewtehnialmodiationstothetype
inferenerules. Thissomewhatunommon presentationallowsus to dealwith losed(i.e.
fully universally quantied) type shemes only, making aformal desription of onstraint
simpliation|theentraltopi ofthepresentpaper|easier.
1.4 Overview
Inthispaper,wepresentatypeinferenesystemwithsubtyping,designedwithonstraint
simpliationinmind. Itsinferenerulesarewrittensoastogenerateamenableonstraint
systems. We desribethree simpliation algorithms, designed to be used in ombination
withoneanother;theyaresimple,eÆientandeetive. Weemphasizetheparallelbetween
the ase of equality and that of subtyping, and showthat these algorithms are based, in
both ases, on the same broad ideas. In fat, in the ase of equality, their ombination
yields aomplete simpliationstrategy. Althoughit is nolonger ompletein the aseof
subtyping,webelieveitproduesgoodresultsin pratie.
Thispaperis laid out asfollows. Setion 2introduesthe neessarytheoretial bak-
ground,namelyasetofgroundtypesorderedbysubtyping,aorelanguage,asetoftyping
deterministialgorithm,whihmapsanexpressiontoaonstrainedtypesheme. Theon-
straints thus generated may be viewed, at will, asequations or assubtyping onstraints.
In the former ase, we obtain a typeinferene system lose to that of ML; in the latter,
a system equipped with the full power of subtyping. Setion 3 studies the simpler one,
andsuggestsaompletesimpliation method byborrowingoneptsfrom automatathe-
ory. This setion should help the readerform general intuitions about the struture and
behaviorofonstraints. Wehopetheseideasareappliableinotherontexts;inpartiular,
they may betransferred to ourmoreomplex system, whih isthe topi of setion 4. In
thissetion,whihforms thetheoretialbodyofthepaper,weformallydesribeandprove
severalonstraintsimpliationalgorithms,basedonthesameideas. Setion5showsthese
algorithmsatworkonasimpleexample. Setion6reviewsrelatedwork.
Thispaperborrowsideasfrom several existing works. Oneof itsnovelaspets istheir
seamlessintegration: wedesribealean,simpletheory,whihleadsdiretlytoaneÆient
implementation[Pot00b℄. Anotherontributionisintheareaofpresentation. First,thanks
to a arefullythought-out mathematiallayout, we are ableto presentour formal results
withalmostnoauxiliarysteps,andwithsubstantiallysmallerproofsthanin earlierworks.
Seond, we highlight the similarity of our methods with those appliable in the ase of
equalityonstraints;bydoingso,wehopetohelpthereadergrasptheessentialideasbehind
ouralgorithms. Thus,thispapermayonstituteagoodintrodutiontothetheoretialissues
behindonstraint-basedtypeinferene.
Beforebeginning our tehnial exposition, letus reall that the fous of this paper is
on onstraintsimpliation. Beause of this deision, several issues related to thedesign
of a onstraint-basedtype inferene systemhavebeen left aside. Among them, one may
mention ertainfundamental theoretialresults, suh astype safety; various implementa-
tiononerns,inludingeÆienymeasurements;extensionsoftheorelanguageneessary
to obtain a full-blown programming language; et. These issues are disussed at length
in [Pot98, Pot98b℄. Lastly, we do not address the issue of entailment, i.e. we do not
attempt to give an algorithm to deide whether two given type shemes are in the sub-
sumption relation. Indeed, we do not have a need for suh an algorithm, beause all of
thesimpliationalgorithmspresentedinthispaperprovablypreservethemeaningoftheir
input. Nevertheless, the entailment problem is losely linked to the issue of onstraint
simpliation;werefertheinterestedreaderto [AC93,KPS93,Pot98,HR98,NP99℄.
2 A onstraint-based type inferene system
2.1 Ground types
Groundtypesaretheregulartreesbuiltwiththeelementaryonstrutors?,>and!. They
are thesimplest kindof types,sinetheyare (possibly reursive) types withoutvariables.
Theyaremonomorphi;polymorphismshallbeintroduedlaterbyonsideringtypeshemes
whihdenotesetsofgroundtypes.
Denition1 Let the ground signature
g
onsist of ? and > with arity 0 and ! with
arity 2. A path p is a nite string of 0's and 1's, i.e. an element of f0;1g
. denotes
the empty path. The lengthof apath pisdenotedby jpj. Its parity(p) isthe number of
0's itontains, takenmodulo 2. A groundtree isa partialfuntion frompaths into
g ,
whose domain is non-empty and prex-losed, and suh that (p0) and (p1) are dened
i (p) = !. Given p 2 dom(), the subtree of rooted at p, written
jp
, is the tree
q7!(pq). Atreeis niteiitsdomainisnite. Atreeis regulariithasanitenumber
of subtrees. A groundtype isaregular groundtree. Wedenote the setof groundtypes by
0 and
1
are trees,
0
!
1
stands for the tree denedby ()=!, (0p)=
0 (p) and
(1p)=
1 (p).
Thesetofgroundtypesisequippedwithapartialorder,alledsubtyping.
Denition2 A family of orderings over ground types is dened indutively as follows.
First,
0
isuniformly true. Seond,for any k2N +
,
k +1
0
holds iatleastoneof the
followingistrue:
=?;
0
=>;
9
0
1
0
0
0
1 =
0
!
1 ,
0
= 0
0
! 0
1 ,
0
0
k
0 and
1
k
0
1 .
Subtyping,denoted by, isthe intersetion ofthese orderings.
(T;) forms a lattie. Its operators t and u an be dened in several ways, e.g. using
automata produts,niteapproximationsorax-pointtheorem. Buttheirdenition isof
littleinterestin itself,andweshallbeontentwiththefollowingharaterization.
Theorem1 The set of groundtypes T, equipped with the subtyping relation, is alattie.
We denote its least upper bound and greatest lower bound operators by t and u, respe-
tively. These operators are of ourse assoiative and ommutative. In addition, they are
haraterizedby thefollowing identities:
?t = ?u =?
>t => >u =
(
1
!
2 )t(
0
1
! 0
2 )=(
1 u
0
1 )!(
2 t
0
2 )
(
1
!
2 )u(
0
1
! 0
2 )=(
1 t
0
1 )!(
2 u
0
2 )
2.2 Types
Wewillsoondesribeourtypesystem,whihisalogiforderivingtypingjudgmentsabout
programs. Wewishthesystemtoenjoymostgeneraltypings: so,informallyspeaking,the
setofaprogram'sgroundtypesshould beexpressiblewithasingletypingjudgment. That
is, apossibly inniteset of possiblyinnite groundtypesshould be desribed by asingle
logialassertion|whih mustbenite. Toallowthis, wenowintroduetypes,whih may
ontaintypevariables. Usingreursiveonstraintsonvariables,anygivengroundtypean
be nitely desribed; in addition, quantiation over type variables allowsgiving a nite
desription of ertain innite sets of ground types. To sum up, type variables serve two
dierentpurposes: theyenodereursivestruture, andtheyallowpolymorphism.
Denition3 LetV be adenumerable set of typevariables,denotedby ,,et. The set
of types,denotedbyT,isdenedby
::=j?j>j !
Atype issaidtobe onstrutedi itisnotavariable.
Denition4 Agroundsubstitutionisatotalmappingfromtypevariables togroundtypes.
A renaming is a bijetion between two subsets of V. Ground substitutionsand renamings
arestraightforwardly extendedtotypes.
denotedbyfv +
() andfv (), aredenedby
fv +
() =fg fv () =?
fv +
(?) =? fv (?) =?
fv +
(>) =? fv (>) =?
fv +
(
0
!
1
) =fv (
0 )[fv
+
(
1
) fv (
0
!
1 ) =fv
+
(
0
)[fv (
1 )
The setof freevariables of,denotedbyfv(), isdenedby
fv()=fv +
()[fv ()
2.3 Constrained type shemes
Likethat ofML, ourtypesystemoerslet-polymorphism. Thus, typing judgmentsasso-
iateprogramsnotmerelywithtypes,but withtypeshemes.
A onstrainedtypeshemeisessentiallyatype|its body|where variablesare allowed
to assumearbitraryvalues, within thelimitsof ertainonstraints. Hene, atypesheme
represents a set of ground types, whih is obtained|roughly speaking|by applying all
solutionsoftheonstraintsto thebody.
Constraint-basedtypesystemshaveappeared inorder to dealwithsubtyping assump-
tionsin typingjudgments. However,theyanalso desribelassiequality-basedsystems,
suhasMLitself. Forthis reason,wewill givetwovariantsof ourtypesystem: onewhere
onstraintsareto beinterpretedasequations, and onewhere theytrulydenote subtyping
relationships. Theformerisofoursesimpler,butstillinteresting,beauseitpresentsmany
ommonpointswiththelatter,espeiallyintheareaofonstraintsimpliation,wherethe
samebroadoneptsapply. Studyingitrstwillallowustoidentify methodswhihgener-
allyapplytoallonstraint-basedsystems,asopposedtothosespeitoourinterpretation
ofsubtyping.
However, even in the simpler ase, our system exhibits a signiant departure from
ML,beause,followingTrifonovandSmith[TS96℄,wehooseaformulationwhere alltype
shemesarelosed,i.e. withnofreetypevariables.
This deisiongivesrisetoasystemwhere typeshemesare stand-alone: theirmeaning
doesnot depend onanyexternal assumptions. (Dening the denotationof atype sheme
with freetypevariableswouldrequiresupplyinganassignmentofgroundtypestothesefree
variables.) It alsoremovestheneedtomaintainaglobalonstraintset,onstrainingthose
variables whih are free in the environment, sine there are none. Furthermore, we will
notiethattwodistintbranhesofatypeinferenederivationnowsharenotypevariables.
Thesepropertiesleadtoasimpliation,andabetterunderstanding,ofthetheory,aswell
asto amorestraightforwardimplementation.
In ML, it is inorretto generalize overa typevariable if it appears free in the envi-
ronment. So, howan wehopeto be ableto universally quantify overallvariables? The
solutionistomovetheenvironmentintothetypeshemeitself. Thispresentationisknown
as-lifting,for itessentiallyamountsto pretendingthat weare dealingsolelywith losed
programterms. Itsfuntioning willbedetailedbythetypingrules(seesetion2.4). More
preisely,informationonerninglet-boundvariablesremainsstoredinsideanexternalen-
vironment,whileinformationabout-boundvariablesappearsinaontext whihispartof
typeshemes.
Denition6 Assumean ordering on groundtypes, whih anbehosen tobe=or.
Theforthomingdenitionsdependonthehoieon,soweendupdeningtwovariants
ofthetypesystem,basedeitheronequalityoronsubtyping.
Denition8 A ground ontext is a nite map from -identiers to ground types. The
ordering isextendedtogroundontextsasfollows:
AA 0
() 8x2dom(A 0
) x2dom(A)^A(x)A 0
(x)
A groundtype isapair of a groundontext A andagroundtype ,written A). The
ordering isextendedtogroundtypesby setting
(A))(A 0
) 0
) () (A
0
A)^( 0
)
Denition9 A ontextA isanite mapfrom-identierstotypes. Ifx62dom(A), then
A[x7!℄istheontextwhihextendsAbymappingxto. ifx2dom(A), thenAnxisthe
ontextwhihisundenedatxandwhihoinideswithAelsewhere. Atypeisapair ofa
ontextAandatype,writtenA). Groundsubstitutionsareextendedstraightforwardly
toontextsandtypes.
Denition10 Aonstraintisapair oftypes,written 0
. Agroundsubstitutionisa
solutionof it i()( 0
); wethen write ` 0
. When stands for , we say is
a k-solutionof 0
i()
k (
0
); wethen write`
k
0
. We write`C (resp.
`
k
C) when` (resp. `
k
)holdsfor all 2C.
Denition11 Typeshemesaredenedby
::=A) jC
where A denotes a ontext, a type and C a onstraint set. (The symbol j should be
interpretedhereasaliteral,notasahoie.) Letfv()standforthesetofalltypevariables
whih appear inA, orC. The orderof isjfv()j.
Intuitivelyspeaking,allvariablesofatypeshemearetobeonsideredasuniversallyquan-
tied. However,weshallnotwriteanyquantiersexpliitly. Formallyspeaking,noimpliit
-onversionis allowed ontypeshemes; -onversionshallbedealt withexpliitly. This
deisionallowsarigorousdesriptionofthewayfreshvariablesandrenamingsarehandled.
Wenowdenethedenotationofatypeshemeasasetofgroundtypes.
Denition12 The denotation JK of a type sheme is the union of the -upper ones
generatedby itsgroundinstanes. Thatis,
JA) jCK=fA 0
) 0
;9`C (A))A 0
) 0
g
Informally speaking, atype sheme is simply a way of desribing a set of ground types.
Thus, itsdenotation ispreisely this set, i.e. theset ofgroundtypeswhih theprogram
would reeive in a system without polymorphism. Sine subtyping allows weakening a
program'sgroundtype,itisnaturalforasheme'sdenotationtobeupwardlosed,hene
theuseofupperonesinits denition. Itisnowlearthatatypeshemeis moregeneral
thananotheroneiitrepresentsalargersetofgroundtypes;thus,subsumptionbetween
typeshemesisdenedasset-theoretiinlusionof theirdenotations,asfollows.
Denition13 Given two type shemes
1 and
2
, the former issaid to be moregeneral
than the latter i J
1 K J
2
K; we shall then write
1 4
2
. In other words,
1
is more
generalthan
2
ifor anygroundinstaneof
2
,thereexistsagroundinstaneof
1 whih
issmaller withrespetto. Formally,
(A
1 )
1 jC
1 )4(A
2 )
2 jC
2 )
8
2
`C
2 9
1
`C
1
1 (A
1 )
1 )
2 (A
2 )
2 )
Wewrite
1
2
when
1 4
2 and
2 4
1 .
Therelation4wasintroduedin[TS96℄,where itiswritten 8
.
2.4 Typing rules
The language we are interested in is ore ML, that is, a -alulus equipped with alet
onstrut. Forthesakeofsimpliity,weseparate-bound identiersfromlet-boundones,
byplaingthemin twodistintsyntatilasses.
Denition14 Assume given a denumerable set of let-identiers, denoted by X, Y,...
Expressionsaredenedby
e::=xjx:ejeejX jletX =ein e
Denition15 Environmentsaredenedby
::=?j ;X :
Environmentaess isdened, asusual,by
( ;X :)(X)= ( ;Y :)(X)= (X) whenX 6=Y
Note that environments ontaininformation aboutlet-bound variables only. Assoiating
typesto-bound variablesis doneinside typeshemes,asshownbythetypingrulesgiven
ingure1.
Denition16 An expression e is well typed in an environment i there exists a type
sheme ,whosedenotation isnon-empty,suhthat `e:.
ReallthatthedenotationofatypeshemeA) jC isnon-emptyifandonlyifCadmits
asolution. Thus, todeterminewhether aprogramiswell-typed,onemustnotonlybuilda
typingderivation, butalsomakesurethatityieldsasolvableonstraintset.
Also,reallthattherelation4,aswellasthenotionofdenotation,dependonourhoie
of. So,therearetwovariantsofthistypesystem,one basedonequality,theotherbased
onsubtyping.
Inthis system,onerule isdevoted toeahsyntationstrut;in addition,rule(Sub),
alledthesubsumption rule,allowsreformulatingthetypeshemeatanypoint,withgreat
exibility. It allows arbitrary -onversions, as well as simpliations of the onstraint
system.
These rules aim at simpliity. Still, we expet the unfamiliar reader to wonder why
ontexts are made part of type shemes. Let us explain. Contexts are part of the -
liftingmehanism,whihallowsus toemulate thebehaviorofML, whileusing universally
quantied variables exlusively. But how an we express \monomorphi" types, sine all
variablesmustbeuniversallyquantied? Hereisanexample. Considertheexpression
x:letY =xin f:(fYY)
LetustypethisexpressioninML.Y'stypeisamonomorphivariable. So,thetwouses
ofY donotinvolveanyinstantiation, andtheexpression'stypeis!(!!)!.
In oursystem, on the ontrary, Y's type is (x : ) ) , aordingto rule (Var). Here,
`x:A) jC
(Var)
`e:A[x7!℄) 0
jC
`x:e:A) ! 0
jC
(Abs)
`e
1
:A)
2
! jC `e
2
:A)
2 jC
`e
1 e
2
:A) jC
(App)
(X)=
`X:
(LetVar)
`e
1 :
1
;X:
1
`e
2 :
2
`letX =e
1 in e
2 :
2
(Let)
`e: 4 0
`e: 0
(Sub)
Figure1: Typingrules
is (impliitly) universally quantied. So, ifone were freeto use rule (Sub) to perform
renamings,thetwousesofY ouldyieldtwodistintshemes(x:)) and(x:)).
However,the typing rule forfuntion appliation requires that its twobranhesshare the
sameontext. So, neessarily, and must bethesamevariable, andthesub-expression
f:(fY Y)has type (x : )) ( ! ! ) ! . One the-abstration is performed,
the whole expression reeivestype ! ( ! ! ) ! , as expeted. Tosum up, all
variableswhihappearintheontextatuallyhavemonomorphibehavior;thisisausedby
asharingonstraintonontexts,whihisenforedwhenevertwobranhesofthederivation
are brought together. So, we are able to do away with the notion of unquantied type
variable;nonetheless,thesystemisorret,asstatedbelow.
Statement1 Let ebean expressionsatisfyingthe followingtwoonditions:
eah-identierisboundatmostonewithine;
if letX =e
1 in e
2
isasub-expressionofe,thenX appears freewithin e
2 .
Assume e to be well-typed in the empty environment. Then e is safe with respet to a
all-by-value semantisofthe language.
Theabovetwoonditionsaretehnial. Therstoneismadeneessarybythewaywe\lift"
-binders through let binders; the seond one is required to make rule (Let) safe with
respet to aall-by-valuesemantis[TS96℄. They arenotrestritive,sineanyexpression
anberewritten,withoutalteringitssemantis,soastosatisfythem. Indeed,tosatisfythe
rstondition,anappropriaterenamingof-boundvariables shalldo;tofulll theseond
one, it suÆes to replae the onstrut letX = e
1 in e
2
with letX = e
1
in ( :e
2 )X
wheneverX doesnotappearfreein e
2 .
[F℄ `
I
x:[F[f;g℄(x7!))jfg
(Var
i )
[F℄ `
I e:[F
0
℄A) 0
jC A(x)= 62F 0
[F℄ `
I
x:e:[F 0
[fg℄(Anx))jC[f! 0
g
(Abs
i )
[F℄ `
I e:[F
0
℄A) 0
jC x62dom(A) ;62F 0
[F℄ `
I
x:e:[F 0
[f;g℄A)jC[f! 0
g
(Abs'
i )
[F℄ `
I e
1 :[F
0
℄A
1 )
1 jC
1
[F 0
℄ `
I e
2 :[F
00
℄A
2 )
2 jC
2
[F 00
℄A
1
^A
2
=[F 000
℄AjC
m
;62F 0 00
C=C
1 [C
2 [C
m
[f;
1
2
!g
[F℄ `
I e
1 e
2 :[F
000
[f;g℄A)jC
(App
i )
(X)= renamingof rng()\F =?
[F℄ `
I
X :[F[rng()℄()
(LetVar
i )
[F℄ `
I e
1 :[F
0
℄
1 [F
0
℄ ;X:
1
`
I e
2 :[F
00
℄
2
[F℄ `
I
letX =e
1 in e
2 :[F
00
℄
2
(Let
i )
Figure2: Typeinferenerules
The readermaypoint outthat these onditionsarenotpreserved by redution,whih
posesaproblemwhenattemptingtoexpressasubjetredutionproperty. However,weshall
not attempt to prove statement1 in this paper, beause we hoose to fous on the issue
of onstraintsimpliation. Weremovethese onditions andgiveafull subjetredution
proof|fortheasewherestandsforthesubtypingrelation|in[Pot98b,Pot98℄. Doingso
requiresamoreomplexformulationofthetypesystem,whihiswhywehoosesimpliity
here.
Lastly, onemay notie that safety|with respet to any semantis|omesfor free in
thepure-alulus,sinetherearenopossibleexeutionerrors. However,thesafetyproof
givenin[Pot98b,Pot98℄isnotbasedonthisremark,andanbeextendedtomoreomplex
aluli.
2.5 Type inferene rules
The typing rules introdued above annot be diretly used to infer an expression's type.
First, they are not syntax direted, beause of rule (Sub). Seond, rule (App) plaes
sharing onstraints on its premises: A,
2
and C appear in both premises. So, we now
deneasetoftypeinferenerules,whihspeifyatypereonstrutionalgorithm;theyare
given in gure 2. The main dierene with the typing rules is the disappearane of the
subtyping rule, whih has been built into the appliation rule. (The \[F℄" annotations,
althoughnoisy,aretrivial;theyallowanexpliittreatmentof freshvariables.)
Rule (App
i
) uses the following denition, whih desribes how ontexts are brought
Denition17 The assertion [F℄ A
1
^A
2
= [F 0
℄ A j C stands, by denition, for the
followingonjuntion:
dom(A)=dom(A
1
)[dom(A
2 );
8x2dom(A
1
)\dom(A
2
) A(x)2VnF;
8x2dom(A
1
)ndom(A
2
) A(x)=A
1 (x);
8x2dom(A
2
)ndom(A
1
) A(x)=A
2 (x);
F 0
=F[fA(x);x2dom(A
1
)\dom(A
2 )g;
C=fA(x)A
i
(x); x2dom(A
1
)\dom(A
2
);i2f1;2gg.
Informallyspeaking,wesaythatAisthemeetofthetwoontextsA
1 andA
2
. Itisessentially
theleastdemandingontextwhihguaranteesthat bothA
1
'sandA
2
'sexpetationsabout
theexpression'sruntimeenvironmentarefullled.
Thetypeinferenerulesaresoundandompletewithrespetto thetypingrules|that
is,theyinferamostgeneraltypeshemefortheexpressionathand.
Statement2 The typeinferenerules areorretwith respettothe typing rules;that is,
[F℄ `
I e:[F
0
℄ implies `e:.
Statement3 The typeinferenerules are ompletewith respettothe typing rules. That
is, if `e: then, for any nite F V, there existsanite F 0
V anda typesheme
0
4 suh that [F℄ `
I e : [F
0
℄ 0
. Furthermore, 0
is uniquely determined, up to a
renaming,by ande.
These rules are verylose, in spirit, to those of Trifonovand Smith [TS96℄. However,we
have brought a few subtle, but important modiations, so as to produe type shemes
whihsatisfyaoupleofinterestingproperties. First,anysuhshemeismadeupofsmall
terms only. Seond,when standsforthesubtyping relationship,theshemeontainsno
bipolar variables. Bothpropertiesshallbeusedthroughoutthepapertosimplifystatements
andproofs. Weprovetheformerhere;thelatteris introduedin setion4.2.
Denition18 A small term is a type term of the form ?, > or
0
!
1
, i.e. a term
whose strit sub-terms are type variables. A typesheme A ) jC is made upof small
termsiitsatisesthe followingonditions:
for all x2dom(A), A(x) isatypevariable;
isatypevariable;
for all ( 0
) 2C, either and 0
are type variables, or one isa variable and the
other isasmallterm.
Theorem2 If [ ℄F`
I e:[F
0
℄,then ismade upofsmallterms.
Proof. Straightforwardindution onthestrutureofthetypeinferenederivation. 2
Thesmall termspropertyallowsreasoningaboutsharingbetweensub-terms, and isakey
requirementinourformulationofminimization(seesetion4.5). Itisalreadyto befound,
forinstane,inthetheoryofuniation[Hue76℄. Amongworksmoreloselyrelatedtoours,
AikenandWimmers[AW92℄andPalsberg[Pal95℄useasimilaronvention.
We are done introduing our type inferene system, whih speies how to assoiate a
onstrainedtypeshemewithagivenprogram. Weshallnowfousourattentionontothe
main issue ofinteresthere: how tosimplify aninferred typesheme, withoutaeting its
meaning. Webegin, in this setion, with thesimplerase where ishosento be=, i.e.
whereonstraintsareequations.
Inommonpresentationsof equality-basedtypesystems,noequationsappear;instead,
their most generaluniers are omputed diretly. Here, however, weexpliitly deal with
equalityonstraints,soastohighlightthesimilaritywiththemoreomplexaseofsubtyping
onstraints.
Inthissetion,andinthissetiononly,wehoosetodealwithsimpliedtypeshemes,
oftheform jC. Weshallnotonernourselveswithontexts,beausetheirpresenedoes
notaddanydiÆultytothesimpliationissue.
3.1 Preliminaries
Letusbeginwithafewstraightforwardfatsonerningtermautomata[KPS93℄.
Denition19 Atermautomatonisatuple A=(Q;q
0
;Æ;l)where:
Q isanitesetof states,
q
0
2Qisthe startstate,
Æ:Qf0;1g!Qisa(partial) transitionfuntion,
l:Q!
g
[V isa labelingfuntion,
suhthat forany stateq2Q andforany i2f0;1g,Æ(q;i)isdenedil(q)=!.
Astateq2Qissaid tobe freeiits label isavariable, i.e. l(q)2V. The orderof A
isthe numberof itsstates,i.e. jQj.
A termautomatonisessentiallyawayofrepresentingatypeterm,possiblyreursiveand
possiblywithfreetypevariables. Suharepresentationismoreompatthanalassitree
representation,beauseofitsabilitytoexpresssharingbetweennodes.
Denition20 Let A =(Q;q
0
;Æ;l) be aterm automaton. Extend Æ toa partial funtion
^
Æ: Qf0;1g
!Q. Then, A desribes afuntion
A
frompaths into
g
[V, dened by
p7!l(
^
Æ(q
0
;p)).
Rather thanviewing anautomatonasatypeterm, possiblyontainingtypevariables, we
analsohoosetoviewitasaset ofgroundtypes.
Denition21 Let A be a term automaton. The ground instane of A through a ground
substitution isthegroundtype denedasfollows: for allpaths p,
if
A (p)2
g
,then(p)=
A (p);
if
A
(p)isatype variable 2V,then
jp
=().
The denotationofaterm automatonA isthe setofits groundinstanes.
Statement4 Aterm automaton'sdenotationisnon-empty.
Statement5 Two term automata A and B have the same denotation i
A and
B are
equal uptoarenaming ofvariables.
=e =e
=e=e 0
(Fuse)
e=>=>
e=>
(Deompose
>
)
e=?=?
e=?
(Deompose
? )
e=
0
!
1
=
0
!
1
e=
0
!
1
0
=
0
1
=
1
(Deompose
! )
Figure3: Solvingmulti-equations
3.2 Simplifying multi-equations
Thetypeinferenealgorithmgeneratesequations. However,itisbestto introdueamore
general notion of multi-equation, as is often done in works on uniation [Hue76, JK90,
Rem92℄.
Denition22 A multi-equationisaset ofterms f
1
;:::;
n
g,written
1
==
n . An
equality onstraint
1
=
2
an be viewed as a multi-equation. The notion of solution is
extendedstraightforwardly tomulti-equationsandtosetsthereof. A multi-equationis made
upofsmalltermsiallof itsmembers arevariables or smallterms.
Inordertodeterminethataprogramiswell-typed,weneedto makesurethatitsasso-
iatedtypeshemehasanon-emptydenotation, i.e. that itsonstraintset hasasolution.
Thisisdonebyapplyingasetofrewritingrulestothemulti-equationset,asfollows.
Theorem3 Consider atype sheme =
0
j C, where C is a multi-equation set, made
up of small terms. Rewrite C aording to the rules of gure 3, until none applies; letC 0
denotetheresultofthisproess. Then,C 0
isalsomade upofsmallterms,andhasthesame
solutions asC. Furthermore,
if C 0
ontainsatleastonemulti-equationofthe forme= = 0
,whereneither nor
0
arevariables, thenJK isempty;
otherwise, C 0
is said to be in anonial form. It an easily be viewed as a term
automaton, whose order equals that of , and whose denotation oinides with JK.
As aorollary, JK isnon-empty.
Proof. With anappropriatedenition ofweight(e.g. giveweight1tovariables, ?and>,
andweight2tothe!symbol),itiseasytoverifythateahrewritingruleausesthetotal
weight of the multi-equation set to derease. Hene, the proess must terminate. Eah
rewritingruleobviouslypreservesthesolutionspae,aswellasthesmalltermsproperty.
Assume C 0
ontains amulti-equationwith twonon-variable terms. Then, these terms
musthaveinompatibleheadonstrutors,beausenoneofthedeompositionrules ing-
ure3applies. So,C 0
hasnosolution. On theotherhand,assumeC 0
is in anonialform;
rule (Fuse) no longer applies, eah variable appears in at most one multi-equation. In
eah multi-equation, hoose aunique representative, equalto its non-variable term when
ithas one,and toan arbitrarymemberotherwise. Foreah2fv (),letrepr() denote
therepresentativeof 'smulti-equation,if appears in somemulti-equation,and itself
otherwise. Dene atermautomatonA=(Q;q
0
;Æ;l)asfollows:
Q=fv();
q
0
=
0
;
fori2f0;1g,Æ(;i)=
i
whenrepr()=
0
!
1
;
l()=repr()().
Itisstraightforwardtoverifythatthegroundinstanesofthisautomatonareexatlythose
of; hene,itsdenotationoinides withJK. Thenon-emptiness resultstemsfromstate-
ment4. 2
Theorem 3yields an algorithm to determine whether atype sheme has anon-empty
denotation; this makes typeinferene deidable. However,it also showsthat aanonial
typeshemeanbeviewed asaterm automaton;wenow establishthe onverse, showing
thatthetwonotionsareequivalent.
Theorem4 LetAbeaterm automaton. Then,thereexistsaanonial typesheme ,of
the sameorder, whosedenotation oinides withA's.
Proof. Assume A = (Q;q
0
;Æ;l). Choose some injetivemap q 2 Q 7!
q
2 V. Dene a
multi-equationset C by
foreah2rng(l),f
q
;l(q)=g2C;
foreahq2Qsuhthatl(q)=?,f
q
;?g2C;
foreahq2Qsuhthatl(q)=>,f
q
;>g2C;
foreahq2Qsuhthatl(q)=!,f
q
;
Æ(q;0)
!
Æ(q;1) g2C.
Dene =
q0
jC. Itis straightforwardto verifythat JK oinideswith A'sdenotation.
2
Theequivalenebetweenanonialtypeshemesandtermautomatagivesrisetoanessen-
tialidea: the well-knownminimizationproedure for nite-state automata arriesoverto
anonialtypeshemes.
Theorem5 Letbeaanonialtypesheme. Amongtheanonialtypeshemesequivalent
to, thereis oneof minimal order, whih anbe omputedin timeO(nlogn), where nis
the orderof .
Proof. Thankstotheorems3and4,weanstatetheproblemintermsofautomata. Given
anautomatonA, of ordern, wemustomputean automatonB, whosedenotation equals
that of A, and whih is minimal for this property. Aording to statement 5, we an
equivalentlyrequire
A
=
B
. Hene,theproblemsimplyonsistsinminimizingthelabeled
nitestateautomatonA,whihan bedoneintimeO(nlogn)[Hop71℄. 2
0
where
0
=
1
=
2
!
3
5
=
6
=>
3
=
5
!
6
2
=
4
Figure4: A sampletypesheme,in anonialform
0
!
1
!
2
2
3
!
4
2
5
>
6
>
0
1
0
1
0 1
Figure5: Thesame,viewedasanautomaton
0
!
2
2
3
!
5
>
0 1
0 1
Figure 6: Theminimizedautomaton
0
where
0
=
2
!
3
5
=>
3
=
5
!
5
Figure7: Thesame,viewedagainasatypesheme
ameasureof itsomplexity|inquasi-lineartime. Figures4to 7illustrate thisproedure.
Our starting point is a type sheme whose multi-equation set has been put in anonial
formafter therulesofgure3. Theorem 3allowsusto viewitasanautomaton(gure 5),
whih we then minimize. Minimization is a well-known, two-step proess: rst eliminate
anystatesnotreahablefromthestartstate,thenmergeequivalentstates. Inbroadterms,
twostatesare equivalentif theirlabels areequaland iftheyarry transitions, with equal
labels, whose end statesare in turn equivalent. Thisproessyields theautomaton shown
in gure6. Finally, theorem4 allowsusto turn this automatonbakintoatype sheme.
Of ourse, thinking in terms of automata allows asimpleexplanation of the proess, but
isn't mandatory; theminimization proedure anbe desribeddiretly in terms of multi-
equations,ifonesowishes.
HowdoesthisproedureomparetotheusualresolutionproessusedinMLtypeinfer-
ene? An ML typehekeromputes themostgeneralsolution ofthe equation set, using
uniation. This essentially amounts to putting the type sheme in anonial form, by
applying the rules of gure 3, then mergingall membersof a single multi-equation. Our
algorithm goesonestep further,sinevariables belonging to dierent multi-equations an
also bemerged, provided theystand for equivalent statesof theautomaton. Infat, our
simpliation proedure is omplete|it yields a type sheme with a minimal number of
variables. Sineour shemesare made upof smallterms, this is ameaningfulmeasure of
theiromplexity.
Theoretially speaking, our deision of working with small terms allows us to easily
highlighttheisomorphismbetweentypeshemesandtermautomata. Moreintuitively,one
mightsaythatbreakingalargetypetermdownintoaseriesofsmallterms,linkedtogether
by equations, essentially amounts to labelling eah node of the original term with atype
variable. Identifyingvariablesisthentantamounttosharingnodesintheoriginaltypeterm,
thus yielding amore ompatrepresentation. Of ourse, auser is likelyto prefer a more
readablerepresentation,withfewervariablesandlargerterms;itiseasytoreverttosuha
representationfordisplaypurposes. (Forinstane,thetypeshemeofgure7anbeprinted
as
2
! > ! >.) This is already the ase in typial ML implementations, where types
areinternally representedbydireted ayligraphs,but printedastrees. It isimportant
to arefully distinguish thetwo representations, sinethe latter is typially exponentially
larger. In other words, aninternal representation must favor eÆieny; onverting to an
externalrepresentation, whih oersbetterreadability,must be delayeduntil theresultis
readyfortheusertobeseen.
Toonlude,wehavestudied aomplete simpliation proedure foronstrainedtype
shemes,intheasewhereonstraintsareequations. Itonsistsofthreemainsteps: putting
theonstraintsinanonialform,eliminatingunreahablevariables,andmergingequivalent
variables. Weshallnowmoveontotheaseofsubtyping,anddisoverthat,althoughdetails
beomemoreomplex,thesamebroadideasapply.
4 Simplifying subtyping onstraints
4.1 Solving onstraints
Asinsetion3,ourrsttaskistondanalgorithmtodeidewhetheragivenonstraintset
hasasolution. Indeed,doingsoisrequiredtodeterminewhether aprogramis well-typed.
Ourgoal,in thissetion,istodesribesuhanalgorithm.
Webegin with afundamental tehnial result, whih desribesa weak, suÆient on-
dition for aonstraint set to have a solution. It will form the basis for the proof of the
onstraint solving algorithm. We prove a fairly powerful version of this result, allowing
downtheseextendedonstraintswouldrequiresomeniterepresentation;however,wewill
notneed to do so.) Thanks to this generalization,this result also formsthe basis forthe
proofofthegarbageolletionalgorithm(seesetion4.3).
Denition23 A onstraintset withgroundonstantsisaset C of subtyping onstraints
ofthe form 0
,where and 0
are eithertwovariables,one variable andasmallterm,
orone variableandagroundtype. Dene theassertionC +1
0
tomean
8k0 8`
k
C `
k +1
0
Dene C
#
()=f; 62V^ 2CgandC
"
()=f; 62V^ 2Cg. C issaid
tobe weaklylosedithe followingonditionsaremet:
1. 2C and 2C imply 2C;
2. 2C and 2C
#
()imply 9 0
2C
#
() C
+1
0
;
3. 2C and 0
2C
"
()imply 9 2C
"
() C
+1
0
;
4. 2C
#
() and 0
2C
"
() imply C +1
0
.
Theorem6 LetC beaonstraintsetwithgroundonstants. IfC isweaklylosed,thenC
has asolution.
Proof. Notethat thisproofonlyusesonditions2and4of denition23. Theotherondi-
tionsshallberequiredbyfurther theorems,suhastheorem11.
LetV =fv(C). Considertheset T V
ofgroundsubstitutions ofdomainV. Wedenea
mapS from T V
intoitselfby
7!
7!
G
2C
#
() ()
AssumingT V
isviewedasametrispae,equippedwiththeusualdistanebetween(tuples
of)innitetrees[Cou83℄,itis easytoverifythatS is 1
2
-ontrative. Thus, ithasaunique
x-point.
Weshall nowverify that is asolution of C. This is doneby proving that it is a k-
solutionof C,for allk 0,by indution overk. The base aseis immediate, sine
0 is
uniformlytrue(seedenition2). Itremainstoprove,assuming`
k
C,that `
k +1 C.
Consider a onstraint of the form 2 C. Beause C satises ondition 2 of
denition23,wehave8 2C
#
() 9 0
2C
#
() C
+1
0
. Sine`
k
C,thisimplies
8 2 C
#
() 9 0
2 C
#
() ()
k +1 (
0
), whih in turn entails ( F
2C
#
()
())
k +1
( F
0
2C
#
() (
0
)). Thisstatementisnoneotherthan()
k +1 ().
Next,onsider aonstraintoftheform 2C,where 62V. Then, 2C
#
(). So,
bydenitionof,()(). Inpartiular,()
k +1 ().
Finally,onsider aonstraintoftheform 0
2C,where 0
62V. Then, 0
2C
"
().
Pik some 2 C
#
(). Then, ondition 4 of denition 23, together with our indution
hypothesis, yield`
k +1
0
,i.e. ()
k +1 (
0
). Sinethis holdsforall 2C
#
(), we
alsohave( F
2C
#
()
())
k +1 (
0
),i.e. ()
k +1 (
0
). Thisonludestheproof. 2
Theorem 6is anie toolto exhibit solutionsof aonstraint set. However,it is notlear,
givenanarbitraryonstraintset,howitanbeputinweaklylosedform.So,weshallnow
deneastronger,butsimpler,notionof losure,whihanbeomputedmoreeasily. This
isthenotionoriginallyproposedbyEifrig,SmithandTrifonov[EST95b℄.
bers arevariables or smalltermsdown intoasetof equivalentonstraints:
sub ()=fg sub( )=f g
sub(?)=? sub( >)=?
sub(
0
!
1
0
0
! 0
1 )=f
0
0
0
;
1
0
1 g
Denition25 LetC be aonstraint set,made upof smallterms. C issaid to be losed
i wheneverf ; 0
gC, sub(
0
) is dened and inluded in C. From now
on,atypesheme A) jC issaid tobelosediC islosed.
In plain words, the above denition means that a onstraint set is losed i it is stable
throughaombination oftransitivity andstrutural deomposition. Letus nowverify, as
announed,thatlosureentailsweaklosure;whihmeans,onsideringtheorem6,thatany
losedonstraintset admitsasolution.
Theorem7 AnylosedonstraintsetC is weaklylosed.
Proof. ItislearthatC satisesondition1ofdenition23.
Assume 2C. Let 2C
#
(). BeauseC islosed, sub( )=f gC.
So, 2C
#
(). Thisis suÆienttoestablishondition2of denition23; just pik 0
=.
Symmetrially,ondition3issatised.
Now,assume 2C
#
() and 0
2C
"
(). BeauseC islosed, sub(
0
) is dened
and part of C. Thus, any k-solution of C is, in partiular, ak-solution of sub ( 0
).
Moreover, onsidering the denition of sub, it is easy to verify that any k-solution of
sub(
0
)isa(k+1)-solutionof 0
. Condition4ofdenition23ensues. 2
Toonludethissetion,wepresentanalgorithmwhihputsagivenonstraintsetinlosed
form,ifithasasolution,and failsotherwise. Thisalgorithmisusedto determinewhether
agivenprogramiswell-typed. Itsbadomplexity: O(n 3
),aswellasthesizeofitsoutput:
O(n 2
),areamongthemainreasonswhyonstraintsimpliationisrequired.
Theorem8 LetC beaonstraintset,made upof smallterms. Let C 2
denote
C[
[
f;
0
gC
sub(
0
)
IfthesequeneC ;C 2
;C 4
;::: isinnite,thenitreahesax-pointC 1
,whihisthesmallest
losed onstraintset ontaining C;its solution spaeis equal toC's andnon-empty. (C 1
isalledthe losure ofC.) Otherwise, C has nosolution.
Proof. For anarbitrary C, it is learthat C 2
is equivalent to C if it is dened, and that
C has no solutionotherwise(i.e. if subis applied outsideof its domain). Thus, if some
elementofthesequeneisundened,thenChasnosolution. Otherwise,thesequenemust
reah a x-point C 1
, beause any newly reated onstraint involves existing terms, and
there isonlyanite numberof suh onstraints. It islearthatC 1
isthesmallestlosed
setontainingC. Aordingtotheorem7,C 1
isalsoweaklylosed;bytheorem6,itadmits
asolution. 2
Whilebuilding a typeinferene derivation, wewish to makesure, at everystep, that the
expressionathandiswell-typed,soastodeteterrorsassoonaspossible. So,wemustmain-
tainouronstraintsetsin losedform. Thismaybedoneinrementally,takingadvantage
ofthefat thateahtypeinfereneruleaddsafewfreshonstraintsto alosed onstraint
set; an inremental algorithm is desribed in [Pot98, Pot98b℄. Of ourse, ifweuse suh
an algorithm, then our simpliation algorithms must preserve the losure property; this
ensuresthatwemayperformsimpliationstransparentlyatanypoint.
If isthetypeshemeassoiatedtoanexpressione,itwouldbeinterestingtodistinguish
thetypevariablesof whihrepresentaninput(i.e. somedataexpetedbytheexpression
e)fromthosewhihrepresentanoutput(i.e. someresultsuppliedbye). Weshallannotate
eahtype variablewith a sign in theformer ase,and with a+signin thelatterase.
Of ourse, itis possiblefora variable toarry both signsat one; weallsuh avariable
bipolar. Somevariables,ontheotherhand,arrynosignatall;weallthoseneutral. Thus,
we shall assoiate a pair of Boolean ags, whih we all polarity, to eah variable. This
informationwillservetoguideallofoursimpliationalgorithms.
Denition26 Consider a weakly losed type sheme =(A)j C), made up of small
terms. Thesetof positivevariablesof,andthesetof negativevariablesof,respetively
denotedbyfv +
() andfv (), arethe smallest subsetsP andN of fv ()suhthat
2P
rng(A)N
82P fv +
(C
#
())P ^fv (C
#
())N
82N fv +
(C
"
())N^fv (C
"
())P
Polarities may beeasily omputed asa smallest x-point. The time required is linear in
thesize oftheonstraintset. Indeed, visitingavariable'sonstrutedlower(resp. upper)
boundshastobedoneatmostone,namelywhenthevariablerstbeomespositive(resp.
negative). Thus, eahonstraintistraversedatmostone;whenetheresult.
TrifonovandSmith[TS96℄introdued polaritiesasarenementofournotionof reah-
ability[Pot96℄,whihwouldonlydetetneutralvariables,and usedthemto drivegarbage
olletion (see setion 4.3). However, they did not mention ertain useful properties of
polarities,whihweshallnowdesribe.
Intuitivelyspeaking,eah positivevariableof representsapiee ofdataomputedby
e and aessible as a part of its result. Assume e is plaed inside a ontext C, yielding
an expression C[e℄ whose assoiated sheme is 0
. C[e℄'s result might still ontain some
parts ofe'sresult, meaningthat theorrespondingvariables arestill positivein 0
; others
mayhave been dropped, meaning that the orresponding variables are no longer positive
in 0
. However, any value omputed by e, but inaessible through its result, obviously
remainsinaessiblethroughC[e℄'sresult;whihmeansthatanyvariablesnot positivein
annotbeomepositivein 0
. Ananalogouspropertyholdsfornegativevariables. Inother
words, polaritiesderease asonewalksdownatypeinferene derivation. Thispropertyis
formalizedbythefollowingtheorem.
Theorem9 Consider an instane of one of the type inferene rules of gure 2, whose
outputisatypesheme . Pik some2fv(),andassumealsoappearsin 0
,where 0
is oneof the rule'spremises. Then, 2fv +
() (resp. fv ()) implies2fv +
( 0
) (resp.
fv ( 0
)).
Proof. Theonly non-trivialase is that of rule (App
i
). Weuse the notationsof gure2.
Fori2f1;2g,let
i
=(A
i )
i jC
i
);assumeC
i
islosed. Dene
P =fv +
(
1 )[fv
+
(
2 )[fg
N =fv (
1
)[fv (
2
)[fg[fv(A)
Wewishto showthat P andN areonservativeapproximationsofthepolaritiesin, i.e.
that they satisfythe reursiveequations ofdenition 26. However,reall that omputing
to C 1
, not to C itself; weneed some information about C 1
in order to provethat the
equationshold.
Lettheassertion +
standfortheonjuntionfv +
()P^fv ()N. (Theassertion
isdened symmetrially.) NotiethatC
1 [C
2
islosed,beausethesesetshavedisjoint
domains. Let usall\new"the onstraintsin C
m
[f;
1
2
!g, aswellasany
onstraintsarisingfromthesubsequentlosureomputation. Itisstraightforwardtoverify
that whenever asmall term appears on the left-hand (resp. right-hand) side of anew
onstraint,then +
(resp. )holds.
Thisguaranteesthattheequationsofdenition26,appliedtoA) jC 1
,aresatised
byP and N. Beause fv +
()and fv ()are thesmallestsolutionsof theseequations, we
havefv +
() P and fv ()N. Inpartiular, fv +
()\fv(
i )fv
+
(
i
)and fv ()\
fv(
i
)fv (
i
);whihisthedesiredresult. 2
Theorem9guaranteesthatavariable'spolaritydereasesduringitslifetime. Asaorollary,
ifthetypeinferenerulesarewrittensoastoneverauseafreshvariabletobebipolar|and
sotheyare|thennobipolarvariablesaneverappearin atypeinferenederivation.
Theorem10 Assume[F℄ `
I e:[F
0
℄. If none ofthe (X), for X 2dom( ), ontains
abipolar variable, thenneitherdoes .
Proof. First,wehekthatwheneverafreshvariableisreatedbyoneofthetypeinferene
rules,it isnotbipolar. Consider,forinstane, rule (Var
i
). It reatestwovariablesand
. The formerappears in the ontext of the typesheme, while the latterappears in its
body. Hene, is negative, and is positive. Aording to denition 26, polaritiesan
onlytravelfromavariabletoasmallterm,sotheonstraint doesnotause(resp.
) tobeomepositive(resp. negative). Note, onthe otherhand,that in the typesheme
(x7!)),is bipolar;splitting intotwovariablesand,linkedbyaonstraint,is
thetehnialtrikwhihallowsusnottoreateanybipolarvariables. Rule(App
i
)ontains
asimilar trik.
Seond, theorem 9tellsus that ifavariableis bipolar ata ertainpoint, thenit must
havebeensosinethemomentitwasreated. Aordingtothepreviousparagraph,thisis
impossible;whenetheresult. 2
Thisresultisusedtosimplifyvariousdenitionsandproofs,inpartiularonerninggarbage
olletionandanonization. Of ourse,wewill needto provethat oursimpliationalgo-
rithms also ause polarities to derease, so we an perform simpliations at any point
withoutbreakingthisproperty.
4.3 Garbage olletion
Computing the losure of a onstraint set typially yields alarge number of onstraints.
Many of them are useful as intermediate steps of the losure omputation, but are no
longeressentialoneitisover. Morepreisely,weshallnowshowthattheonlymeaningful
onstraintsin alosedshemeA)jC arethefollowing:
those whih link apositive(resp. negative) variableto anelementofC
#
()(resp.
C
"
())|they giveinformationaboutthestrutureof apiee ofdata supplied(resp.
expeted)bytheexpression;
thosewhih linkanegativevariabletoapositiveone|theyrepresentapossibleow
ofdatafrom oneoftheexpression'sinputstooneofitsoutputs.
weansimplyforgetaboutthem; this proess,proposedbyTrifonovand Smith[TS96℄, is
alledgarbage olletion. Note thatallneutralvariablesaredisarded;inouranalogywith
setion 3, garbage olletion orresponds to the removal of unreahable nodes in a nite
automaton. Itdoesmorethan that, however, sineit alsoremovesertain edgesbetween
reahablenodes.
Denition27 Consider asindenition 26. Theimageof through garbageolletion,
denoted by GC(), is the type sheme A ) j D, where D is a subset of C dened as
follows:
2D i 2C,2fv ()and2fv +
();
D
#
() equalsC
#
() if2fv +
(), and?otherwise;
D
"
() equalsC
"
() if2fv (), and?otherwise.
Theorem11 Consider asindenition 27. Then GC().
Proof. Write 0
=GC(). Sine 0
has feweronstraints,it is lear that 0
4. So, we
needtoprove 4 0
. Aordingto denition13,thisisequivalentto
8 0
`D 9`C (A)) 0
(A))
Pik some 0
` D. We now wish to prove that the following onstraint set with ground
onstants (seedenition 23)admitsasolution:
C[f 0
()g[f 0
(A(x))A(x); x2dom(A)g
We shall doso by proving that the following onstraintset|whih ontains theprevious
one,aordingtodenition 26|isweaklylosed:
C[f 0
();2fv ()^ 2C r
g
[f 0
();2fv +
()^2C r
g
(whereC r
denotesthereexivelosureofC,i.e. 2C r
i= or2C). Let
E denotethisset.
Beause C satisesondition1ofdenition 23,so doesE. Usingthesameproperty,it
is easy to hekthat E satises onditions 2and 3. There remainsto hek ondition4.
Assume 2E
#
() and 0
2E
"
(). Fourasesarise, depending onwhether and 0
are
smalltermsorgroundterms:
Both and 0
are small terms. Then, 2 C
#
() and 0
2 C
"
(). The result is
immediate, onsideringC meetsondition4.
Both and 0
aregroundterms. Then,aordingtothedenitionofE, isequalto
0
(), for some 2fv ()suh that 2C r
. Symmetrially, 0
is of the form
0
( 0
),forsome 0
2fv +
() suh that 0
2C r
. Beause C satisesondition1
of denition23, 0
2C r
. If = 0
,then = 0
andtheresultisimmediate. So,
weanassume 0
2C. Sine 2fv ()and 0
2fv +
(),denition 27speies
that 0
2D. Sine 0
`D, 0
() 0
( 0
);that is, 0
holds.
isasmalltermand 0
isagroundterm. Asbefore, 0
isoftheform 0
( 0
),forsome
0
2fv +
()suhthat 0
2C r
. Ontheother hand,wemusthave 2C
#
(). If
0
2C,onsideringthatCsatisesondition2ofdenition23,thereexistsasmall
term 2C ()suhthat C . If, onthe otherhand, = , thenthe
sameholds(simplypik 0 0
=). Piksome`
k
E. Wethenhave()
k +1 (
0 0
).
Furthermore, beause 0
2 fv +
(), denition 27 speies that 0 0
2 D
#
( 0
). Sine
0
`D,thisentails 0
( 0 0
) 0
( 0
). Now,weneedtoreasonbyasesonthestruture
of 0 0
:
{ Assume 0 0
isoftheformÆ
0
!Æ
1
. Sine 0
2fv +
(),denition26speies that
Æ
1 2fv
+
()andÆ
0
2fv (). Aordingto thedenitionof E,Æ
1
0
(Æ
1 )2E.
Sine`
k
E,thisimplies(Æ
1 )
k
0
(Æ
1
). Symmetrially, 0
(Æ
0 )
k (Æ
0 ). Asa
onsequene,(Æ
0
!Æ
1 )
k +1
0
(Æ
0
!Æ
1
). Inotherwords,( 0 0
)
k +1
0
( 0 0
).
{ Assume 0 0
isequalto?or>. Then, thesameholds,i.e. ( 0 0
)
k +1
0
( 0 0
).
Weannowombine,bytransitivity,thethree resultsobtainedabove:
()
k +1 (
0 0
)
k +1
0
( 00
) 0
( 0
)
Thisimplies()
k +1
0
( 0
). Thatis,`
k +1
0
,whih isthedesiredresult.
Thelastaseissymmetrialtothepreviousone. 2
Itis easy to hek that garbageolletion preservespolarities. Furthermore,provided
bipolarvariablesaredisallowed,itsoutputislosed,asstatedbelow. Thisimportantremark
wasmissingfrom[TS96℄.
Theorem12 Consider asindenition27. Iffv +
()\fv ()=?,thenGC()islosed.
Proof. Write GC() = A ) j D, as in denition 27. As per denition 25, assume
f ;
0
g D. Then, 2 fv +
(), beause itappears on theright-hand side ofa
onstraint in D. Symmetrially, 2 fv (). This is impossible, by hypothesis, so D is
(vauously)losed. 2
4.4 Canonization
Insetion3,inordertoviewamulti-equationsystemasanitestateautomaton,werequired
ittobeinanonialform,i.e. toequateeah variablewithatmostonenon-variableterm.
Similarly, in the ase of subtyping, we say that a onstraint set is in anonial form i
eah variable has exatly one non-variable lower (resp. upper) bound. We shall require
this property before we attempt to minimize onstraint sets. In this setion, we give an
algorithm,alledanonization,whihomputesaanonialform ofanarbitraryonstraint
set.
Denition28 Let =A )j C be a type sheme, made up of small terms, ontaining
nobipolarvariables, suhthat =GC().
Let V (resp. W) range over non-empty subsets of fv () (resp. fv +
()). For eah
suh V (resp. W) of ardinality greaterthan 1, pik afresh variable
V
(resp.
W ). (By
fresh variables, we mean that these variables are pairwise distint, and distint from 's
variables.) Dene the rewritingfuntion r (resp. r +
) aordingtogure 8. Therst two
linesdener (resp. r +
) onnon-emptysetsof negative(resp. positive)variables;theyare
thenextendedtosetsof negative(resp. positive)smallterms.
The image of through anonization, denoted by Can(), is A ) j D, where the
onstraintsetD isgiven by gure9. Itislearthat Can()isinanonial form.
r (fg)= r (fg)=
r +
(W)=
W
whenjWj>1 r (V)=
V
whenjVj>1
r +
(f?g[S)=r +
(S) r (f>g[S)=r (S)
r +
(f>g[S)=> r (f?g[S)=?
r +
(?)=? r (?)=>
r +
(f
1
!
1
;::: ;
n
!
n
g)=r (f
1
;:::;
n g)!r
+
(f
1
;:::;
n g)
r (f
1
!
1
;::: ;
n
!
n g)=r
+
(f
1
;:::;
n
g)!r (f
1
;:::;
n g)
Figure8: Denitionoftherewritingfuntions
r (V)r +
(W)2D i92V 92W 2C
D
#
()=fr +
(C
#
())g D
"
()=fr (C
"
())g
D
#
(
V
)=f?g D
"
(
W
)=f>g
D
#
(
W )=fr
+
( [
2W C
#
())g D
"
(
V
)=fr ( [
2V C
"
())g
Figure9: Canonization
upper bounds and greatestlowerbounds of existing variables. Forinstane, \t"may
be represented by a fresh variable
f;g
, together with the onstraints
f;g and
f;g
. Astraightforwarddenition ofanonization,based onthis priniple,is given
by Trifonov and Smith [TS96℄. However, it involves intermediate losure omputations,
whih generate many superuous onstraints. Forinstane, and abovemust be pos-
itive, beause the least upper bound expressions whih arise during anonization always
involvepositivevariables. Sinetherearenobipolarvariables,and annotbenegative.
So,thefresh onstraints
f;g
and
f;g
shall beremovedbythenextgarbage
olletionpass. Inbetween,though, theseonstraintswilltakepartinalosure omputa-
tion,and theirtransitiveonsequenesmaysurvivegarbage olletion. Rather thangoing
throughtheproessofaddingsuperuousonstraintsaspartofanonization,performinga
losureomputation,andtheneliminatingthem,wegiveamoredetaileddesriptionofan-
onization,whoseoutputisprovablylosed,andwhihdoesnotgeneratetheseunneessary
onstraints,thussavingtime.
Forthesakeofsimpliity,ourdenitionreatesanexponentialnumberoffreshvariables.
Of ourse,an implementation shallreate afresh
V or
W
only ondemand,i.e. when it
appearsintheonstrutedboundofanexistingvariable|whihmaybeanoriginalvariable
,ormayitself bea ora.
Consideringourstronghypotheseson,itiseasytoprovethatCan()islosed. Further-
more,wemayprovethatexisting variablessee theirpolaritydereaseduring anonization.
Theseresultsmeanthat wemayapplyanonizationtransparentlyat anypointofthetype
inferene proess, whilestill performing inremental losure omputations, and relyingon
theassumptionthatnobipolarvariablesexist. Theyareprovedbelow.
Theorem13 Consider asindenition 28. Then,Can() islosed. Furthermore,
fv +
(Can())f
W g[fv
+
()
fv (Can())f
V
g[fv ()
Asaorollary, thereare nobipolar variables in Can().
Proof. Werstverifythat twoonstraintsofD involvingvariablesanneverbeombined
bytransitivity. ItsuÆes to notiethat r (V)anneverbeequalto r +
(W),beausethe
formeris ofthe form
V
or2fv (),while thelatteris of theform
W
or2fv +
().
Sine hasnobipolar variables,fv +
()\fv ()=?.
Tofully verify the requirement of denition 25, it essentially suÆes to further notie
that = GC(). This implies that for all 2 fv +
() (resp. 2 fv ()), C
"
() (resp.
C
#
())isempty; whih impliesD
"
()=f>g(resp. D
#
()=f?g). Thedesiredproperty
followseasily;thus,Can()islosed.
This resultallowsusto omputepolarities. Weverify thatf
W g[fv
+
()andf
V g[
fv ()satisfythex-pointequationsofdenition26,appliedtoCan(). Todoso,itsuÆes
to notie that a
W
never appears in negative(resp. positive) position in a non-variable
lower(resp. upper)bound|asymmetriresultholdsof
V
|andthatany2fv()appears
infewerpositionsthanin C. 2
Wearenowreadyto provetheorretnessoftheanonizationalgorithm.
Theorem14 Consider asindenition 28. Then Can().
Proof. Letususethenotationsofdenition28. WerstshowthatCan()4, i.e.
8`C 9 0
`D 0
(A))(A))