Didier Remy
INRIA-Roquenourt
June 12, 2001
Abstrat
Wedesribeawayofrepresentingpolymorphiextensi-
blereords in statiallytyped programminglanguages
thatoptimizesmemoryalloation,aessandreation,
ratherthanpolymorphiextension.
Introdution
Typesystemsforreordshavebeenstudiedextensively
in reent years. Newoperationson reordshavebeen
proposed suh aspolymorphiextension that builds a
reord from an older one without knowing its elds.
Suhoperationsareverypowerful,andwerenotalways
providedasprimitiveonstrutsinuntypedlanguages.
Inomparison to thenumerousresultsonthetype
theory of reords, there hasbeenless interestin their
ompilation. Many languagesstill have monomorphi
operations on reords, e.g. most implementations of
ML [HMT90, Wei89, Ler90℄. Others, that have more
powerfulreords,use assoiationlist tehniques,even-
tuallyimprovedbyahing.
Safe untyped languages require that the presene
of elds is heked before aess. The use of assoia-
tion lists interleavesdynami heking with aess to
elds. In strongly typed languages, the presene of
elds is statiallyheked. Thus therepresentationof
reords by assoiation lists performs superuous run-
timeheks,anditseemsthat heapersolutionsould
befound.
Wepropose arepresentationof reordsbasedon a
simple perfet-hash oding of elds that allowsaess
in onstanttimewith onlyafew mahine instrutions
whihanbedroppedtoasingleinstrutionwhenever
theset ofelds ofthereordisstatiallyknown. Cre-
ationanbeperformedintimeproportionaltothesize
ofthereord,andalloatesavetorofsizethenumber
Author's address: INRIA, B.P. 105, F-78153 Le Chesnay
Cedex.Email:Didier.Remyinria.fr
of elds plus one. Only the polymorphi reation of
reordshastopaymoreintimeandmemory.
Insetion1,thespeiityofreordsasdataprod-
ut strutures servesas an introdution to the ondi-
tionsforwhihourrepresentationwillworkinpratie.
The method is desribed in detail in setion 2as the
enodingofpartialfuntionsfromlabelstovalueswith
nite domains. In setion3 weextend themethod to
reordswith defaults. These are total funtions from
labels to values that are onstant almost everywhere.
As an appliation we get safe standard reords in an
untypedlanguage. Insetion4,wedisusshowtohan-
dlepathologialasesinordertopreventbadbehavior.
1 Reords and their speiity
Reordsareprodutdatastrutures. Eahpieeof in-
formationisstoredwithakey,moreommonlyalleda
label,that isusedtoretrievetheinformation. Thereis
atmostone valueassoiatedtoalabel. Byeldapair
noteda7!vofalabelaandavaluev. Suhdatastru-
turesareofommonuseinomputersiene. However
ourdenitionofreordsistoovagueforhoosingagood
representationof reords. It is neessaryto knowthe
average and the maximum size of these strutures, to
knowwhether theyare reatedinrementally, the fre-
quenyof the dierent operationsand whih ones are
privileged. It is obvious that dierent programs will
givedierentanswers. Thesequestionsanonlybean-
swered in general, or the answers that we give below
analso betakenasassumptions forwhihour repre-
sentationofreordswillworkin pratie.
Reords are provided with three operations. Cre-
ationbuildsareordwithanitenumberoflabelsto-
getherwithvaluesassoiatedtotheselabels. Extension
takesareordand buildsanewonethathas allelds
oftherstone plusanite numberofnewelds. A-
ess takes areord and a label and returns the value
assoiatedtothatlabel. Itfailsifthelabelisnotinthe
domainofthereord.
Reordshavearelativelysmallnumberoflabels. At
mostaoupleofhundred,onaveragelessthanten. Non
inrementalreationandaessarebothveryfrequent
and are privileged. There are usually many reords
with the same domain. Spae and time are equally
The simplest representation of reords by assoia-
tionlistsisgoodforverysmallreordsbutitmakesa-
esstolargereordstooslow. Balanetreeswouldhave
betterperformanesforlargereordsbuttheoverhead
hasalsoto bepaid forsmallreords;theyalsorequire
toomuhmemory. Generalhashedtableswillalsohave
anoverhead thatisnotaeptableforsmall reords.
2 Extensible reords with poly-
morphi aess
Inthissetionweonsiderreordsaspartialfuntions
fromlabelstovalues,withnitedomains. Theproblem
istondarepresentationforsuhfuntions,suhthat
under theassumptions of theabovesetion, the oper-
ationsonreordsanbeperformedeÆiently. Byper-
forming,wemeanevaluatingin general. Inpartiular,
thisappliestoompilationwheresomeoftheevaluation
anbedonestatially.
Standardmonomorphi operationsonnon extensi-
ble reords should not be penalized by the introdu-
tion of more powerful reords. Although it is always
possible to keep two kinds of reords oexisting, the
new reordsshould replaethe older, weakerones. It
should be left to the ompiler to reognize that some
reordoperationsaremonomorphi and thus anbet-
terbeompiled. Indeed,thiswillnotbepossibleforall
ompilationshema.
Areord r isa partial funtion from labels to val-
ueswithanite domain(a
i 7!v
i )
i21::n
. Thesimplest
deompositionof thisfuntionis
Labels h
*[1;n℄
v
!Values
a
i
>i >v
i
The total funtion v stores the omponents of the
reord, while the the partial funtion h, alled the
header, desribes how labels are mapped to indies.
Thisdeompositionsuggeststherepresentation ofrin
avetorR :
07!H
i7!v
i
i2[1;n℄
whereH representsthefuntion h.
The partialfuntion h needsto be dened atleast
onthedomainofranditshouldbetterbeinjetiveon
the domain of r too. Any suh funtion would work,
sinevisthendened by
(hjdom(r)) 1
Ær
uptopermutationofindexeswithidentialvalues.
Ifallreordsareodedsuhthattheirheaders(the
representationsmaydierprovidedtheyimplementthe
same funtion) only depend on their domains, then
ompilationofoperationsonreordswhosesetofelds
atingtheirheader. Theaessbeomeasingleindiret
readtofeththevalueofthereordonthateld. The
reationisalwaysin that ase,sineitbuilds areord
with n elds from nothing. The header an be om-
puted statially and shared between all reords built
by the samefuntion. The ost is reduedto alloat-
ingandllingn+1elds ofavetor. Moregenerally,
headersanbesharedbetweenallreordsthathavethe
samedomainbykeepingallexistingheadersin atable.
Polymorphi aess and polymorphi extension
must use theheaders. Forsake of simpliity, we on-
siderthatlabelsareintegers. Theparserandtheprinter
woulddealwiththeisomorphismbetweenintegersand
namesin areallanguage.
Finding a good representation of h seems as diÆ-
ult asnding agood representation of r. There are
twodierenes,though. Sinetheheaderisshared,we
are allowed a little more exibility on the size of H.
Also,hisafuntiononintegers,thusweareallowedto
usearithmeti and logi operation on integers. There
is no hope of nding a diret representation of h by
arithmeti operations, sine its domain is ompletely
arbitrary. Atleastsomemappingbetweenintegershas
tobeanarbitrarymaprepresentedbyavetorofinte-
gersforinstane. Amixeddeompositionoftheheader
his:
IN
( modp)
![0;p 1℄
*[1;n℄
where( modp) is injetiveonthe domain of r. Suh
a deomposition is always possible, to the prie of a
higher p. In pratie the smallest p that works is on
averagetwiethesizeofnforafewlabelsandthreeto
fourtimesfor largersets oflabels. Sinethe headeris
shared,thisisveryaeptable.
Theinterestofthisdeompositionofhisthatitan
beompiledeÆientlyandodedinavetorH:
07!p
j7!(j 1) j2[1;p℄
The partial funtion must be extended into a total
funtionon[1;p℄. Wewrite ittheunonstrainedex-
tensionof.
In too ases below, We will also be interested in
two partiular extensions of below that we write ^
and . The former ^is anextension of with values
of[1;n℄, thus it makesr atotal funtion. Thelater
extendsoutsideof[1;n℄,forinstane0,whihprovides
amembership testto thedomain of bytesting for
equalityto zero.
ThedomainofrmustalsobeodedinH forpoly-
morphi extension. All its labelsan be listed at the
endofH.
8
>
<
>
: 07!p
17!n
2+j 7!(j) j2[0;p 1℄
1+p+i7!a i2[1;n℄
thereord,andonsequentlytheheader,arestatially
known. Suhinformationisexpetedtobefoundbythe
typeheker. Thisannotalwaysbethease,however.
Inorder torely onthetypes toknowthedomains
ofreords,theattendanesofelds, givenbythetypes
of reords, must orrespond exatlyto their domains.
This impliesthat the restritionof areordon aeld
modies its header, sine its hanges the attendane.
Thisis onepossiblesemantisfor restrition. Another
oneistotaketherestritionofeldsasaretypingfun-
tion, that is, afuntion that evaluatesastheidentity.
The hoie is between an expensive ative restrition
thatallowsaessoptimizationsoraheapretypingre-
stritionsthatforbidsthem.
3 Reords with defaults
Inthissetionweonsiderreordsas partialfuntions
fromlabelstovalues,onstantalmosteverywhere. The
problem is now to reognize whenever the eld does
not belong to the domain of the reord(wemean the
expliitly dened values, here), in whih ase the de-
faultvalueisreturned. Themembershiptestmightbe
expensivein timeorin spae.
There isa heap solutionbased onthe sameteh-
niqueasabove. Infat,weodedreordsbytotalfun-
tiononlabels,and desribedthedomain separatelyin
order to implement the extension of elds. Thus we
ould apply them outside of their domain (but get a
valueofunpreditabletype).
Letr be areord. Considerthereord r 0
equalto
id j
dom(r). An arbitrary label ais in the domainof
rifandonlyifitis equaltor 0
:a. Theauxiliaryreord
r 0
onlydependsonthedomainofrisalreadybeoded
in theheaderH asthedomain ofr. Rememberthati
isthe indexin R where thevalueoflabela
i
isstored.
Thusit istheindex wherea
i
isin R 0
, whihis alsoin
H atposition2+p+i, providedi=(a
i
modp).
WesimplyshifttheindiesinRtoplaethedefault
valueatposition1:
8
<
: 07!H
17!d
1+i7!v
i
i2[1;n℄
Weomputethe appliation ofr to thelabel aasfol-
lows. First ompute the index i assoiated to a, that
isH:(amod(H:0)). IfH:(2+p+i)isequaltoa, then
the label belongs to the domain ofr and the resultis
R :(1+i)otherwiseitisthedefault R :(1).
One must be areful to use the ^extension of .
The extension is still possible, but the above mem-
bershiptestmustbepreededbyamembershiptestto
thedomainof,andinaseoffailurethedefaultvalue
shouldalsobeused.
In fat, the enoding of reords with defaults an
n 3 5 8 11 23 30 40 60 100 200
p
ave
4 9 18 30 86 132 207 400 902 2565
p
max
11 18 29 48 148 206 298 576 1195 3053
Figure1: Averageheadersize
an untyped language: a dynami type error is raised
whenthemembershiptestfailsinsteadofreturningthe
default.
4 About eÆieny
Thissetiondoesnotonsidertheaseofreordswith
defaults for sake of simpliity, but it an be adapted
veryeasilyto them.
The previous representation of reords is very at-
trative sine it implements linear time aess, uses
very little memory, keeps the same performane on
monomorphireords asif all reordswere monomor-
phi. However we must hek the following points.
First,thatthesizeoftheheaderdoesnotgettoolarge.
Seondly,thatthepolymorphiextensiondoesnothave
toobad performane, eventhoughit isnotprivileged.
Last,that pathologialasesanbehandled.
Theomputationof theintegerpisat theheartof
everyquestion. Theproblem isgivenaset of integers
D,ndasmallintegerpsuhthat( modn)isinjetive
onD. Theintegerpdoesnotneedtobeminimal,even
ifomputedatompiletime,sinealargerheadermight
make polymorphi extension more eÆient. However,
the minimal p gives alower bound on the size of the
header. ForinstaneifD israndomly hosen,andthe
setoflabelsislargeenoughinomparisontothesizen
ofD,theprobabilitythatpdisambiguatesnintegersis
p!
(p b)!p n
Theformulathat givestheaverage smallestpin fun-
tionof nis simple, but gure1givesan experimental
resultontheaveragep
ave
andthelargestp
max foron
ahundredrunsperolumn.
Forsmallreords,10 4
runsdidnotgiveverydier-
entmaximalp. Thedispersionofpisshownbygure2
Theguresshowthatunder30labels,thesizeofD
doesnotexeeds,inaverage,four timesthenumberof
elds, and exeeds veryrarely twie the average. For
largereords(above50elds)theheaderbeomesvery
large. Itislearthatanothersolutionmustbeapplied
forlargereords. Evenifonewishedto pushthislimit
to a hundred labels, there is always a rank that an
be reahed in pratie (even if it is pathologial) for
anotherrepresentationshouldbeused.
10 20 30 40 50
200
50 100 400
Figure2: Dispersionoftheheadersize
Sine, we annot avoid a mixed representation, if
we want to handle large size reords in order to rep-
resent them with reasonablesize headers, weuse tags
todistinguishbetweenthetworepresentations. Forin-
stane, negative integers an be used as tags for an
alternaterepresentations. Therearemanypossibilities
for the alternate representations, and sine these are
pathologialases, in thesense that theydo notmeet
therequirementthatwesetinsetion1,wedonotare
muh about theeÆieny of thealternate representa-
tion. We propose, twopossible solutionsthat t well
withtheregularrepresentation.
Therstsolutioniswheneveraandbareequaltoj
moduloptoassign(j)withaninteger qsuhthatq
isin[2;p℄andmoduloqdistinguishesaandbandwith
valuesthat arenotin theimageof.
The seond solution replaes perfet hashing by
hashingwithlinearprobing([Sed88℄,Chapter16). The
headerisH is
8
>
>
>
<
>
>
>
: 07!p
17!n
2+j7!(j) j2[0;p 1℄
1+p+i7!a
i
i2[1;n℄
2+n+p+j7!a
(j)
j2[0;p 1℄
forlabelsthatdonotonit. When2 + n + p + (a
i mod p)
isalreadyoupied,thelabela
i
isplaedatthesmallest
freeposition after,sayj,and(j)islledwithi.
The rst method still gives aess in linear time,
butheadersaremorediÆulttoompute. Themaybe
muhfaster onaverage,but requirelargerheaders. In
bothasesthereis alot offreedomonhowtohose p,
aordingtohowmanyonits areaepted. Letting
p beabout3 times the size of r may avoid searhing
for optima while limitingthe number of lashes. The
rst method is more exible, sine a onit for one
labeldoesnotneedtodouble thesize oftheheaderor
reompute anotherheader: itsimply usestheholesof
the atualheader. Then, it an also be used even for
averagesize reordsin someases in order to ompile
moreeÆientextension.
The eÆieny of polymorphi extension has not
been onsidered yet, sine it was nota privileged op-
tionaltothesizeofthereorditextends. Thetimefor
omputingtheheader now beomes important. There
aredierentases(weexludelarge size reords, that
shouldberepresentedotherwise):
Thereisnoneedtoomputeanewp,
Theaverageaseforomputinganewp,
Theworseasesforomputinganewp.
In the average ase, omputing a new header means
that3dierentp'smustbeprobedperextension,sine
headersarein average3timesthesize ofn. Theopti-
mistiunitostU istheoneofaloopthatontainsat
leastonevetorwriteandonemoduloinstrution. The
ostfor aprobeis pU,sine thefailure is probableto
happen attheend. Thustheaverageostfor reating
anew header is 3pU plustwo other pU for llingthe
header.
However,areordwithaveryompatheadermay
beextendedwithalabelthatwillmaketheheaderget
loserto itsaveragesize. Then aboutnU probesmay
beneeded,makingtheostforthenewheaderinrease
topnU.
Ontheotherhand,itisveryprobablethat theex-
tended header alreadyexists oris trivial. If the label
a of theextended eld is takenat random, there is a
probabilityof(p n)=pthatthen+1eldswillstillbe
disambiguatedbyp. Inthatase,theostforreating
anew header is thesame asopying theold one plus
modifyingafeweldspU.
Thereationofanewheadermaybeavoidedmost
ofthetimesineitisveryprobablethatithasalready
bereated. Thisrequires that all existingheaders are
storedin aditionary. This willsaveboth spaeand
time. The keys in the ditionary are the domains of
headersthat anbekeptorderedin H, so that equal-
ity tests are not too expensive, and the whole ost of
searhingwouldbelowerthantheminimal ostof ex-
tension.
Ative restrition of elds an be implementing
alongthe sameideas. The header is lookedup in the
table. If it does not exist, then the new header need
notbethe smallestone,provided that it is ofreason-
ablesize. Thisavoidstheexpensiveostofndingthe
smallestinteger modulo whih all elements of the do-
mainasdistinguished.
5 Other ompilation shemas
Therearethreedierentideasintheaboverepresenta-
tionofreords
1. Thevalue of elds and the position of elds are
representedseparately. Theheaderthatdesribes
thelaterissharedbetweenallreordshavingthe
mainalwayshavethesameeldsatthesamepo-
sition.
2. The header anberepresentedby amodulo fol-
lowedbyaprojetion.
3. Dierent representation of the headers an live
together.
Therstpointisruialinourrepresentation. Anyrep-
resentation of reordsthat does notoriginally respet
thispointanstillbeusedtoimplementheaders,then
values an be stored in vetors as above. Sharing of
headerswillsavethelargeamountofspaerequiredby
assoiationlistsorbalaned trees.
The representation of the header itself is not im-
portant. Wedesribedonepossibilitythatisveryon-
venient forsmall andmedium size reords. But many
otherrepresentationsarepossible. Wehoosetorepre-
sentthe headerasastruturethat isinterpretedboth
byextension andaess. It ouldalso bealosure for
theaess part,together with adesriptionofthe do-
main that isneeded fortheextension. This isthe tag
vslosureduality.
If the extension and the restrition of elds are
themselveslosedwiththeheader,reordsouldreally
beviewedasobjetswith twomethodsfor aessand
extension.
Conlusion
Wehave presentedawayof representingreordswith
orwithoutdefaultsthatallowseÆientaessandre-
ation. Only the more powerful features suh aspoly-
morphi extension, ortrue restritionofelds haveto
payahigherprie.
Our representation of reord with defaults an be
usedtoimplementsafeaess inanuntypedlanguage.
An orthogonalappliationouldbetherepresenta-
tionoffeaturetermsthatareveryrelatedtoreords.
Thanks
TheseideasoriginateindisussionswithXavierLeroy.
Theyhavebeenmentionedforthersttimein[Ler90℄
andtestedintheuntypedversionofZin,theanestor
ofCaml-Light.
Referenes
[HMT90℄ Robert Harper, Robin Milner, and Mads
Tofte. The denition of Standard ML. The
MITPress,1990.
[Ler90℄ XavierLeroy.TheZINCexperiment: aneo-
BP105,F-78153LeChesnayCedex,1990.
[Sed88℄ Robert Sedgewik. Algorithms. Computer
Siene.Addison-Wesley, seondeditionedi-
tion,1988.
[Wei89℄ PierreWeis. The CAML Referene Manual.
INRIA-Roquenourt, BP105, F-78153 Le
ChesnayCedex,1989.