• Aucun résultat trouvé

Safe untyped languages require that the presene of elds is heked before aess

N/A
N/A
Protected

Academic year: 2022

Partager "Safe untyped languages require that the presene of elds is heked before aess"

Copied!
5
0
0

Texte intégral

(1)

Didier Remy

INRIA-Roquenourt

June 12, 2001

Abstrat

Wedesribeawayofrepresentingpolymorphiextensi-

blereords in statiallytyped programminglanguages

thatoptimizesmemoryalloation,aessandreation,

ratherthanpolymorphiextension.

Introdution

Typesystemsforreordshavebeenstudiedextensively

in reent years. Newoperationson reordshavebeen

proposed suh aspolymorphiextension that builds a

reord from an older one without knowing its elds.

Suhoperationsareverypowerful,andwerenotalways

providedasprimitiveonstrutsinuntypedlanguages.

Inomparison to thenumerousresultsonthetype

theory of reords, there hasbeenless interestin their

ompilation. Many languagesstill have monomorphi

operations on reords, e.g. most implementations of

ML [HMT90, Wei89, Ler90℄. Others, that have more

powerfulreords,use assoiationlist tehniques,even-

tuallyimprovedbyahing.

Safe untyped languages require that the presene

of elds is heked before aess. The use of assoia-

tion lists interleavesdynami heking with aess to

elds. In strongly typed languages, the presene of

elds is statiallyheked. Thus therepresentationof

reords by assoiation lists performs superuous run-

timeheks,anditseemsthat heapersolutionsould

befound.

Wepropose arepresentationof reordsbasedon a

simple perfet-hash oding of elds that allowsaess

in onstanttimewith onlyafew mahine instrutions

whihanbedroppedtoasingleinstrutionwhenever

theset ofelds ofthereordisstatiallyknown. Cre-

ationanbeperformedintimeproportionaltothesize

ofthereord,andalloatesavetorofsizethenumber

Author's address: INRIA, B.P. 105, F-78153 Le Chesnay

Cedex.Email:Didier.Remyinria.fr

of elds plus one. Only the polymorphi reation of

reordshastopaymoreintimeandmemory.

Insetion1,thespeiityofreordsasdataprod-

ut strutures servesas an introdution to the ondi-

tionsforwhihourrepresentationwillworkinpratie.

The method is desribed in detail in setion 2as the

enodingofpartialfuntionsfromlabelstovalueswith

nite domains. In setion3 weextend themethod to

reordswith defaults. These are total funtions from

labels to values that are onstant almost everywhere.

As an appliation we get safe standard reords in an

untypedlanguage. Insetion4,wedisusshowtohan-

dlepathologialasesinordertopreventbadbehavior.

1 Reords and their speiity

Reordsareprodutdatastrutures. Eahpieeof in-

formationisstoredwithakey,moreommonlyalleda

label,that isusedtoretrievetheinformation. Thereis

atmostone valueassoiatedtoalabel. Byeldapair

noteda7!vofalabelaandavaluev. Suhdatastru-

turesareofommonuseinomputersiene. However

ourdenitionofreordsistoovagueforhoosingagood

representationof reords. It is neessaryto knowthe

average and the maximum size of these strutures, to

knowwhether theyare reatedinrementally, the fre-

quenyof the dierent operationsand whih ones are

privileged. It is obvious that dierent programs will

givedierentanswers. Thesequestionsanonlybean-

swered in general, or the answers that we give below

analso betakenasassumptions forwhihour repre-

sentationofreordswillworkin pratie.

Reords are provided with three operations. Cre-

ationbuildsareordwithanitenumberoflabelsto-

getherwithvaluesassoiatedtotheselabels. Extension

takesareordand buildsanewonethathas allelds

oftherstone plusanite numberofnewelds. A-

ess takes areord and a label and returns the value

assoiatedtothatlabel. Itfailsifthelabelisnotinthe

domainofthereord.

Reordshavearelativelysmallnumberoflabels. At

mostaoupleofhundred,onaveragelessthanten. Non

inrementalreationandaessarebothveryfrequent

and are privileged. There are usually many reords

with the same domain. Spae and time are equally

(2)

The simplest representation of reords by assoia-

tionlistsisgoodforverysmallreordsbutitmakesa-

esstolargereordstooslow. Balanetreeswouldhave

betterperformanesforlargereordsbuttheoverhead

hasalsoto bepaid forsmallreords;theyalsorequire

toomuhmemory. Generalhashedtableswillalsohave

anoverhead thatisnotaeptableforsmall reords.

2 Extensible reords with poly-

morphi aess

Inthissetionweonsiderreordsaspartialfuntions

fromlabelstovalues,withnitedomains. Theproblem

istondarepresentationforsuhfuntions,suhthat

under theassumptions of theabovesetion, the oper-

ationsonreordsanbeperformedeÆiently. Byper-

forming,wemeanevaluatingin general. Inpartiular,

thisappliestoompilationwheresomeoftheevaluation

anbedonestatially.

Standardmonomorphi operationsonnon extensi-

ble reords should not be penalized by the introdu-

tion of more powerful reords. Although it is always

possible to keep two kinds of reords oexisting, the

new reordsshould replaethe older, weakerones. It

should be left to the ompiler to reognize that some

reordoperationsaremonomorphi and thus anbet-

terbeompiled. Indeed,thiswillnotbepossibleforall

ompilationshema.

Areord r isa partial funtion from labels to val-

ueswithanite domain(a

i 7!v

i )

i21::n

. Thesimplest

deompositionof thisfuntionis

Labels h

*[1;n℄

v

!Values

a

i

>i >v

i

The total funtion v stores the omponents of the

reord, while the the partial funtion h, alled the

header, desribes how labels are mapped to indies.

Thisdeompositionsuggeststherepresentation ofrin

avetorR :

07!H

i7!v

i

i2[1;n℄

whereH representsthefuntion h.

The partialfuntion h needsto be dened atleast

onthedomainofranditshouldbetterbeinjetiveon

the domain of r too. Any suh funtion would work,

sinevisthendened by

(hjdom(r)) 1

Ær

uptopermutationofindexeswithidentialvalues.

Ifallreordsareodedsuhthattheirheaders(the

representationsmaydierprovidedtheyimplementthe

same funtion) only depend on their domains, then

ompilationofoperationsonreordswhosesetofelds

atingtheirheader. Theaessbeomeasingleindiret

readtofeththevalueofthereordonthateld. The

reationisalwaysin that ase,sineitbuilds areord

with n elds from nothing. The header an be om-

puted statially and shared between all reords built

by the samefuntion. The ost is reduedto alloat-

ingandllingn+1elds ofavetor. Moregenerally,

headersanbesharedbetweenallreordsthathavethe

samedomainbykeepingallexistingheadersin atable.

Polymorphi aess and polymorphi extension

must use theheaders. Forsake of simpliity, we on-

siderthatlabelsareintegers. Theparserandtheprinter

woulddealwiththeisomorphismbetweenintegersand

namesin areallanguage.

Finding a good representation of h seems as diÆ-

ult asnding agood representation of r. There are

twodierenes,though. Sinetheheaderisshared,we

are allowed a little more exibility on the size of H.

Also,hisafuntiononintegers,thusweareallowedto

usearithmeti and logi operation on integers. There

is no hope of nding a diret representation of h by

arithmeti operations, sine its domain is ompletely

arbitrary. Atleastsomemappingbetweenintegershas

tobeanarbitrarymaprepresentedbyavetorofinte-

gersforinstane. Amixeddeompositionoftheheader

his:

IN

( modp)

![0;p 1℄

*[1;n℄

where( modp) is injetiveonthe domain of r. Suh

a deomposition is always possible, to the prie of a

higher p. In pratie the smallest p that works is on

averagetwiethesizeofnforafewlabelsandthreeto

fourtimesfor largersets oflabels. Sinethe headeris

shared,thisisveryaeptable.

Theinterestofthisdeompositionofhisthatitan

beompiledeÆientlyandodedinavetorH:

07!p

j7!(j 1) j2[1;p℄

The partial funtion must be extended into a total

funtionon[1;p℄. Wewrite ittheunonstrainedex-

tensionof.

In too ases below, We will also be interested in

two partiular extensions of below that we write ^

and . The former ^is anextension of with values

of[1;n℄, thus it makesr atotal funtion. Thelater

extendsoutsideof[1;n℄,forinstane0,whihprovides

amembership testto thedomain of bytesting for

equalityto zero.

ThedomainofrmustalsobeodedinH forpoly-

morphi extension. All its labelsan be listed at the

endofH.

8

>

<

>

: 07!p

17!n

2+j 7!(j) j2[0;p 1℄

1+p+i7!a i2[1;n℄

(3)

thereord,andonsequentlytheheader,arestatially

known. Suhinformationisexpetedtobefoundbythe

typeheker. Thisannotalwaysbethease,however.

Inorder torely onthetypes toknowthedomains

ofreords,theattendanesofelds, givenbythetypes

of reords, must orrespond exatlyto their domains.

This impliesthat the restritionof areordon aeld

modies its header, sine its hanges the attendane.

Thisis onepossiblesemantisfor restrition. Another

oneistotaketherestritionofeldsasaretypingfun-

tion, that is, afuntion that evaluatesastheidentity.

The hoie is between an expensive ative restrition

thatallowsaessoptimizationsoraheapretypingre-

stritionsthatforbidsthem.

3 Reords with defaults

Inthissetionweonsiderreordsas partialfuntions

fromlabelstovalues,onstantalmosteverywhere. The

problem is now to reognize whenever the eld does

not belong to the domain of the reord(wemean the

expliitly dened values, here), in whih ase the de-

faultvalueisreturned. Themembershiptestmightbe

expensivein timeorin spae.

There isa heap solutionbased onthe sameteh-

niqueasabove. Infat,weodedreordsbytotalfun-

tiononlabels,and desribedthedomain separatelyin

order to implement the extension of elds. Thus we

ould apply them outside of their domain (but get a

valueofunpreditabletype).

Letr be areord. Considerthereord r 0

equalto

id j

dom(r). An arbitrary label ais in the domainof

rifandonlyifitis equaltor 0

:a. Theauxiliaryreord

r 0

onlydependsonthedomainofrisalreadybeoded

in theheaderH asthedomain ofr. Rememberthati

isthe indexin R where thevalueoflabela

i

isstored.

Thusit istheindex wherea

i

isin R 0

, whihis alsoin

H atposition2+p+i, providedi=(a

i

modp).

WesimplyshifttheindiesinRtoplaethedefault

valueatposition1:

8

<

: 07!H

17!d

1+i7!v

i

i2[1;n℄

Weomputethe appliation ofr to thelabel aasfol-

lows. First ompute the index i assoiated to a, that

isH:(amod(H:0)). IfH:(2+p+i)isequaltoa, then

the label belongs to the domain ofr and the resultis

R :(1+i)otherwiseitisthedefault R :(1).

One must be areful to use the ^extension of .

The extension is still possible, but the above mem-

bershiptestmustbepreededbyamembershiptestto

thedomainof,andinaseoffailurethedefaultvalue

shouldalsobeused.

In fat, the enoding of reords with defaults an

n 3 5 8 11 23 30 40 60 100 200

p

ave

4 9 18 30 86 132 207 400 902 2565

p

max

11 18 29 48 148 206 298 576 1195 3053

Figure1: Averageheadersize

an untyped language: a dynami type error is raised

whenthemembershiptestfailsinsteadofreturningthe

default.

4 About eÆieny

Thissetiondoesnotonsidertheaseofreordswith

defaults for sake of simpliity, but it an be adapted

veryeasilyto them.

The previous representation of reords is very at-

trative sine it implements linear time aess, uses

very little memory, keeps the same performane on

monomorphireords asif all reordswere monomor-

phi. However we must hek the following points.

First,thatthesizeoftheheaderdoesnotgettoolarge.

Seondly,thatthepolymorphiextensiondoesnothave

toobad performane, eventhoughit isnotprivileged.

Last,that pathologialasesanbehandled.

Theomputationof theintegerpisat theheartof

everyquestion. Theproblem isgivenaset of integers

D,ndasmallintegerpsuhthat( modn)isinjetive

onD. Theintegerpdoesnotneedtobeminimal,even

ifomputedatompiletime,sinealargerheadermight

make polymorphi extension more eÆient. However,

the minimal p gives alower bound on the size of the

header. ForinstaneifD israndomly hosen,andthe

setoflabelsislargeenoughinomparisontothesizen

ofD,theprobabilitythatpdisambiguatesnintegersis

p!

(p b)!p n

Theformulathat givestheaverage smallestpin fun-

tionof nis simple, but gure1givesan experimental

resultontheaveragep

ave

andthelargestp

max foron

ahundredrunsperolumn.

Forsmallreords,10 4

runsdidnotgiveverydier-

entmaximalp. Thedispersionofpisshownbygure2

Theguresshowthatunder30labels,thesizeofD

doesnotexeeds,inaverage,four timesthenumberof

elds, and exeeds veryrarely twie the average. For

largereords(above50elds)theheaderbeomesvery

large. Itislearthatanothersolutionmustbeapplied

forlargereords. Evenifonewishedto pushthislimit

to a hundred labels, there is always a rank that an

be reahed in pratie (even if it is pathologial) for

anotherrepresentationshouldbeused.

(4)

10 20 30 40 50

200

50 100 400

Figure2: Dispersionoftheheadersize

Sine, we annot avoid a mixed representation, if

we want to handle large size reords in order to rep-

resent them with reasonablesize headers, weuse tags

todistinguishbetweenthetworepresentations. Forin-

stane, negative integers an be used as tags for an

alternaterepresentations. Therearemanypossibilities

for the alternate representations, and sine these are

pathologialases, in thesense that theydo notmeet

therequirementthatwesetinsetion1,wedonotare

muh about theeÆieny of thealternate representa-

tion. We propose, twopossible solutionsthat t well

withtheregularrepresentation.

Therstsolutioniswheneveraandbareequaltoj

moduloptoassign(j)withaninteger qsuhthatq

isin[2;p℄andmoduloqdistinguishesaandbandwith

valuesthat arenotin theimageof.

The seond solution replaes perfet hashing by

hashingwithlinearprobing([Sed88℄,Chapter16). The

headerisH is

8

>

>

>

<

>

>

>

: 07!p

17!n

2+j7!(j) j2[0;p 1℄

1+p+i7!a

i

i2[1;n℄

2+n+p+j7!a

(j)

j2[0;p 1℄

forlabelsthatdonotonit. When2 + n + p + (a

i mod p)

isalreadyoupied,thelabela

i

isplaedatthesmallest

freeposition after,sayj,and(j)islledwithi.

The rst method still gives aess in linear time,

butheadersaremorediÆulttoompute. Themaybe

muhfaster onaverage,but requirelargerheaders. In

bothasesthereis alot offreedomonhowtohose p,

aordingtohowmanyonits areaepted. Letting

p beabout3 times the size of r may avoid searhing

for optima while limitingthe number of lashes. The

rst method is more exible, sine a onit for one

labeldoesnotneedtodouble thesize oftheheaderor

reompute anotherheader: itsimply usestheholesof

the atualheader. Then, it an also be used even for

averagesize reordsin someases in order to ompile

moreeÆientextension.

The eÆieny of polymorphi extension has not

been onsidered yet, sine it was nota privileged op-

tionaltothesizeofthereorditextends. Thetimefor

omputingtheheader now beomes important. There

aredierentases(weexludelarge size reords, that

shouldberepresentedotherwise):

Thereisnoneedtoomputeanewp,

Theaverageaseforomputinganewp,

Theworseasesforomputinganewp.

In the average ase, omputing a new header means

that3dierentp'smustbeprobedperextension,sine

headersarein average3timesthesize ofn. Theopti-

mistiunitostU istheoneofaloopthatontainsat

leastonevetorwriteandonemoduloinstrution. The

ostfor aprobeis pU,sine thefailure is probableto

happen attheend. Thustheaverageostfor reating

anew header is 3pU plustwo other pU for llingthe

header.

However,areordwithaveryompatheadermay

beextendedwithalabelthatwillmaketheheaderget

loserto itsaveragesize. Then aboutnU probesmay

beneeded,makingtheostforthenewheaderinrease

topnU.

Ontheotherhand,itisveryprobablethat theex-

tended header alreadyexists oris trivial. If the label

a of theextended eld is takenat random, there is a

probabilityof(p n)=pthatthen+1eldswillstillbe

disambiguatedbyp. Inthatase,theostforreating

anew header is thesame asopying theold one plus

modifyingafeweldspU.

Thereationofanewheadermaybeavoidedmost

ofthetimesineitisveryprobablethatithasalready

bereated. Thisrequires that all existingheaders are

storedin aditionary. This willsaveboth spaeand

time. The keys in the ditionary are the domains of

headersthat anbekeptorderedin H, so that equal-

ity tests are not too expensive, and the whole ost of

searhingwouldbelowerthantheminimal ostof ex-

tension.

Ative restrition of elds an be implementing

alongthe sameideas. The header is lookedup in the

table. If it does not exist, then the new header need

notbethe smallestone,provided that it is ofreason-

ablesize. Thisavoidstheexpensiveostofndingthe

smallestinteger modulo whih all elements of the do-

mainasdistinguished.

5 Other ompilation shemas

Therearethreedierentideasintheaboverepresenta-

tionofreords

1. Thevalue of elds and the position of elds are

representedseparately. Theheaderthatdesribes

thelaterissharedbetweenallreordshavingthe

(5)

mainalwayshavethesameeldsatthesamepo-

sition.

2. The header anberepresentedby amodulo fol-

lowedbyaprojetion.

3. Dierent representation of the headers an live

together.

Therstpointisruialinourrepresentation. Anyrep-

resentation of reordsthat does notoriginally respet

thispointanstillbeusedtoimplementheaders,then

values an be stored in vetors as above. Sharing of

headerswillsavethelargeamountofspaerequiredby

assoiationlistsorbalaned trees.

The representation of the header itself is not im-

portant. Wedesribedonepossibilitythatisveryon-

venient forsmall andmedium size reords. But many

otherrepresentationsarepossible. Wehoosetorepre-

sentthe headerasastruturethat isinterpretedboth

byextension andaess. It ouldalso bealosure for

theaess part,together with adesriptionofthe do-

main that isneeded fortheextension. This isthe tag

vslosureduality.

If the extension and the restrition of elds are

themselveslosedwiththeheader,reordsouldreally

beviewedasobjetswith twomethodsfor aessand

extension.

Conlusion

Wehave presentedawayof representingreordswith

orwithoutdefaultsthatallowseÆientaessandre-

ation. Only the more powerful features suh aspoly-

morphi extension, ortrue restritionofelds haveto

payahigherprie.

Our representation of reord with defaults an be

usedtoimplementsafeaess inanuntypedlanguage.

An orthogonalappliationouldbetherepresenta-

tionoffeaturetermsthatareveryrelatedtoreords.

Thanks

TheseideasoriginateindisussionswithXavierLeroy.

Theyhavebeenmentionedforthersttimein[Ler90℄

andtestedintheuntypedversionofZin,theanestor

ofCaml-Light.

Referenes

[HMT90℄ Robert Harper, Robin Milner, and Mads

Tofte. The denition of Standard ML. The

MITPress,1990.

[Ler90℄ XavierLeroy.TheZINCexperiment: aneo-

BP105,F-78153LeChesnayCedex,1990.

[Sed88℄ Robert Sedgewik. Algorithms. Computer

Siene.Addison-Wesley, seondeditionedi-

tion,1988.

[Wei89℄ PierreWeis. The CAML Referene Manual.

INRIA-Roquenourt, BP105, F-78153 Le

ChesnayCedex,1989.

Références

Documents relatifs

Countries in the African Region have made more progress over the past 10 years but are still not on track to achieve the health and health-related MDGs despite the

While manifest cluster-mean centering is known to introduce Nickell’s bias, we also showed how some implementations of the latent mean centering approach in the Bayesian framework

The fact that all fields have class number one is enough for the discussion of quadratic rings in §5; for the statements of descent in §6 concerning minimal fields of definition

Government should vigorously develop small towns characteristic economy to create more jobs, introducing from industry with a certain specification, constructing industry and

Then, the author proposed a complete type inference algorithm for Wand's system [Rem89], but it was formalized only in the case of a nite set of labels (a previous solution given

This will, however, not always be sufficient to ensure safe and effective pain management and two alternative scenarios should be currently considered: (1) a personalized

To this aim, the present study compared the performance of right hemisphere damaged (RHD) patients with and without a deficit in processing the contralesional space called

Next, we classify current and possible approaches to a solution for mobility (section 4) and finally (in section 5) we describe how mobility can be implemented using the