Dependence Analysis and Evaluation of Inherent Parallelism of Programs

by

Zhenjie Chen

A thesis submitted to the School of Graduate Studies

in partial fulfillment of the requirements for the degree of Master of Science

Department of Computer Science, Memorial University of Newfoundland

December 1995

St. John's, Newfoundland

National Library of Canada, Acquisitions and Bibliographic Services Branch, 395 Wellington Street, Ottawa, Ontario K1A 0N4

The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.

The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.

ISBN 0-612-13886-0

Abstract

Dependencies are relations between statements of a program. They indicate the constraints imposed on the order of statement execution, and are often used for the evaluation, optimization, vectorization and parallelization of programs. By means of dependence analysis, much work has been done to exploit the parallelism in loops in which there are no dependencies that cross from one iteration of the loop to another. Only recently was an approach proposed for exploiting the parallelism available in loops with cross-iteration dependencies.

The aim of this project is to evaluate the inherent parallelism of sequential programs by means of dependence analysis. The thesis first introduces the definitions, concepts and basic dependence analysis algorithms, and presents a uniform representation of dependencies called a dependence graph. Then, using the dependence graph, a general approach is presented which can be used to analyze the parallelism between loops as well as between loops and other parts of a program. This approach was implemented as a program called DSA (Dependence and Speedup Analyzer), used to perform the dependence analysis and to evaluate the inherent parallelism of Fortran programs. Finally, the implementation of DSA is briefly described and its use is illustrated by a series of examples.

Acknowledgments

I wish to express my thanks to my supervisor, Prof. Wlodek M. Zuberek, for his guidance, interest, suggestions, patience and financial assistance during my studies at Memorial University of Newfoundland. I learned a lot from our numerous discussions. He has contributed significantly to the quality of this thesis.

I also wish to thank my other instructors: Prof. John Shieh, Prof. Tony Middleton, Prof. Krishnamurthy Vidyasankar, Prof. Caoan Wang and Prof. Xiaobu Yuan.

I am very grateful to the administrative and technical staff who have helped in one way or another in the preparation of this thesis. Special thanks are due to Ms. Elaine Boone, Mr. Michael Rayment, Mr. Nolan White, Mrs. Patricia Murphy, and Mrs. Jennifer Cutler for their help and assistance.

I would like to thank Dr. Siwei Lu, Dr. Jian Tang, Mr. Han Chen, Ms. Xi Lu, Mrs. Zhengqi Lu, Mr. Donald Craig, Mr. Zhihong Yuan, Mr. Yuguang Chen and Mr. Ming Tan for their valuable comments and suggestions.

I would also like to thank many others not named here who provided encouragement and assistance during my graduate studies.

Finally, I would like to thank the School of Graduate Studies and the Department of Computer Science for providing, together with Dr. W. M. Zuberek, financial support during my studies.

This thesis is dedicated to my parents for their support and encouragement throughout my education, and to my wife, Bing Mi, and daughter, Xiaoyue Chen, for enduring the hardship of the past three years.

Contents

1 Introduction
    1.1 Basic Concepts
    1.2 Thesis Overview

2 Dependence Analysis
    2.1 Control Dependence Analysis
    2.2 Data Dependence Analysis
        2.2.1 Definitions
        2.2.2 Global Data Flow Analysis
        2.2.3 Array Element Dependence Testing
    2.3 Alias Analysis
        2.3.1 Detecting Type-1 Aliases
        2.3.2 Detecting Type-2 Aliases
    2.4 Dependence Graph

3 Evaluation of Inherent Parallelism
    3.1 Evaluation of T_serial
    3.2 Evaluation of T_parallel
        3.2.1 Evaluation of T [subscript illegible]
        3.2.2 Evaluation of T [subscript illegible]
        3.2.3 Discussion

4 Implementation of DSA
    4.1 Overview of DSA
    4.2 Program Analysis
        4.2.1 Generation of Control Flow Graph
        4.2.2 Calculation of IN and OUT Sets
    4.3 Dependence Analysis
        4.3.1 Global Data Flow Analysis
        4.3.2 Array Element Dependence Testing
    4.4 Evaluation of the Speedup Factor
        4.4.1 Evaluation of T_serial
        4.4.2 Evaluation of T_parallel

5 Examples
    5.1 Example 1
    5.2 Example 2
    5.3 Example 3
    5.4 Example 4
    5.5 Example 5

6 Conclusions

Bibliography

Appendix

A Iteration-Recursion Algorithms for Global Data Flow Analysis
    A.1 Background
    A.2 Iteration-Recursion Algorithms
        A.2.1 Hecht and Ullman's Iterative Algorithm
        A.2.2 Iteration-Recursion Algorithm One
        A.2.3 Iteration-Recursion Algorithm Two
        A.2.4 Iteration-Recursion Algorithm Three
    A.3 Conclusions

List of Tables

2.1 Control dependencies for Figure 2.1
2.2 Type-1 aliases for the example program
2.3 Type-2 aliases for the example program
2.4 Complete (Type-1 and Type-2) aliases for the example program
4.1 The def and use values for the example program of Section 2.3
4.2 The sets IN and OUT for the example program of Section 2.3
5.1 T_serial, T_parallel and the speedup factor for Loop 20
5.2 T_serial, T_parallel and the speedup factor for Loop 11

List of Figures

1.1 Dependence graph
2.1 The control flow graph and its post-dominator tree
2.2 Hierarchy of direction vectors for two loops
2.3 The dependence tree for the program
2.4 The binding graph β for the example program
2.5 The complete worklist of the example program
2.6 An example program and its dependence graph
3.1 The dependence graph for node Hi
3.2 The example program 1 and its dependence graph
3.3 The example program 2 and its dependence graph
3.4 The example program 3 and its dependence graph
3.5 The example program 4 and its dependence graph
3.6 The sum of an array and its calculation in a parallel machine
4.1 The dependence graph for node S
A.1 Flow graph of a loop in a "structured" program

Chapter 1 Introduction

The need for computing power has been growing steadily in the last two decades. However, with the slowing rate of improvements in semiconductor technologies, the processing ability of single-processor systems and the sequential processing of programs are reaching their limits. In turn, this has been stimulating research in multiprocessor architectures and parallel algorithms.

Much work in the area of exploiting the parallelism of programs and program conversion from sequential to parallel form has been done on the basis of dependence analysis. Several experimental compiling systems exploiting parallelism in FORTRAN programs have been developed using dependence analysis [2, 3, 7]. However, these systems perform parallelization in a rather limited range, often analyzing only the DO constructs of FORTRAN programs, while the parallelism between other statements is ignored. For example, Gupta and Soffa [18, 19] developed specialized compilation techniques which can be used to detect parallel operations within and between sequential statements; their work was done for the Reconfigurable Long Instruction Word (RLIW) architecture model, and the increased computation speed was obtained by matching an application program to the particular RLIW architecture in structure and in size. In more general approaches, the evaluation and detection of parallelism should be machine-independent [47]. Recently Lilja [32] developed a method called critical dependence ratio to determine the maximum possible parallelism for a loop, given unlimited hardware resources. However, the method cannot deal with parallelism between loops or between a loop and the other parts of a program.

The aim of this research is to use dependence analysis for the evaluation of inherent parallelism of sequential programs. Dependencies are relations between statements of a program. They can be represented as a graph, called a program dependence graph, and are widely used for performing program optimizations, vectorization, and parallelization [20, 31, 34, 29, 30, 15]. It has been shown that if the program dependence graphs of two programs are isomorphic, then the programs are strongly equivalent in the sense of their behaviors [23]. This equivalence can be used to determine the maximally parallel execution of the program. Such an approach is the motivation for this work. The results of this research can be used in the automatic transformation of sequential programs to their equivalent parallel forms.

There are two main contributions of this research. First, a general approach to finding the maximal possible parallelism of a sequential program is proposed. Secondly, the proposed approach is implemented as a program called DSA (Dependence and Speedup Analyzer). The program performs the dependence analysis and the evaluation of inherent parallelism of FORTRAN programs.

1.1 Basic Concepts

Dependencies arise as the result of two separate effects. First, a dependence exists between two statements $S_i$ and $S_j$ if both statements access the same memory location (at least one of them must write this location) and no statement between $S_i$ and $S_j$ writes this location. Dependencies of this type are called data dependencies¹. There are three types of data dependencies, based upon the ways in which $S_i$ and $S_j$ access the location. Statement $S_j$ is

• flow-dependent on $S_i$, if $S_i$ writes a memory location and $S_j$ reads it;

• anti-dependent on $S_i$, if $S_i$ reads a memory location and $S_j$ writes it;

• output-dependent on $S_i$, if $S_i$ writes a memory location and $S_j$ writes it again.

The memory location can correspond to a scalar variable or an array element.

Secondly, a dependence exists between a statement S and a predicate B whose value (directly) controls the execution of S. Dependencies of this type are called control dependencies². For example, in the sequence of statements:

    S1: IF (B) THEN
    S2:     X = Y + W
    S3:     Z = X * A
    S4:     A = C - D
    S5:     Z = E + F
    S6: ENDIF

the statements S2, S3, S4 and S5 are control-dependent on the predicate B; in other words, S2, S3, S4 and S5 are control-dependent on S1. S3 is flow-dependent on S2 due to X, S4 is anti-dependent on S3 due to A, and S5 is output-dependent on S3 due to Z.
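The three data dependence types can be checked mechanically for straight-line code. The following is a minimal Python sketch, not part of DSA: the function name `classify_dependences` and the tuple layout are assumptions made for illustration, and the read and write sets of each statement are supplied by hand. It applies the rule that no intervening statement may overwrite the shared location.

```python
def classify_dependences(stmts):
    """stmts: ordered list of (label, read set, write set) for straight-line code.

    Returns tuples (later statement, kind, earlier statement, variable).
    """
    found = []
    for i, (label_i, in_i, out_i) in enumerate(stmts):
        for j in range(i + 1, len(stmts)):
            label_j, in_j, out_j = stmts[j]
            # Variables written by statements strictly between i and j.
            between = set().union(*(out_k for _, _, out_k in stmts[i + 1:j]))
            for kind, shared in (("flow", out_i & in_j),
                                 ("anti", in_i & out_j),
                                 ("output", out_i & out_j)):
                for var in sorted(shared - between):
                    found.append((label_j, kind, label_i, var))
    return found


# Body of the IF statement from the example above (S2 to S5).
body = [
    ("S2", {"Y", "W"}, {"X"}),   # X = Y + W
    ("S3", {"X", "A"}, {"Z"}),   # Z = X * A
    ("S4", {"C", "D"}, {"A"}),   # A = C - D
    ("S5", {"E", "F"}, {"Z"}),   # Z = E + F
]
deps = classify_dependences(body)
for later, kind, earlier, var in deps:
    print(f"{later} is {kind}-dependent on {earlier} due to {var}")
```

For the example, the sketch reports the same three data dependencies that are listed in the text.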

¹ A formal definition of data dependencies is given in Chapter 2.

² A formal definition of control dependencies is given in Chapter 2.

Dependence analysis detects the dependencies in a program. Ferrante [15] has made an excellent contribution to the analysis of control dependencies. Data dependence analysis is more complicated than control dependence analysis because it must take into account dependencies created by subscripted variables (array elements) and aliases, i.e., references to memory locations which are identified by more than one identifier (aliases can be created by procedure passing mechanisms and data equivalences). For example, in the following program:

    S1: DO I = 1, 10
    S2:     A(I) = B(I)
    S3:     C(I) = D(I)
    S4:     B(I) = A(I+1) + F(I)
    S5: ENDDO

it is easy to see that S4 is anti-dependent on S2 due to B(I). However, S2 is also anti-dependent on S4 due to A(I) and A(I+1) when the variable I increases in repeated executions of the loop: S4 reads A(I+1) in one iteration and S2 overwrites the same element in the next. On the other hand, if C is an alias of B, then S3 is anti-dependent on S2, and S4 is output-dependent on S3 due to C(I) and B(I).

The speedup factor, used to measure the inherent parallelism of programs, is defined as

$$ \mathit{speedup} = \frac{T_{serial}}{T_{parallel}} $$

where $T_{serial}$ is the time of the sequential execution of a program, and $T_{parallel}$ is the time of the maximally parallel execution of the same program, i.e., the time of program execution with an unlimited number of available processors. The speedup factor can further be 'specialized' as fixed size speedup and scaled speedup [Mil. Fixed size speedup indicates how much execution time can be reduced on a specific parallel processor, while scaled speedup is used in exploring the computational power of parallel computers for solving otherwise intractable problems.

From the algorithm analysis point of view, the speedup factor is defined as [13] $E(T_1)/E(T_n)$, called speedup of the average execution times, or $M(T_A/T_B)$, called average speedup, where $T_1$ and $T_n$ are random variables representing the execution time on one and on $n$ processors, respectively, $E(T)$ is the expected value of $T$, $T_A$ and $T_B$ are random variables representing the execution time of a sequential algorithm A and a parallel algorithm B for solving the same problem, and $M(T)$ can be any mean value of $T$, in particular the arithmetic mean. It should be noted that these two definitions also provide two methods to calculate approximate values of the speedup factor.

The standard definition of the speedup factor, i.e., $T_{serial}/T_{parallel}$, is used in this thesis to evaluate the inherent parallelism of programs on the basis of control and data dependencies. Control and data dependencies of a program represent control and data flow relationships which must be respected by any execution of the program, whether parallel or sequential. By examining these dependencies, we can extract the inherent parallelism in a program and evaluate the speedup factor.

As an illustration, the following program can be considered:

    S1: DO I = 1, 10
    S2:     C(I) = A(I) - B(I)
    S3:     D(I) = A(I) + B(I)
    S4:     E(I) = C(I) + C(I)
    S5:     F(I) = D(I) + D(I)
    S6:     C(I) = E(I) + F(I)
    S7: ENDDO

The statements S2 to S6 are control-dependent on S1, since the value of the loop index variable I determines whether S2 to S6 are executed. S4 is flow-dependent on S2 due to C(I), S5 is flow-dependent on S3 due to D(I), S6 is output-dependent on S2 due to C(I), S6 is anti-dependent on S4 due to C(I), S6 is flow-dependent on S4 due to E(I), and S6 is flow-dependent on S5 due to F(I). These dependencies can be represented as a graph, as shown in Figure 1.1.

[Figure 1.1: Dependence graph. Edge labels: F = flow dependence, A = anti-dependence, O = output dependence; control dependence and data dependence edges are distinguished by line style.]

Assuming that every arithmetic, logical or assignment operation can be completed in one unit of time:

$$ T_{serial} = (10 + 1) \cdot 1 + 10 \cdot (t_{S2} + t_{S3} + t_{S4} + t_{S5} + t_{S6}) = 11 + 10 \cdot (2 + 2 + 2 + 2 + 2) = 111 \text{ (units)}. $$

If the number of available processors is unlimited, the loop can be unfolded into 10 groups, and these 10 groups can be executed in parallel. To find $T_{parallel}$, we need only consider the time to execute one group, as all groups are identical. Since S2 and S4 have no dependence relation with S3 and S5, and all statements S2 to S6 are control-dependent on S1, S2 and S4 can execute in parallel with S3 and S5. S6 is dependent on S2, S4 and S5, so S6 must execute after S2, S4 and S5. Therefore:

$$ T_{parallel} = \max(t_{S2} + t_{S4},\; t_{S3} + t_{S5}) + t_{S6} = \max(2 + 2,\; 2 + 2) + 2 = 6 \text{ (units)}. $$

So, the speedup factor is 18.5 in this case.
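The arithmetic of this example can be reproduced with a short Python sketch. It is an illustration under the assumptions stated in the text (two time units per assignment, one unit per evaluation of the loop control, loop control not counted in the parallel time) and is not the evaluation procedure of DSA; the name `critical_path` is hypothetical. The parallel time of one unfolded iteration is taken as the longest path through the data dependencies.

```python
def critical_path(duration, preds):
    """Length of the longest dependence chain, in time units."""
    finish = {}

    def finish_time(stmt):
        if stmt not in finish:
            # A statement starts when all statements it depends on have finished.
            start = max((finish_time(p) for p in preds.get(stmt, ())), default=0)
            finish[stmt] = start + duration[stmt]
        return finish[stmt]

    return max(finish_time(s) for s in duration)


# One operation plus one assignment per statement: 2 units each.
duration = {s: 2 for s in ("S2", "S3", "S4", "S5", "S6")}
# Data dependencies inside one iteration, as listed in the text.
preds = {"S4": ["S2"], "S5": ["S3"], "S6": ["S2", "S4", "S5"]}

iterations = 10
t_serial = (iterations + 1) * 1 + iterations * sum(duration.values())
t_parallel = critical_path(duration, preds)
speedup = t_serial / t_parallel
print(t_serial, t_parallel, speedup)
```

The longest-path formulation generalizes the `max(...)` expression above to any acyclic dependence graph of a loop body.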


1.2 Thesis Overview

This thesis is organized into six chapters. Chapter 2 reviews the research on dependence analysis and alias analysis. Then, the concepts and algorithms related to control and data dependence analysis as well as alias analysis are introduced in detail. Finally, a uniform representation of dependencies, the dependence graph, is presented. Chapter 3 is devoted to the evaluation of inherent parallelism, including a brief introduction to the evaluation of $T_{serial}$ and a detailed presentation of a general approach to evaluating $T_{parallel}$. This approach is based on the dependence graph of a program and can be used to deal with the parallelism between loops and between loops and other blocks of the program. Chapter 4 describes the implementation of DSA, a program performing dependence analysis and evaluation of the speedup factor, which uses the algorithms and approach presented in Chapters 2 and 3. Examples and conclusions are presented in Chapters 5 and 6, respectively.

Chapter 2

Dependence Analysis

Research on dependence analysis has been conducted over the last twenty years. Ferrante [15] made an excellent contribution to the analysis of control dependencies, characterizing the control structure of programs. Analysis of data dependencies is more difficult than that of control dependencies. It has been shown that the detection of data dependencies among subscripted variables is an NP-complete problem [17, 36]. The first contribution to the detection of data dependence among subscripted variables is due to Banerjee [4, 5, 6]. He proposed an inequality which provides a necessary condition for the existence of data dependencies: if the inequality fails, the references are independent. Banerjee's inequality decision algorithm can be used to deal with more complicated data dependence testing problems, but it is rather complex and inefficient. Another significant result is Allen and Kennedy's GCD decision algorithm [3], derived from number theory, which also provides a necessary condition for the existence of data dependencies. The GCD decision algorithm is fast and efficient for some special data dependence cases. Therefore, in practice, the GCD decision algorithm is usually used first. If data independence can be established, the testing procedure is over. Otherwise, Banerjee's decision algorithm is used to perform further data dependence testing. A more practical solution comes from Burke and Cytron's hierarchical dependence testing algorithm [9]. It is usually used as a test framework and is combined with other testing algorithms, such as the Banerjee and GCD decision algorithms. Due to the inherent intractability of data dependence testing among subscripted variables, research on data dependence is continuing [49, 37, 41, 40, 33].

Another factor which makes data dependence analysis complicated is the existence of aliases created by procedure passing mechanisms and data equivalences. To perform data dependence analysis, alias analysis is needed first. In FORTRAN programs, aliases caused by data equivalences are always declared explicitly by the COMMON and EQUIVALENCE statements, so detection of such aliases is quite straightforward. However, an inter-procedural alias analysis must be performed to find the aliases caused by procedure passing mechanisms. Significant work on inter-procedural alias analysis was first done by Ryder [42]. She introduced a representation, called the call graph, of the control and data flow in programs to investigate inter-procedural communication. Burke and Cytron [9] also proposed some methods to identify aliased arrays and to propagate inter-procedural information. Further improvement is due to Cooper and Kennedy [10, 11], who presented a fast algorithm for computing inter-procedural aliases based on an improved call graph, called the binding graph. An improved version of the fast alias analysis algorithm based on the binding graph is presented in [38]. The newest result is due to [30].

The following four sections of this chapter discuss control dependence analysis, data dependence analysis, alias analysis, and the dependence graph, respectively.

2.1 Control Dependence Analysis

To simplify the discussion, it is assumed that a program contains only assignments used in the sequence, selection and iteration constructs. The sequence, selection and iteration constructs have the following forms:

• sequence: S1; S2

• selection: if B then S1 else S2 endif

• iteration: for i := 1 to n do S enddo

where B is a boolean expression, n is a constant or a variable, and S, S1 and S2 are assignment statements, sequence constructs, selection constructs or iteration constructs. The boolean expression B is called the branch condition of the selection construct. The boolean expression i ≤ n, which is the condition to continue the iteration, is called the branch condition of the iteration construct.

The control flow graph G of a program is a directed graph G = (N, E), where the set of nodes, N, is the set of assignments and branch conditions of selection and iteration constructs in the program, and the edges, E ⊆ N × N, represent possible transfers of control between nodes. It is assumed in control flow graphs that nodes which represent branch conditions (they always have two immediate successors) have attributes T (true) and F (false) associated with the outgoing edges. Each control flow graph is augmented with two special nodes, ENTRY and STOP, which represent the unique beginning and termination of program execution. ENTRY has one edge labeled "T" outgoing to the first statement of the program and another edge labeled "F" outgoing to STOP.

The following two definitions were introduced in [15] together with a general idea of analyzing control dependencies.

Definition [15]. Let G be a control flow graph. A node v in G is post-dominated by a node w if every directed path from v to STOP (not including v) contains w.

If v is post-dominated by w, w is called a post-dominator of v. Note that this definition of post-dominance does not include the initial node of the path. In particular, a node never post-dominates itself.

Definition [15]. Let G be a control flow graph. Let x and y be nodes in G. y is control-dependent on x iff:

1. there exists a directed path p from x to y with any node z of p (excluding x and y) post-dominated by y, and

2. x is not post-dominated by y.

In other words, if y is control-dependent on x in a control flow graph, then there must exist at least two paths from x to STOP in the graph; one includes y and the other does not.

Definition [15]. Let G = (N, E) be a control flow graph. A post-dominator tree T = (N, E') contains the set N of nodes of G, and the subset E' of the edges E of G such that if v is post-dominated by w, or w is a post-dominator of v, there must exist a path from w to v in T.

Definition [15]. Let T be a post-dominator tree, and a and b two nodes in T. A node c of T is called a common ancestor of a and b if T contains two paths, one from c to a and the other from c to b. A node l of T is called the least common ancestor of a and b if:

• l is a common ancestor of a and b, and

• there is no other l' in T such that l' is also a common ancestor of a and b, and there is a path from l to l' in T.

Given a control flow graph, control dependencies can be determined in the following three steps [15]:

1. Find post-dominators in the control flow graph, and construct the post-dominator tree T.

2. Find a set S which consists of all edges (a, b) in the control flow graph such that there is no path from b to a in T (i.e., b does not post-dominate a). Note that in this case the edge (a, b) must be labeled by "T" or "F".

3. For each edge (a, b) in S, find the least common ancestor l of a and b in T. It has been shown [15] that either l is a or l is the parent of a in T.

• If l is a, all nodes in the post-dominator tree on the path from a to b, including a and b, are control-dependent on a.

• If l is the parent of a, all nodes in the post-dominator tree on the path from l to b, including b but not l, are control-dependent on a.

For example, for the following program:

    S1: IF (A) THEN
    S2:     Y = X + Z
        ELSE
    S3:     P = K - S
    S4:     IF (B) THEN
    S5:         V = P + S
            ELSE
    S6:         U = P - Z
            ENDIF
    S7:     Q = U
        ENDIF
    S8: T = Y

[Figure 2.1: The control flow graph and its post-dominator tree. (a) The control flow graph of the program. (b) The post-dominator tree of the program.]

the control flow graph and the post-dominator tree are shown in Figure 2.1. In this example, S = {(ENTRY, S1), (S1, S2), (S1, S3), (S4, S5), (S4, S6)}. Table 2.1 shows the control dependencies that can be determined by examining each of the edges in the set S for the graphs in Figure 2.1.
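The three steps can be sketched in Python for the example program. This is an illustrative sketch, not the DSA implementation: the function names are hypothetical, the control flow graph is written out by hand from the program text, and post-dominators are computed with a simple fixed-point iteration rather than any particular published algorithm. Instead of locating the least common ancestor explicitly, the sketch walks up the post-dominator tree from b until it reaches the parent of a, which marks the same nodes in both cases of step 3.

```python
def strict_postdominators(succ, stop="STOP"):
    """Map each node to the set of nodes that post-dominate it (itself excluded)."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    pdom = {n: set(nodes) for n in nodes}      # start from the full set
    pdom[stop] = {stop}
    changed = True
    while changed:
        changed = False
        for n in nodes - {stop}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return {n: pdom[n] - {n} for n in nodes}


def control_dependences(succ, stop="STOP"):
    """Map each node to the set of nodes it is control-dependent on."""
    spdom = strict_postdominators(succ, stop)
    # The parent in the post-dominator tree is the closest strict post-dominator,
    # i.e. the one that itself has the most post-dominators.
    parent = {n: max(ps, key=lambda p: len(spdom[p])) for n, ps in spdom.items() if ps}
    deps = {}
    for a, targets in succ.items():
        for b in targets:
            if b in spdom[a]:
                continue                        # edge (a, b) is not in the set S
            node = b
            while node != parent[a]:            # walk up the tree from b
                deps.setdefault(node, set()).add(a)
                node = parent[node]
    return deps


# Control flow graph of the example program (successor lists).
cfg = {
    "ENTRY": ["S1", "STOP"],
    "S1": ["S2", "S3"],
    "S2": ["S8"],
    "S3": ["S4"],
    "S4": ["S5", "S6"],
    "S5": ["S7"],
    "S6": ["S7"],
    "S7": ["S8"],
    "S8": ["STOP"],
}
deps = control_dependences(cfg)
for node in sorted(deps):
    print(node, "is control-dependent on", sorted(deps[node]))
```

For this graph the sketch yields the same node groups as the edges of S listed above: S1 and S8 depend on ENTRY; S2, S3, S4 and S7 depend on S1; S5 and S6 depend on S4.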

2.2 Data Dependence Analysis

Data dependencies can be created by scalar variables and elements of arrays. Data dependence analysis consists of global data flow analysis and data dependence testing. The global data flow analysis is used as a framework for data dependence testing, to find the relationships between each pair of scalar variables or elements of an array and to determine the type of potential data dependencies. For two scalar variables, the data dependence testing is very simple. For array elements, subscript analysis must be performed. The remaining part of this section introduces the definitions of data dependencies, global data flow analysis and array element data dependence testing.

Table 2.1: Control dependencies for Figure 2.1.

    (a, b) in S     Nodes marked    Control dependent on    Label
    (ENTRY, S1)     S1, S8          ENTRY                   T
    (S1, S2)        S2              S1                      T
    (S1, S3)        S3, S4, S7      S1                      F
    (S4, S5)        S5              S4                      T
    (S4, S6)        S6              S4                      F

2.2.1 Definitions

IN(S) is used to denote the set of scalar variables and array elements whose values are read by a statement S. OUT(S) is used to denote the set of scalar variables and array elements whose values are modified (or "written") by a statement S. For example, for a statement S: X = Y + Z, OUT(S) = {X} and IN(S) = {Y, Z}. Note that for a loop:

    S1: DO I = 1, 10
    S2:     X(I) = A(I+1) * B
    S3: ENDDO

OUT(S2) = {X(1), X(2), ..., X(10)}, and IN(S2) = {A(2), A(3), ..., A(11), I, B}. To simplify the notation, we write OUT(S2) = {X(I)}, and IN(S2) = {A(I+1), I, B}.

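The three types of data dependencies are defined as follows:

Definition. Given two statements $S_i$ and $S_j$, $S_j$ is

• flow-dependent on $S_i$, denoted $S_i \,\delta\, S_j$, if there is a variable $x$ such that $x \in OUT(S_i) \cap IN(S_j)$ and $x \notin OUT(S_k)$, for $i < k < j$;

• anti-dependent on $S_i$, denoted $S_i \,\bar{\delta}\, S_j$, if there is a variable $x$ such that $x \in IN(S_i) \cap OUT(S_j)$ and $x \notin OUT(S_k)$, for $i < k < j$;

• output-dependent on $S_i$, denoted $S_i \,\delta^{o}\, S_j$, if there is a variable $x$ such that $x \in OUT(S_i) \cap OUT(S_j)$ and $x \notin OUT(S_k)$, for $i < k < j$.

To simplify the discussion, we often say that statement $S_j$ is data-dependent on $S_i$, denoted $S_i \,\delta^{*}\, S_j$, if $S_i \,\delta\, S_j$ or $S_i \,\bar{\delta}\, S_j$ or $S_i \,\delta^{o}\, S_j$. Also, we say that statement $S_j$ is indirectly data-dependent on $S_i$, denoted $S_i \,\Delta\, S_j$, if there are statements $S_{k_1}, \ldots, S_{k_n}$, $n \ge 0$, such that $S_i \,\delta^{*}\, S_{k_1} \,\delta^{*} \cdots \delta^{*}\, S_{k_n} \,\delta^{*}\, S_j$.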

When x is a scalar variable, the dependencies due to x can be detected by using data flow analysis, which is discussed later. However, if x is an array element, dependence testing is complicated by the fact that different references to array elements may access the same or different memory locations. It has been shown that the dependence testing problem among array elements is equivalent to the Integer Linear Programming (ILP) problem [36], which is an NP-complete problem [14, 16].

To simplify the data dependence testing for array elements, the testing is often limited to loops, and the subscripts of array elements are restricted to linear expressions of the loop index variables. If any one of the subscripts is a nonlinear expression, a dependence is assumed to exist.

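Definition [36]. For the following loop:

    for i_1 := L_1 to U_1 do
        for i_2 := L_2 to U_2 do
            ...
                for i_n := L_n to U_n do
                    X(f_1(ī), f_2(ī), ..., f_m(ī)) := ...
                    ... := X(g_1(ī), g_2(ī), ..., g_m(ī))
                enddo
            ...
        enddo
    enddo

where $\bar{i}$ is the vector $(i_1, i_2, \ldots, i_n)$, all $L_i$ and $U_i$, $i = 1, \ldots, n$, are constants, and $f_j$, $g_j$, $j = 1, \ldots, m$, are known linear functions. Two elements of the array X are dependent if there exist index values $i'_1, \ldots, i'_n$ and $i''_1, \ldots, i''_n$ such that

$$ f_j(i'_1, \ldots, i'_n) = g_j(i''_1, \ldots, i''_n), \quad j = 1, \ldots, m, $$

$$ L_1 \le i'_1, i''_1 \le U_1, \;\ldots,\; L_n \le i'_n, i''_n \le U_n. $$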

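If S is enclosed in n loops with indices $\bar{i} = (i_1, \ldots, i_n)$, $S^{\bar{i}}$ denotes the instance of S for the iteration $\bar{i}$. Suppose the statements $S_i$ and $S_j$ are enclosed in n loops with indices $\bar{i} = (i_1, \ldots, i_n)$. Let a vector $\psi = (\psi_1, \psi_2, \ldots, \psi_n)$, $\psi_k \in \{<, =, >\}$, $k = 1, 2, \ldots, n$, be called a direction vector. $S_j$ is dependent on $S_i$ with a direction vector $\psi$, denoted $S_i \,\delta^{*}_{\psi}\, S_j$, if there exist iterations $\bar{i}' = (i'_1, i'_2, \ldots, i'_n)$ and $\bar{i}'' = (i''_1, i''_2, \ldots, i''_n)$ such that $S_i^{\bar{i}'} \,\delta^{*}\, S_j^{\bar{i}''}$ and the following inequalities hold simultaneously:

$$ i'_1 \,\psi_1\, i''_1, \quad i'_2 \,\psi_2\, i''_2, \quad \ldots, \quad i'_n \,\psi_n\, i''_n. $$

The vector $(i'_1, i'_2, \ldots, i'_n) - (i''_1, i''_2, \ldots, i''_n)$ is called the direction distance. Furthermore, $S_j$ is loop-carried-dependent on $S_i$ if there is a direction vector $\psi$ such that $\psi = (=, =, \ldots, =, <, *, \ldots, *)$, where '*' denotes '<', '=' or '>'. If $i = j$, we say that $S_i$ is loop-carried-dependent on itself.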

For example, in the following program:

    S1: DO I = 1, 10
    S2:     DO J = 1, 10
    S3:         A(I,J) = B(I,J)
    S4:         B(I,J) = A(I,J-1)
    S5:         C(I,J) = C(I,J-1)
    S6:     ENDDO
    S7: ENDDO

we have $S3^{(I,J)} \,\delta\, S4^{(I,J+1)}$ with direction vector (=, <) and direction distance (0, -1) due to array A, and denote it as $S3 \,\delta_{(=,<)}\, S4$, a loop-carried dependence. S5 is loop-carried-dependent on itself with direction vector (=, <) and direction distance (0, -1) due to array C. Also, we have $S3^{(I,J)} \,\bar{\delta}\, S4^{(I,J)}$ with direction vector (=, =) and direction distance (0, 0) due to array B.

A loop-carried dependence means that one statement may store a datum into a location on one iteration of a loop, and another statement may fetch the datum from, or store another datum into, the location on another iteration of the loop, or vice versa. So, we say a loop is a carrying dependence loop if the loop contains a loop-carried dependence.

2.2.2 Global Data Flow Analysis

Global data flow analysis can be considered as the pre-execution process of ascertaining and collecting information which is distributed throughout a program, generally for the purpose of optimizing the program. It is widely used for code improvements such as analysis of live uses, reaching definitions, available expressions, and very busy variables.

The elimination, or interval, methods and the iterative methods are two popular approaches to global flow analysis [1, 27, 43]. The elimination methods collect the information by repeatedly partitioning the control flow graph of the program into subgraphs, called intervals, and replacing each interval by a single node containing the local information for that interval, until the graph becomes a single node. The iterative methods propagate the information by initializing the data flow equations to safe values and then iterating the equations until a fixed-point solution is found.

The elimination methods may seem to out-perform the iterative methods but, when some practical issues, presented in [21], are taken into account, the iterative methods are time competitive with the elimination methods. In addition, the elimination algorithms are usually rather complicated to program. A detailed comparison of the time complexities of these two approaches can be found in [8, 20, 27].

Kildall's algorithm plays an important role in the development of the iterative algorithm because it solves the class of data flow analysis problems in a unified and general lattice theoretic framework [28]. The framework provides a convenient vehicle to analyze the detailed properties of each data flow analysis problem. Hecht and Ullman refined Kildall's algorithm, and presented a "depth-first" version of Kildall's algorithm [21], a successful iterative algorithm. They introduced a depth-first ordering algorithm for the nodes in a control flow graph, and forced the nodes to be processed in that order. They also proved that their algorithm will finish a global data flow analysis before d+2 iterations, where d is the maximum number of retreating edges¹ in a cycle-free path of the control flow graph.

¹ A formal definition of retreating edge is given in Chapter 4.

Given a control flow graph G, for each node n of G, in[n] is used to denote its input data stream and out[n] its output data stream. Then, Hecht and Ullman's iterative algorithm performs global data flow analysis in the following two steps:

1. To each node n in G assign an integer number rPostorder[n] and let out[n] = {}. rPostorder[n] is produced by a depth-first order in which the node n is always visited before its successors, except when the node n and its successor form a retreating edge.

2. Perform the following iteration until no change is made to any node in G, where f_n(x) is a data flow function of the node n (the data flow functions f are different for the different applications of data flow analysis, such as Reaching Definitions or Live Uses):

    for each node n in G, in order of rPostorder do
        in[n] := {};
        for each edge (p, n) in G do
            in[n] := in[n] ∪ out[p]
        enddo;
        out[n] := f_n(in[n])
    enddo;

A more detailed description of the algorithm and the data flow functions f_n(x) is given in [1, 8, 26, 27, 28, 43].
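The two steps above can be sketched in Python as follows. This is a minimal illustration of the round-robin scheme, not the iteration-recursion algorithm used by DSA; the function names, the `gen_kill` helper and the four-block example graph are all assumptions introduced here. Reaching definitions is used as the data flow problem, with the usual transfer function out = gen ∪ (in − kill).

```python
def reverse_postorder(succ, entry):
    """Depth-first ordering in which a node precedes its successors,
    except across retreating edges."""
    seen, postorder = set(), []

    def visit(n):
        seen.add(n)
        for s in succ.get(n, ()):
            if s not in seen:
                visit(s)
        postorder.append(n)

    visit(entry)
    return postorder[::-1]


def iterate_dataflow(succ, entry, transfer):
    """Round-robin forward data flow; returns (in sets, out sets, passes)."""
    order = reverse_postorder(succ, entry)
    preds = {n: [] for n in order}
    for p in order:
        for s in succ.get(p, ()):
            preds[s].append(p)
    out = {n: frozenset() for n in order}       # step 1: out[n] = {}
    inn = {}
    passes, changed = 0, True
    while changed:                              # step 2: iterate to a fixed point
        changed = False
        passes += 1
        for n in order:
            inn[n] = frozenset().union(*(out[p] for p in preds[n]))
            new_out = transfer[n](inn[n])
            if new_out != out[n]:
                out[n], changed = new_out, True
    return inn, out, passes


def gen_kill(gen, kill):
    """Transfer function of reaching definitions: gen ∪ (in − kill)."""
    return lambda x: frozenset(gen) | (x - frozenset(kill))


# Hypothetical flow graph: B1 (d1: I = 1) -> B2 (loop test) -> B3 (d2: X = ...;
# d3: I = I + 1) -> B2, and B2 -> B4 (exit). B3 -> B2 is the only retreating edge.
succ = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
transfer = {
    "B1": gen_kill({"d1"}, {"d3"}),
    "B2": gen_kill(set(), set()),
    "B3": gen_kill({"d2", "d3"}, {"d1"}),
    "B4": gen_kill(set(), set()),
}
inn, out, passes = iterate_dataflow(succ, "B1", transfer)
print(sorted(inn["B4"]), passes)
```

In this example d = 1, and the sketch stops after three passes (the last one only confirms that nothing changes), which is consistent with the d+2 bound quoted above.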

DSA uses an iteration-recursion algorithm, designed for global data flow analysis. This algorithm performs a recursive traversal of the control flow graph of a program in every iteration until it terminates. Hecht and Ullman's depth-first ordering algorithm is also used in this algorithm. These algorithms are described in detail in Chapter 4.

2.2.3 Array Element Dependence Testing

This part overviews algorithms which perform the subscript analysis of two elements of an array to find the dependence between the two elements. Allen and Kennedy's GCD decision algorithm, Banerjee's inequality decision algorithm, and Burke and Cytron's hierarchical testing algorithm are briefly described in this section. More detailed information can be found in [3, 4, 5, 6, 9, 17, 36, 48].

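GCD Decision Algorithm

Let $S_1$ and $S_2$ be enclosed in $n$ loops as follows:

    for i_1 := L_1 to U_1 do
      for i_2 := L_2 to U_2 do
        ...
          for i_n := L_n to U_n do
            S_1: X(..., f(i), ...) := ...
            S_2: ... := X(..., g(i), ...)
          end do
        ...
      end do
    end do

where $\bar{i} = (i_1, \ldots, i_n)$ and $f$ and $g$ are linear functions of the loop indices:

\[ f(\bar{i}) = a_0 + \sum_{k=1}^{n} a_k i_k, \qquad g(\bar{i}) = b_0 + \sum_{k=1}^{n} b_k i_k. \]

Then the low-up bound matrix $LU$ is defined as

\[ LU = \begin{pmatrix} L_1 & U_1 \\ \vdots & \vdots \\ L_n & U_n \end{pmatrix} \]

and the coefficient matrix $C$ of the functions $f$ and $g$ is formed from the coefficients $a_0, \ldots, a_n$ and $b_0, \ldots, b_n$.

For example, in the following program:

    S1: DO I = 1, 10
    S2:   DO J = 2, 20
    S3:     A(20*I+J-20) = B(J,I)
    S4:     C(J,I) = A(20*I+J-21)
    S5:   ENDDO
    S6: ENDDO

the low-up bound matrix is

\[ LU = \begin{pmatrix} 1 & 10 \\ 2 & 20 \end{pmatrix} \]

and the coefficients of $f$ and $g$ are $a_0 = -20$, $a_1 = 20$, $a_2 = 1$, $b_0 = -21$, $b_1 = 20$ and $b_2 = 1$.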

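If $S_2$ is data-dependent on $S_1$, then there must exist integers $i'_1, \ldots, i'_n$ and $i''_1, \ldots, i''_n$ such that

\[ f(i'_1, \ldots, i'_n) = g(i''_1, \ldots, i''_n). \]

That is,

\[ a_0 + \sum_{k=1}^{n} a_k i'_k = b_0 + \sum_{k=1}^{n} b_k i''_k, \]

which can be rewritten as

\[ \sum_{k=1}^{n} (a_k i'_k - b_k i''_k) = b_0 - a_0. \]

This has an integer solution only when the greatest common divisor of all the left-hand coefficients evenly divides the integer difference on the right-hand side, that is,

\[ \gcd(a_1, \ldots, a_n, b_1, \ldots, b_n) \mid (b_0 - a_0). \]

This is Allen and Kennedy's GCD decision test.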

In practice, the GCD test is relatively ineffective, because in most cases the loop index multipliers $a_k = b_k = 1$, so the gcd is 1. However, it is useful in some cases such as:

    S1: DO I = 1, 10
    S2:   X(2*I) = ... X(2*I+1)
    S3: ENDDO

Here $\gcd(2,2) = 2$, and because it is not a divisor of $b_0 - a_0 = 1$, there can be no data dependence.
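The GCD decision test can be illustrated with a short Python sketch. The function name `gcd_test` and its argument layout are assumptions made for this illustration; the coefficients follow the notation $a_k$, $b_k$, $a_0$, $b_0$ used above.

```python
from functools import reduce
from math import gcd


def gcd_test(a, b, a0, b0):
    """Return False if the GCD test proves independence, True if a dependence is possible.

    a, b   -- loop index multipliers (a_1..a_n) and (b_1..b_n)
    a0, b0 -- constant terms of the two subscript functions
    """
    g = reduce(gcd, (abs(c) for c in list(a) + list(b)), 0)
    diff = b0 - a0
    if g == 0:                      # all multipliers are zero: subscripts are constants
        return diff == 0
    return diff % g == 0


# X(2*I) versus X(2*I+1): gcd(2,2) = 2 does not divide 1, so no dependence.
no_dep = gcd_test([2], [2], 0, 1)
# A(20*I+J-20) versus A(20*I+J-21): the gcd is 1, so the test cannot rule out dependence.
maybe_dep = gcd_test([20, 1], [20, 1], -20, -21)
print(no_dep, maybe_dep)
```

A result of True only means that the test is inconclusive; the loop bounds are ignored, which is why the inequality test described next is usually applied as well.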

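Banerjee's Inequality Decision Algorithm

Banerjee's inequality decision algorithm depends on the definition of the positive and negative parts of a number as follows:

Definition [3]. Let $t$ be an integer. The positive part of the integer $t$, $t^+$, and its negative part, $t^-$, are defined as:

\[ t^+ = \begin{cases} t & \text{if } t \ge 0 \\ 0 & \text{if } t < 0 \end{cases} \qquad\qquad t^- = \begin{cases} -t & \text{if } t \le 0 \\ 0 & \text{if } t > 0. \end{cases} \]

$S_2$ is data-dependent on $S_1$ only if there exist integers $i'_1, \ldots, i'_n$ and $i''_1, \ldots, i''_n$ such that

\[ \sum_{k=1}^{n} (a_k i'_k - b_k i''_k) = b_0 - a_0. \]

Suppose that, for a direction vector $\psi = (\psi_1, \psi_2, \ldots, \psi_n)$, a lower bound $LB$ and an upper bound $UB$ can be found such that, for each term $k = 1, \ldots, n$ of the above sum:

\[ LB_k^{\psi_k} \le a_k i'_k - b_k i''_k \le UB_k^{\psi_k} \]

where:

if $\psi_k$ = '*' then:

\[ LB_k^{*} = -(a_k^- + b_k^+)(U_k - L_k) + (a_k - b_k)L_k \]
\[ UB_k^{*} = (a_k^+ + b_k^-)(U_k - L_k) + (a_k - b_k)L_k \]

if $\psi_k$ = '<' then:

\[ LB_k^{<} = -(a_k^- + b_k)^+(U_k - L_k - 1) + (a_k - b_k)L_k - b_k \]
\[ UB_k^{<} = (a_k^+ - b_k)^+(U_k - L_k - 1) + (a_k - b_k)L_k - b_k \]

if $\psi_k$ = '=' then:

\[ LB_k^{=} = -(a_k - b_k)^-(U_k - L_k) + (a_k - b_k)L_k \]
\[ UB_k^{=} = (a_k - b_k)^+(U_k - L_k) + (a_k - b_k)L_k \]

if $\psi_k$ = '>' then:

\[ LB_k^{>} = -(a_k - b_k^+)^-(U_k - L_k - 1) + (a_k - b_k)L_k + a_k \]
\[ UB_k^{>} = (a_k + b_k^-)^+(U_k - L_k - 1) + (a_k - b_k)L_k + a_k. \]

Summing up these quantities gives the lower and upper bounds, so:

\[ \sum_{k=1}^{n} LB_k^{\psi_k} \le \sum_{k=1}^{n} (a_k i'_k - b_k i''_k) \le \sum_{k=1}^{n} UB_k^{\psi_k} \]

which can be rewritten as

\[ \sum_{k=1}^{n} LB_k^{\psi_k} \le b_0 - a_0 \le \sum_{k=1}^{n} UB_k^{\psi_k}. \]

It can be shown that if either $\sum_{k=1}^{n} LB_k^{\psi_k} > b_0 - a_0$ or $\sum_{k=1}^{n} UB_k^{\psi_k} < b_0 - a_0$, then there is no dependence under the constraints of the direction vector $\psi = (\psi_1, \psi_2, \ldots, \psi_n)$.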
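The inequality test can be sketched in Python as shown below. The sketch rests on two assumptions: the per-term bounds are the standard Banerjee bounds for $a_k x - b_k y$ with $L_k \le x, y \le U_k$, and the direction $\psi_k$ relates the index $x$ used by $f$ to the index $y$ used by $g$ (for example, '<' means $x < y$). The function names are hypothetical.

```python
def pos(t):
    """Positive part t+."""
    return t if t >= 0 else 0


def neg(t):
    """Negative part t- (non-negative by definition)."""
    return -t if t <= 0 else 0


def term_bounds(a, b, lo, hi, psi):
    """Bounds of a*x - b*y for lo <= x, y <= hi with x psi y; None if no such pair exists."""
    base = (a - b) * lo
    if psi == "*":
        return (base - (neg(a) + pos(b)) * (hi - lo),
                base + (pos(a) + neg(b)) * (hi - lo))
    if psi == "=":
        return (base - neg(a - b) * (hi - lo),
                base + pos(a - b) * (hi - lo))
    span = hi - lo - 1
    if span < 0:                    # a one-iteration loop admits no '<' or '>' pair
        return None
    if psi == "<":
        return (base - b - pos(neg(a) + b) * span,
                base - b + pos(pos(a) - b) * span)
    if psi == ">":
        return (base + a - neg(a - pos(b)) * span,
                base + a + pos(a + neg(b)) * span)
    raise ValueError("unknown direction: %r" % (psi,))


def banerjee_may_depend(a, b, a0, b0, lows, highs, psi):
    """False if independence is proved under direction vector psi, True otherwise."""
    lb = ub = 0
    for ak, bk, lk, uk, pk in zip(a, b, lows, highs, psi):
        bounds = term_bounds(ak, bk, lk, uk, pk)
        if bounds is None:
            return False
        lb, ub = lb + bounds[0], ub + bounds[1]
    return lb <= b0 - a0 <= ub


# A(20*I+J-20) versus A(20*I+J-21) with I in 1..10 and J in 2..20.
args = ((20, 1), (20, 1), -20, -21, (1, 2), (10, 20))
results = {psi: banerjee_may_depend(*args, psi)
           for psi in [("*", "*"), ("<", "*"), ("=", "*"), (">", "*"),
                       ("=", "<"), ("=", "="), ("=", ">")]}
print(results)
```

As with the GCD test, a result of True is not a proof of dependence; it only means that independence could not be shown under the given direction vector.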

Hierarchical Dependence Testing Algorithm

Burke and Cytron improved the GCD and Banerjee's inequality decision algorithms by introducing hierarchical testing. Hierarchical dependence testing proceeds from a general direction vector ('*') to more specific direction vectors ('<', '=' or '>'). If, at any step, an independence can be shown, the direction vector need not be refined further. Otherwise, if the direction vector contains any '*' element, '*' is refined to '<', '=' or '>', and the testing continues. If the direction vector does not contain any '*' element, the existence of dependence is assumed under the constraints of the direction vector. Thus, the dependence testing is done on a hierarchy of direction vectors.

Such a hierarchy for two nested loops is shown in Figure 2.2.

                              (*,*)
          (<,*)               (=,*)               (>,*)
    (<,<) (<,=) (<,>)   (=,<) (=,=) (=,>)   (>,<) (>,=) (>,>)

Figure 2.2: Hierarchy of direction vectors for two loops.

Consider the previous example:

    S1: DO I = 1, 10
    S2:   DO J = 2, 20
    S3:     A(20*I+J-20) = B(J,I)
    S4:     C(J,I) = A(20*I+J-21)
    S5:   ENDDO
    S6: ENDDO

Figure 2.3: The dependence tree for the program.

In this example, $L_1 = 1$, $L_2 = 2$, $U_1 = 10$, $U_2 = 20$, $a_0 = -20$, $a_1 = 20$, $a_2 = 1$, $b_0 = -21$, $b_1 = 20$ and $b_2 = 1$. Using Burke and Cytron's hierarchical dependence testing algorithm as the test framework, and Banerjee's inequality decision algorithm to test whether there exists an independence under the constraints of the direction vector, we can obtain the dependence tree shown in Figure 2.3, which indicates the data dependence $S_3\,\delta\,S_4$. Moreover, the direction vector (=,<) indicates that the data dependence $S_3\,\delta\,S_4$ is a loop-carried dependence.


Another contribution of Burke and Cytron [9] is the linearization of array references, which reduces the complexity of dependence testing. If an array $A$ is declared as

    A(L_1:U_1, ..., L_n:U_n)

then a reference to an element $A(d_1, \ldots, d_n)$ can be linearized as $A'(f(d_1, \ldots, d_n, L_1, \ldots, L_{n-1}, U_1, \ldots, U_{n-1}))$, where $f(d_1, \ldots, d_n, L_1, \ldots, L_{n-1}, U_1, \ldots, U_{n-1})$ is the following linear expression:

\[ f(d_1, \ldots, d_n, L_1, \ldots, L_{n-1}, U_1, \ldots, U_{n-1}) = 1 + \sum_{i=1}^{n} \Big( (d_i - L_i) \prod_{j=1}^{i-1} (U_j - L_j + 1) \Big). \]

For the following example:

    S0: REAL A(20,10), B(20,10), C(20,10)
    S1: DO I = 1, 10
    S2:   DO J = 2, 20
    S3:     A(J,I) = B(J,I)
    S4:     C(J,I) = A(J-1,I)
    S5:   ENDDO
    S6: ENDDO

A(J,I) will be mapped to A'(20*I+J-20), and A(J-1,I) to A'(20*I+J-21).

The dependence $S_3\,\delta_{(=,<)}\,S_4$, obviously, must be preserved.
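The linearization expression can be checked with a few lines of Python. The function name `linearize` is hypothetical; the lower and upper bounds are passed for every dimension for convenience, although the bounds of the last dimension do not influence the result.

```python
def linearize(subscripts, lows, highs):
    """Map A(d_1,...,d_n) to a 1-based offset in column-major (FORTRAN) order."""
    offset, stride = 1, 1
    for d, lo, hi in zip(subscripts, lows, highs):
        offset += (d - lo) * stride
        stride *= hi - lo + 1       # product of the extents of the dimensions seen so far
    return offset


# REAL A(20,10): A(J,I) should map to 20*I+J-20 and A(J-1,I) to 20*I+J-21.
lows, highs = (1, 1), (20, 10)
matches = all(linearize((j, i), lows, highs) == 20 * i + j - 20 and
              linearize((j - 1, i), lows, highs) == 20 * i + j - 21
              for i in range(1, 11) for j in range(2, 21))
print(matches)
```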

2.3 Alias Analysis

If two different variables $a$ and $b$ refer to the same memory location, they are called aliases of one another. $a$ and $b$ are explicit aliases if a programming language construct, such as union or equivalence, defines them to (partly) overlap. By contrast, they are implicit aliases if their aliasing is caused via procedure parameter-passing mechanisms. The set of all aliases of $x$ is denoted by $alias(x)$. Note that if $y$ is an alias of $x$, $y \in alias(x)$, then also $x$ is an alias of $y$, $x \in alias(y)$.

Explicit aliases can easily be recognized by analyzing the declarations of a program, and hence are not discussed here. Inter-procedural alias analysis is briefly introduced to find implicit aliases. A more detailed presentation is given in [10, 11, 38].

To simplify the discussion, it is assumed that a procedure must begin with a procedure header, which (in FORTRAN) consists of the keyword SUBROUTINE followed by the name of the procedure and a list of formal (or dummy) parameters enclosed in parentheses. A procedure is invoked by a CALL statement which consists of the keyword CALL followed by the name of the procedure and a list of arguments enclosed in parentheses. The arguments are also called actual parameters. A global variable is a nonlocal variable which can be referred to in a procedure body without passing it as a parameter to the procedure. In FORTRAN programs, global variables are those which are declared by COMMON statements.

For example, in the following program²:

        PROGRAM MAIN
        COMMON G1, G2, G3
    S1: CALL P1(G1, G1, G2)
        END

        SUBROUTINE P1(F1, F2, F3)
        COMMON G1, G2, G3
    S2: CALL P2(F1, F2, F3)
        END

        SUBROUTINE P2(F4, F5, F6)
        COMMON G1, G2, G3
    S3: CALL P1(G3, F4, F5)
    S4: CALL P3(F5, F6)
        END

        SUBROUTINE P3(F7, F8)
        COMMON G1, G2, G3
        F7 = F8 + 2
        END

² This is a slightly modified example from [38].

there are three procedures: P1 with formal parameters F1, F2 and F3; P2 with formal parameters F4, F5 and F6; and P3 with formal parameters F7 and F8. These three procedures are invoked by the CALL statements $S_1$, $S_2$, $S_3$ and $S_4$. The actual parameters are G1, G1, and G2 in $S_1$; F1, F2, and F3 in $S_2$; G3, F4, and F5 in $S_3$; and F5 and F6 in $S_4$. The global variables in this program are G1, G2 and G3. This program is used as an example throughout this section.

Inter-procedural alias analysis finds two types of aliases. Type-1 aliasing is caused by using a global variable as an actual parameter in a CALL statement. For example, in $S_1$ the global variables G1 and G2 are used as actual parameters of the procedure P1. In this case, G1 is an alias of the formal parameters F1 and F2 of P1, and G2 is an alias of F3. So, Type-1 aliasing is also called global-to-formal aliasing.

Type-2 aliasing is caused by using the same variable or alias variables more than once as actual parameters in a single CALL statement. For example, in $S_1$, G1 is used twice as an actual parameter in the invocation of P1, so the formal parameters F1 and F2 of P1 are aliases of each other. Type-2 aliasing is also called formal-to-formal aliasing.

2.3.1 Detecting Type-1 Aliases

The binding graph is the primary data structure to represent the relations between global and formal variables and to calculate Type-1 aliases.

Definition [38]. A binding graph is a pair $\beta = (N_\beta, E_\beta)$, where:

1. $N_\beta$ is the set of formal parameters of all procedures in a program.

2. $E_\beta$ is a subset of $N_\beta \times N_\beta$ such that an edge $(f_1, f_2)$ is in $E_\beta$ if there are two procedures $P_1$ and $P_2$ such that

   • $f_1$ is one of the formal parameters of $P_1$,

   • $f_2$ is one of the formal parameters of $P_2$, and

   • $f_1$ gets bound to $f_2$ during an invocation of $P_2$ in $P_1$.

In the example program, the statement $S_2$, CALL P2(F1,F2,F3), binds F1, F2, and F3 to F4, F5, and F6; the statement $S_3$, CALL P1(G3,F4,F5), binds F4 and F5 to F2 and F3; and the statement $S_4$, CALL P3(F5,F6), binds F5 and F6 to F7 and F8. The binding graph for this example program is shown in Figure 2.4.

Figure 2.4: The binding graph $\beta$ for the example program.

Type-1 aliases are determined in the following three steps:

1. Construct the binding graph of the program.

2. For each node $f$ of the binding graph, initialize its alias set, $alias(f)$, as follows: $alias(f) := \{ g \mid g \text{ is bound to } f \}$.

3. For each node $f_i$ of the binding graph, propagate the $alias(f_i)$ sets forward along the directed edges. That is, for each edge $(f_i, f_j)$: $alias(f_j) := alias(f_j) \cup alias(f_i)$.

Table 2.2: Type-1 aliases for the example program.

    formal parameter f    alias(f)
    F1                    {G1, G3}
    F2                    {G1, G3}
    F3                    {G1, G2, G3}
    F4                    {G1, G3}
    F5                    {G1, G3}
    F6                    {G1, G2, G3}
    F7                    {G1, G3}
    F8                    {G1, G2, G3}

After step 2, $alias(F1) = \{G1, G3\}$, $alias(F2) = \{G1\}$, $alias(F3) = \{G2\}$ and the other alias sets are empty. In step 3, these alias sets are propagated along the binding graph.

The resulting Type-1 aliases are shown in Table 2.2.
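The three steps can be sketched in Python for the example program. This is only an illustration under stated assumptions: the data layout (`formals`, `calls`) and the function name `type1_aliases` are hypothetical, and step 3 uses a plain fixed-point iteration over the edges, which also terminates when the binding graph contains cycles, instead of the SCC-based propagation of [11] or [38].

```python
def type1_aliases(formals, calls, global_vars):
    """Return (alias sets of all formal parameters, edges of the binding graph)."""
    alias = {f: set() for params in formals.values() for f in params}
    edges = set()
    for _caller, callee, actuals in calls:
        for actual, formal in zip(actuals, formals[callee]):
            if actual in global_vars:
                alias[formal].add(actual)        # step 2: a global is bound to a formal
            elif actual in alias:
                edges.add((actual, formal))      # step 1: a formal is bound to a formal
    changed = True
    while changed:                               # step 3: propagate along the edges
        changed = False
        for src, dst in edges:
            if not alias[src] <= alias[dst]:
                alias[dst] |= alias[src]
                changed = True
    return alias, edges


# The example program: each call is (caller, callee, actual parameters).
formals = {"P1": ["F1", "F2", "F3"], "P2": ["F4", "F5", "F6"], "P3": ["F7", "F8"]}
calls = [("MAIN", "P1", ["G1", "G1", "G2"]),     # S1
         ("P1", "P2", ["F1", "F2", "F3"]),       # S2
         ("P2", "P1", ["G3", "F4", "F5"]),       # S3
         ("P2", "P3", ["F5", "F6"])]             # S4
alias, edges = type1_aliases(formals, calls, {"G1", "G2", "G3"})
for f in sorted(alias):
    print(f, sorted(alias[f]))
```

The binding graph of the example contains the cycle F2, F5, F3 is not closed but F4 feeds F2 and F5 feeds F3, so the order in which edges are visited matters for a single pass; iterating until nothing changes avoids depending on that order.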

The binding graph is actually a multi-graph since $P_1$ may contain several invocations of $P_2$, which may result in $f_1$ being bound to $f_2$ more than once. However, for historical reasons, it is still referred to as a graph. Moreover, although standard FORTRAN 77 does not allow recursive calls, many other languages, such as C and PASCAL, do. In this case, it is possible that a binding graph contains cycles, or strongly connected components (SCCs). Therefore, the propagation of aliases in the binding graph must be more elaborate than in step 3 above. In fact, the main difference between [11] and [38] is how the alias sets are propagated among SCCs in step 3. In [11], each SCC is first reduced to a single node. Then, the alias sets are propagated along the reduced graph. Finally, the reduced graph is expanded into the original binding graph. In [38], Tarjan's depth-first search algorithm [46] is used to find SCCs and propagate the alias sets along the binding graph at the same time. This improvement simplifies the algorithms. A detailed discussion is given in [11, 38, 46].

2.3.2 Detecting Type-2 Aliases

Type-2 aliases are caused by using alias variables or the same variable more than once as an actual parameter in a single procedure invocation. For example, in $S_1$, CALL P1(G1,G1,G2), the variable G1 is used twice as an actual parameter for P1, so F1 and F2 are aliases of each other; in $S_3$, CALL P1(G3,F4,F5), G3 is a Type-1 alias of F5, so the formal parameters F1 and F3 of P1 are also aliases.

To detect Type-2 aliases, a set $\Omega$ (called a worklist) has been proposed [38]. Each element of $\Omega$ is a triple $(p, f_1, f_2)$, indicating that the formal parameters $f_1$ and $f_2$ of a procedure $p$ are aliases of each other. When all Type-1 aliases of a program are known, Type-2 aliases can be determined in the following two steps [38]:

1. Construct the initial set $\Omega$ of the program.

   For each invocation CALL $p(a_1, \ldots, a_n)$, and for all actual parameters $a_i$ and $a_j$, $1 \le i < j \le n$, such that $a_i = a_j$ or $a_i$ is an alias of $a_j$, the corresponding formal parameters $f_i$ and $f_j$ of the procedure $p$ are added to $\Omega$ as a triple $(p, f_i, f_j)$.

2. Expand the set $\Omega$.

   For each triple $(p, f_1, f_2)$ in $\Omega$, check each invocation CALL $r(a_1, \ldots, a_n)$ in the procedure $p$, and if there are actual parameters $a_i$ and $a_j$, $1 \le i < j \le n$, such that $f_1 = a_i$ and $f_2 = a_j$, then find the formal parameters $f'_i$ and $f'_j$ of the procedure $r$ and add the triple $(r, f'_i, f'_j)$ to $\Omega$.

For the example program, step 1 creates the worklist $\Omega$ as shown in Figure 2.5. The Type-2 aliases of the program, obtained in step 2, are shown in Table 2.3. Combining Table 2.2 with Table 2.3 creates the complete aliases for each formal parameter, as shown in Table 2.4.

Table 2.3: Type-2 aliases for the example program.

    formal parameter f    alias(f)
    F1                    {F2, F3}
    F2                    {F1, F3}
    F3                    {F1, F2}
    F4                    {F5, F6}
    F5                    {F4, F6}
    F6                    {F4, F5}
    F7                    {F8}
    F8                    {F7}

Figure 2.5: The complete worklist $\Omega$ of the example program.
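The two worklist steps can be sketched in Python as follows. The sketch is an illustration only: the names `type2_aliases`, `formals`, `calls` and `type1` are hypothetical, the Type-1 alias sets are supplied as input, and step 2 matches the pair of actual parameters in either order, which is a slight generalization of the ordered condition stated above.

```python
def type2_aliases(formals, calls, type1):
    """Return the set of triples (procedure, formal, formal) describing Type-2 aliases."""
    def aliased(x, y):
        return x == y or x in type1.get(y, ()) or y in type1.get(x, ())

    triples, worklist = set(), []

    def add(proc, f, g):
        if (proc, f, g) not in triples:
            triples.add((proc, f, g))
            worklist.append((proc, f, g))

    for _caller, callee, actuals in calls:       # step 1: build the initial worklist
        params = formals[callee]
        for i in range(len(actuals)):
            for j in range(i + 1, len(actuals)):
                if aliased(actuals[i], actuals[j]):
                    add(callee, params[i], params[j])

    while worklist:                              # step 2: expand the worklist
        proc, f1, f2 = worklist.pop()
        for caller, callee, actuals in calls:
            if caller != proc:
                continue
            params = formals[callee]
            for i in range(len(actuals)):
                for j in range(i + 1, len(actuals)):
                    if {actuals[i], actuals[j]} == {f1, f2}:
                        add(callee, params[i], params[j])
    return triples


formals = {"P1": ["F1", "F2", "F3"], "P2": ["F4", "F5", "F6"], "P3": ["F7", "F8"]}
calls = [("MAIN", "P1", ["G1", "G1", "G2"]), ("P1", "P2", ["F1", "F2", "F3"]),
         ("P2", "P1", ["G3", "F4", "F5"]), ("P2", "P3", ["F5", "F6"])]
# Type-1 aliases of the formal parameters (the global parts of the complete alias sets).
type1 = {"F1": {"G1", "G3"}, "F2": {"G1", "G3"}, "F3": {"G1", "G2", "G3"},
         "F4": {"G1", "G3"}, "F5": {"G1", "G3"}, "F6": {"G1", "G2", "G3"},
         "F7": {"G1", "G3"}, "F8": {"G1", "G2", "G3"}}
triples = type2_aliases(formals, calls, type1)
print(sorted(triples))
```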

2.4 Dependence Graph

Both control and data dependencies can be represented as a graph, called a dependence graph. The dependence graph $G$ of a program is a directed graph $G = (N, E)$, where the set of nodes, $N$, is the set of the assignments and branch conditions of the selection and iteration constructs of the program, and the directed edges, $E \subseteq N \times N$, represent both data and control dependencies. An edge $(S_i, S_j)$ is in $E$ if and only if $S_j$ is data-dependent or control-dependent on $S_i$; if $S_j$ is control-dependent on $S_i$, $(S_i, S_j)$ is labeled T (true) or F (false), to distinguish such edges from data-dependent edges. A dependence graph also contains the initial node ENTRY, which, as in control flow graphs, represents a uniform beginning of the execution of the program.

Table 2.4: Complete (Type-1 and Type-2) aliases for the example program.

    formal parameter f    alias(f)
    F1                    {G1, G3, F2, F3}
    F2                    {G1, G3, F1, F3}
    F3                    {G1, G2, G3, F1, F2}
    F4                    {G1, G3, F5, F6}
    F5                    {G1, G3, F4, F6}
    F6                    {G1, G2, G3, F4, F5}
    F7                    {G1, G3, F8}
    F8                    {G1, G2, G3, F7}

For example, in the program shown in Figure 2.6(a), $S_2$, $S_3$, $S_4$ and $S_7$ are control-dependent on $S_1$. $S_5$ and $S_6$ are control-dependent on $S_4$. $S_5$ is data-dependent on $S_3$ due to P, $S_7$ is data-dependent on $S_6$ due to U, and $S_8$ is data-dependent on $S_2$ due to Y. The dependence graph $G$ is shown in Figure 2.6(b).

    S1: IF (A) THEN
    S2:   Y = X + Z
        ELSE
    S3:   P = M - S
    S4:   IF (B) THEN
    S5:     V = P + S
          ELSE
    S6:     U = Y - Z
          ENDIF
    S7:   Q = U
        ENDIF
    S8: T = Y

Figure 2.6: An example program (a) and its dependence graph (b).

Note that this dependence graph is different from the program dependence graph as defined in [15]. In a dependence graph, there are no region nodes used to summarize the control condition for a node and to group all nodes which have the same set of control conditions. More information about program dependence graphs can be found in [1, 5, 7].

Chapter 3

Evaluation of Inherent Parallelism

Research in the area of exploiting the parallelism of programs has been conducted for many years, and several experimental compiling systems exploiting parallelism in FORTRAN programs have been developed [2, 3, 7]. However, these systems usually exploit only loop parallelism. Even the recent contributions [32] cannot deal with parallelism between different loops or between a loop and the other parts of a program. This chapter presents a new approach, which can be used to deal with parallelism between loops and between loops and other parts of a program to evaluate the program's inherent parallelism. The presented approach differs quite significantly from the previous work.

The speedup factor is used to evaluate the parallelism of a program. As defined in Chapter 1, the speedup factor of a program is

\[ speedup = \frac{T_{serial}}{T_{parallel}}, \]

where $T_{serial}$ is the time of the sequential execution of the program, and $T_{parallel}$ is the time of the maximally parallel execution of the same program. Obviously, $T_{serial}$ and $T_{parallel}$ are functions of the program as well as its input data.

Given a program $P$ and its input data $D$, we can (conceptually) execute the program $P$ with $D$. In this execution, for each selection construct in $P$, we can determine the value of the boolean expression $B$ (the branch condition), which can be TRUE or FALSE. This value is called the executing value of the selection construct, and is denoted by $b$. If the selection construct is nested in $n$ loops with indices $i_1, \ldots, i_n$, $1 \le i_j \le N_j$, $j = 1, \ldots, n$, there are $N_1 \times \ldots \times N_n$ executing values for the same boolean expression $B$; all these values are denoted by $b(i_1, \ldots, i_n)$. The executing values of all selection constructs of the program $P$ with data $D$ are used for calculating the values $T_{serial}$ and $T_{parallel}$.

3.1 Evaluation of $T_{serial}$

If a statement $S$ is nested in $n$ ($n \ge 0$) loops with indices $i_1, \ldots, i_n$, the sequential execution time of $S$ for the iteration $(i_1, \ldots, i_n)$ is denoted by $T_{serial}(S, (i_1, \ldots, i_n))$. It is assumed that for an assignment statement $S$ (with no function invocations), the serial execution time does not depend upon the iteration, so $T_{serial}(S, (i_1, \ldots, i_n)) = T_{serial}(S)$, the serial execution time of the statement $S$.

The sequential execution times of the sequence, selection and iteration constructs are as follows ($t_b$ is the execution time of the boolean expression $B$ in the selection construct, and $b(i_1, \ldots, i_n)$ is the executing value of the selection construct):

• sequence:

\[ T_{serial}(S_1; S_2, (i_1, \ldots, i_n)) = T_{serial}(S_1, (i_1, \ldots, i_n)) + T_{serial}(S_2, (i_1, \ldots, i_n)). \]

• selection:

\[ T_{serial}(\text{if } B \text{ then } S_1 \text{ else } S_2 \text{ endif}, (i_1, \ldots, i_n)) = \begin{cases} t_b + T_{serial}(S_1, (i_1, \ldots, i_n)), & \text{if } b(i_1, \ldots, i_n) = \text{TRUE}; \\ t_b + T_{serial}(S_2, (i_1, \ldots, i_n)), & \text{otherwise.} \end{cases} \]

• iteration:

\[ T_{serial}(\text{for } i := 1 \text{ to } N \text{ do } S \text{ enddo}, (i_1, \ldots, i_n)) = \sum_{i=1}^{N} T_{serial}(S, (i_1, \ldots, i_n, i)). \]

An implementation of the evaluation of $T_{serial}$ is described in the next chapter.
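The three rules translate directly into a recursive evaluation. The following Python sketch is not the implementation described in the next chapter; it assumes a hypothetical tuple encoding of programs in which ("assign", t) is an assignment taking t time units, ("seq", [...]) a sequence, ("if", t_b, b, S1, S2) a selection whose executing value is given by the function b of the enclosing loop indices, and ("for", N, S) an iteration.

```python
def t_serial(stmt, indices=()):
    """Sequential execution time of stmt in the iteration given by indices."""
    kind = stmt[0]
    if kind == "assign":                        # independent of the iteration
        return stmt[1]
    if kind == "seq":                           # sum over the statements of the sequence
        return sum(t_serial(s, indices) for s in stmt[1])
    if kind == "if":                            # t_b plus the branch selected by b(i_1..i_n)
        _, t_b, b, s1, s2 = stmt
        return t_b + t_serial(s1 if b(indices) else s2, indices)
    if kind == "for":                           # sum over all iterations i = 1..N
        _, n, body = stmt
        return sum(t_serial(body, indices + (i,)) for i in range(1, n + 1))
    raise ValueError("unknown construct: %r" % (kind,))


# Hypothetical program: one assignment, then a loop of 4 iterations whose body
# takes the THEN branch (3 units) in even iterations and the ELSE branch (1 unit) otherwise.
program = ("seq", [
    ("assign", 2),
    ("for", 4, ("if", 1, lambda idx: idx[-1] % 2 == 0, ("assign", 3), ("assign", 1))),
])
total = t_serial(program)
print(total)
```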

3.2 Evaluation of $T_{parallel}$

The evaluation of $T_{parallel}$ is more complicated than that of $T_{serial}$ because it must take into account both control and data dependencies among the statements of a given program. In the proposed approach, the control and data dependencies are represented as a dependence graph, and $T_{parallel}$ is evaluated on the basis of this dependence graph.

For simplicity, in any dependence graph $G$, an edge $(S_i, S_j)$ is called a control dependence edge if $S_j$ is control dependent on $S_i$. A node $S_i$ is called an IF node if $S_i$ corresponds to a branch condition in a selection construct, and it is called a loop node if $S_i$ corresponds to a branch condition in an iteration construct. Furthermore, a node is called a control node if it is an IF node, loop node, or ENTRY node. Note that if an edge $(S_i, S_j)$ is a control dependence edge, $S_i$ must be a control node. A path from ENTRY to $S_i$ is called a control path $CP_i$ of $S_i$ if all its edges (ENTRY, $S'_1$), ($S'_1$, $S'_2$), ..., ($S'_m$, $S_i$) are control dependence edges.

For each statement $S_i$ of the program, $t_i$ is used to denote the execution time of this statement, and $T_i$ to denote the total execution time of the maximally parallel execution of all statements from the beginning of the program to $S_i$ (including $S_i$). It is assumed that both $t_{ENTRY}$ and $T_{ENTRY}$ are 0.

3.2.1 Evaluation of $T_i$

It can be observed that for any given program and any statement $S_j$, if $S_j$ is data dependent or control dependent on another statement $S_i$, then $S_j$ must be executed after $S_i$ (if $S_j$ is going to be executed at all). In other words, $T_j$ should be evaluated after the evaluation of $T_i$. This is the basic principle of the proposed approach.

If a statement $S_i$ is nested in $n$ ($n > 0$) loops with indices $k_1, \ldots, k_n$, a logical function $F_i(k_1, \ldots, k_n)$ can be defined in such a way that $F_i(k_1, \ldots, k_n)$ is TRUE if and only if the statement $S_i$ is executed in the iteration $(k_1, \ldots, k_n)$, i.e., the control path $CP_i$ of $S_i$ in $G$ is TRUE; otherwise $F_i(k_1, \ldots, k_n)$ is FALSE.

There are two cases to be considered in evaluating $T_i$.

Case One: $S_i$ is not included in any loop.

In the general case, each node $S_i$ in the dependence graph has $n$ ($n \ge 0$) data dependence edges $(S_{i,j}, S_i)$, $j = 1, \ldots, n$, and one control dependence edge $(S_{i'}, S_i)$, as shown in Figure 3.1(a).

Figure 3.1: The dependence graph for node $S_i$.

In this case,

\[ T_i = \begin{cases} t_i + \max(T_{i,1}, \ldots, T_{i,n}, T_{i'}), & \text{if } F_i; \\ T_{i'}, & \text{otherwise.} \end{cases} \qquad (3.1) \]

For example, the program shown in Figure 3.2(a) can be evaluated using the formula (3.1).

    S1: P = M - S
    S2: IF (B) THEN
    S3:   V = P + S
        ELSE
    S4:   U = Y - Z
        ENDIF
    S5: Q = U

Figure 3.2: The example program 1 and its dependence graph.

Let the executing value $b$ of $B$ in $S_2$ be TRUE. Then:

    T1 = t1,
    T2 = t2,
    T3 = t3 + max(T1, T2) = t3 + max(t1, t2),
    T4 = T2 = t2, and
    T5 = t5 + max(T4) = t5 + t2.

On the other hand, if the executing value $b$ of $B$ in $S_2$ is FALSE, then:

    T1 = t1,
    T2 = t2,
    T3 = T2 = t2,
    T4 = t4 + max(T2) = t4 + t2, and
    T5 = t5 + max(T4) = t5 + t4 + t2.
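Formula (3.1) can be tried out on this example with the Python sketch below. The function name `eval_T`, the dictionary layout and the concrete execution times are assumptions chosen for illustration; the dependence edges are read off the example program (S3 uses P from S1, S5 uses U from S4, S3 and S4 are controlled by S2, and the remaining statements by ENTRY).

```python
def eval_T(order, t, data_preds, ctrl_pred, executed):
    """Apply formula (3.1) to the nodes in 'order' (predecessors must come first)."""
    T = {"ENTRY": 0}                             # T_ENTRY = 0
    for s in order:
        c = ctrl_pred[s]
        if executed[s]:                          # F_i is TRUE
            T[s] = t[s] + max([T[p] for p in data_preds.get(s, [])] + [T[c]])
        else:                                    # S_i is skipped: inherit T of its control node
            T[s] = T[c]
    return T


order = ["S1", "S2", "S3", "S4", "S5"]
t = {"S1": 2, "S2": 1, "S3": 3, "S4": 4, "S5": 5}        # hypothetical execution times
data_preds = {"S3": ["S1"], "S5": ["S4"]}
ctrl_pred = {"S1": "ENTRY", "S2": "ENTRY", "S3": "S2", "S4": "S2", "S5": "ENTRY"}

# b = TRUE: S3 is executed and S4 is not; b = FALSE: the other way round.
T_true = eval_T(order, t, data_preds, ctrl_pred,
                {"S1": True, "S2": True, "S3": True, "S4": False, "S5": True})
T_false = eval_T(order, t, data_preds, ctrl_pred,
                 {"S1": True, "S2": True, "S3": False, "S4": True, "S5": True})
print(T_true, T_false)
```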

Case Two: $S_i$ is in the loop body.

For simplicity, the loop-carried dependence is limited to a single loop with a normalized index which varies from 1 to $N$ with a unit increment between iterations. Let $S_i$ be loop-carried-dependent on only one $S_j$ in this loop (a similar approach can be extended to more complex loops and to more nodes $S_j$ on which $S_i$ is loop-carried-dependent; an implementation of the extension is described in the next chapter). A simple example is as follows:

    S0: DO I = 1, N
    S1:   A(I) = B(I)
    S2:   B(I) = A(I+1)*C(I)
    S3: ENDDO

$S_1$ is loop-carried-dependent on $S_2$, $S_2\,\delta\,S_1$, with distance $(-1)$ due to A(I) and A(I+1), and $S_2$ is also anti-dependent on $S_1$ because of B(I). This program can be unfolded as:

    A(1) = B(1)
    B(1) = A(2)*C(1)
    A(2) = B(2)
    B(2) = A(3)*C(2)
    A(3) = B(3)
    B(3) = A(4)*C(3)
    ...
    A(N) = B(N)
    B(N) = A(N+1)*C(N)

and then it can be observed that, due to the anti-dependencies in A(2), A(3), ..., A(N), the statements cannot be executed in parallel.

In the general case, each node $S_i$ has $n$ ($n \ge 0$) dependence edges $(S_{i,j}, S_i)$, $j = 1, \ldots, n$, one control dependence edge $(S_{i'}, S_i)$, and one loop-carried-dependence edge $(S_j, S_i)$ with the direction distance $D = (d)$, as shown in Figure 3.1(b). To deal with the loop-carried dependence among the iterations of the loop, $T_i$ is replaced by $T_i(k)$ where $k$ is the loop iteration index, $k = 1, \ldots, N$. Then:

\[ T_i(k) = \begin{cases} t_i + \max(T_{i,1}(k), \ldots, T_{i,n}(k), T_{i'}(k), T_j(k - |d|)), & \text{if } F_i(k) \text{ and } k > |d|; \\ t_i + \max(T_{i,1}(k), \ldots, T_{i,n}(k), T_{i'}(k)), & \text{if } F_i(k) \text{ and } k \le |d|; \\ T_{i'}(k), & \text{otherwise.} \end{cases} \qquad (3.2) \]

For the above program, $F_i(k)$, $i = 1, 2$, is always TRUE because there is no selection construct in this program. Then:

    T1(1) = t1,
    T2(1) = t2 + max(T1(1)) = t1 + t2,
    T1(2) = t1 + max(T2(1)) = 2 t1 + t2,
    T2(2) = t2 + max(T1(2)) = 2 (t1 + t2),
    T1(3) = t1 + max(T2(2)) = 3 t1 + 2 t2,
    T2(3) = t2 + max(T1(3)) = 3 (t1 + t2),
    ...
    T1(N) = t1 + max(T2(N-1)) = N t1 + (N-1) t2, and
    T2(N) = t2 + max(T1(N)) = N (t1 + t2).

That is, the loop cannot be executed in parallel.

Another example for this case is as follows:

    S0: DO I = 1, N
    S1:   A(I) = B(I)
    S2:   B(I) = A(I-1)*C(I)
    S3: ENDDO

$S_2$ is loop-carried-dependent on $S_1$, $S_1\,\delta\,S_2$, with distance $(-1)$ due to A(I) and A(I-1), and $S_2$ is also anti-dependent on $S_1$ because of B(I). Then:

    T1(1) = t1,
    T2(1) = t2 + max(T1(1)) = t1 + t2,
    T1(2) = t1,
    T2(2) = t2 + max(T1(2), T1(1)) = t2 + max(t1, t1) = t1 + t2,
    T1(3) = t1,
    T2(3) = t2 + max(T1(3), T1(2)) = t2 + max(t1, t1) = t1 + t2,
    ...
    T1(N) = t1, and
    T2(N) = t2 + max(T1(N), T1(N-1)) = t2 + max(t1, t1) = t1 + t2.

Although $S_1$ and $S_2$ cannot be executed in parallel, the loop contains some parallelism (which can be seen after unfolding the loop). This example shows that the proposed approach detects parallelism existing in loops with loop-carried dependencies.
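Both loop examples can be reproduced with a small Python sketch of formula (3.2). The function name `eval_loop_T`, the argument layout and the concrete values of $t_1$, $t_2$ and $N$ are assumptions for illustration. As in the computations above, the contribution $T_{i'}(k)$ of the loop node is taken to be 0 and every statement is assumed to execute in every iteration.

```python
def eval_loop_T(n_iter, stmts, t, intra_preds, carried):
    """Evaluate T_i(k) by formula (3.2) for a single normalized loop.

    stmts       -- statements of the loop body in textual order
    intra_preds -- S_i -> statements it depends on within the same iteration
    carried     -- S_i -> (S_j, |d|): the one loop-carried dependence of S_i, if any
    """
    T = {}
    for k in range(1, n_iter + 1):
        for s in stmts:
            candidates = [0] + [T[p, k] for p in intra_preds.get(s, [])]
            if s in carried:
                src, dist = carried[s]
                if k > dist:                     # the source iteration k - |d| exists
                    candidates.append(T[src, k - dist])
            T[s, k] = t[s] + max(candidates)
    return T


t = {"S1": 1, "S2": 2}                           # hypothetical execution times
N = 4

# First example: B(I) = A(I+1)*C(I), so S1 in iteration k waits for S2 in iteration k-1.
T_a = eval_loop_T(N, ["S1", "S2"], t, {"S2": ["S1"]}, {"S1": ("S2", 1)})
# Second example: B(I) = A(I-1)*C(I), so S2 in iteration k waits for S1 in iteration k-1.
T_b = eval_loop_T(N, ["S1", "S2"], t, {"S2": ["S1"]}, {"S2": ("S1", 1)})
print(T_a["S2", N], T_b["S2", N])
```

In the first loop the completion time grows with $N$, while in the second it stays at $t_1 + t_2$ for every iteration, which is the difference the two worked examples illustrate.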

The correctness of the proposed approach can be verified by the example shown in Figure 3.3 and used in [32] to illustrate the parallelism available in loops. [32] shows that the loop can be executed in 7N time units assuming that an addition is executed
