• Aucun résultat trouvé

Par structing

N/A
N/A
Protected

Academic year: 2022

Partager "Par structing"

Copied!
129
0
0

Texte intégral

(1)
(2)
(3)
(4)
(5)

A Parall el Algorithm of Constructing A Voronoi Diagram on Hypercube

Connected Computer N etworks

By WenmaoChai

AthesissubmittedtotheSchool of GraduateStudie5 inpartia]fulfiUmenl oftherequirements {orthe degreeof

Mu ter of Science

Departmentof ComputerScience Ml"moriRlUniversityofNewfoundland St.John's Newfoundland CaDada

(6)

Contents

Acknowledgment

Ab stract

1 Int r odu ctio n

1.1 Parallel Computation Models 1.1.1 ParallelRandom AccessMachines.

viii

1.1.2 ProcessorNetworks ...

1.2 Literat ureReviewcl Parallcl Algorith msfor Construct ingVoronoi Diagr a m

2 Hypercube ConnectedComputerNetwo rkandItsPundn-

ment alOpe rations 18

2.1 SIMD Hyp ercube Connec tedComputerNetwork. 18 2.2 Fund a.mental OperationsonHyp e rcube ConnectedComp uter

Network, 22

2.2.1 Maximu m. 2'1

(7)

2.2.2 Ranking 25

2.2.3 Concentration . 29

2.2.4 Distr ibuti on. 32

2.2.5 Generali zation. 33

2.2.6 Merging and Unmerging 38

2.2.7 Sorting. 41

2.2.8 Random AccessRead. . 45

2.2.9 RandomAccesswrite 50

2.2.10 Precede 55

2.2.11 Summary

"

:\ APa rall elAlgorithmforCons truct ingVoronoi Diagr amson nHyperc ube Conn ect edCom p u t er Ne t wo rk

3.1 Definitionand Feature s of Vorcnc i Diagram 3.2 ConvexHulland InversionTransform.

3.2.1 Convex Hull.

60 61 63 63

3.2.2 InversionTransform 64

3.3 AParallelalgorithm to cons tructa Voronoi Diagra m 64 3.3.1 Parall elalgorith m forconstructing the 3-d convexhull 73 3.3.2 Aparallelalgorithmto test externnl faces andinte rnal

faces

3.3.3 A parall el algorit hm to construct asphericalVoronoi diagram

77

82

(8)

3.3.4 Aparallel algorithmtolocatepointson a spherical

Voronoi diagram 87

3.3.5 A parallelalgorithmto addnewfacesto the convex

polyhedron 92

3.4 Summary 4 Conclusio nand Discussion

4.1 Parallel Algorithm toConstructa VoronoiDiagram in1.1(I"....) on a hypercube connectedcomp uternetwork..

99

100 4.2 OptimalParallelAlgorithms to Constru ctVoronoiDiugruma.lOa

iii

(9)

List of Figur e s

1.1 Par allelRand omAccess Machine 1.2 Mesh wit h"=16 processo rs.

1.3 Cube-co n necte dcyclesnetwork withd=3 andn=24 10 1.4 A4-st ar

1.5 A-l-p an cake.•. .

2.1 Block dia gr amofanSIM Dcom p ute r 2.2 16 processo r hy per cube.

11 12

20 21

3.1 Voronoidiagram 62

3.2 Inversion.. 66

3.3 Relat ion between3-dconvex hullandz-dVoranoidiagram 70 3.4 The algorithmofconstr uct ingaVoranoiDiagram 72 3.5 The algorit hmof a threedimensionconvexhull merge. . 76

3.6 The two dimensional analogy 7'

3.7 Thealgori thm

or

externaland internal face test 83 3.8 The algorithmforconstru cti ng spherical Voronoi Diagra m 86

(10)

;'.9 Representation ofchain. all

3.10Thesearchalgorithmforpoints location 9:1

3.11Thealgorith mforaddingnew(aces ofconvexpolyhedron . 96

4.1 Voronoidiagram in1,,\met ric . 102

(11)

List of Tables

2.1 Exampl eto computemaxim um inallSIMDhypercube 26

2.2 Program for SIMD Maximum .. 27

2.3 Exampleto compute ranksinan SI MD hypercub e . 28 2.4 Progra m IcrSI MDranking procedu re ... 30 2.5 Exampleto co ncent rateinSIM D hypercu be 31 2.6 Program foe proce dureto concentraterecords 32 2.7 Exampleto distri bute inan SIM Dhyperc ube 33 2.8 Programfor procedu reto dist ributerecords 34 2.9 Exampleto gener a liseinanSI MDhypercube 35 2.10 Programfor pro cedureto generalizerecords 37

2.11Program for SIMD Reversing 39

2.12Bitonic sort into nondecreasi ngorder.. 40 2.13Iterativebit oni c sort forII,apower of 2 41 2.14Powerof2 bitonicsort [ncndec reaslngorder). 42 2.15 Powerof2bitonicsort (nonincreas ingorder). . . 43

2.16 ExampleofII.randomaccess read .. 46

(12)

2.17Algorithm (ora.random access read.. 48 2.18Example o(anarbihary randomacceJIwrite... 52 2.19Algorithm(oran arbitra ry rand omacee..write •. . .•. 54 3.1 TheAlgorit hmofConlt ructing3·DimeJllionalSpace Convex

Hull

vii

(13)

Ack nowledgme nt

Iwouldliketotakethis opportunity to expressmy sincerethanksto mysuperv isor,Dr.CaoanWang .He stimulated myint erest in thefield of computationalg~metry.Hi.constant encouragement Andguidanceduring thecourse of mystud yandresearch ledto the completionof this thesis.

Las t butnot theleast,1wouldliketo give mythenketo my husband, Zhcnpc n You ng,for hislove andhelp tbrou ghou tthecomposit ionofthis

viii

(14)

Abstract

Com putatio na lgeome tryis abra nch ofcomp uter scien ceCO UC C fl ll 'l!

with thedesignandanalysis ofalgorit hms to solvegeomet ric prubloms.TIll' Voronoi diagramor aset.5ofIIpoints (calledsites)is awellknownst mr-ture in comput ati onal geometry .Inthe Voronoi diagram ,eachpoint is surrounde d bya convex polygonenclosing tha t territorywhich isclosertothe surrounded point thanto anyot her pointintheset. Voronoi diag ra ms arcusefulill solvinggeomet ricproblem s suchas proximit y problem sand theEuclidean minimum spanning treeproblem. Voronoi dingmmsalsohavenpplicatjons in diverseareas likebiology,visua lperception,physics,andarcheology,

Thereexist manymethodsto const r uct Varonoidiagramsonn sill- gle computer.Twoof themareprop osedby Shamosand Brown .In 1075, Shamesapplied two-dime nsion al Vcronoi diag rams toobtain elegant110111- tlensin compu tationalgeometry , such asfinding theneare stncighlmrlUlll construct ion ofminimumspanni ngtrees. Shamesdescribedan0(,,101;11) timesequentialalgorithmto con structthe planarVoron oidiagramforasd ofplanarpoint s.The str at egyheusedin theserial algorithmisdi vide-and- conquer. In1979,Browndemons trated an interestinglinkagebetween till:

Voronoidiagramand the convexhull.He presentc dan0(11log" )algorithm for constructingtheVor onoidiagram,by transformingtheproblemofCOII- structinga planarVoron oidiagramforan »<pcintssettotheconstruction

ix

(15)

ofIIconvexhullof"pointsina-dimensionalspaceviaa geometrictransfer- million knownasinversio n.

Duetothe natureofsomeapplications in whichgeometricproblems arise,fasLandevenreal-time algorithmsareoft en required.Here,as in many ot her areas,parallelism seemstoholdthe greatest promiseformajor reductionincomputatio ntime.Theidea isto useseveralprocessor swhich cooperate to solveagiven problemsimult aneously inafraction of the time takenbyII.single processor.Therefore,itis not surprisingthattheinterest inparallel algorithm sforgeometri cproblems hasgrowninrecent yea rs.

BasedonShamos's and Brown'smeth ods,severalparallel algorithm s to construct Voronoi diagramshavebeenpresented.Someofthemareim- plement ed on parallelrandom accessmachines.Someof themarerunon processornetworkssuch asmesh,cube-connectedcycles, starsand pancakes.

Wewilldiscuss themin theliteraturereview.With thedevelopmentofcom- munlcntiontechnique an dconcurrent programming,compute r network shave become popular. Themostpopularprocessorinter connecti on topologytoday isundoubtedlythehyprl'Cllbc.Hypercube!have several advantages. First, thenumber of nodesina hypercubegrows exponenti allywith the number ofconnect ionspernode, sothat asmall increasein thehardwar e ateach nodeallowsa large increaseinthesizeof thecomputer.Second,the number of alternat ivepaths betweennodesincreases wit hthesizeof thehyp ercube,

(16)

whichhelpsrelievecongestion.Third,efficientalgorit hmsarc knownforrout- ing messagesbetweenprocessors in a hypercube.Finally,andtoday mod importantly,alar ge corp usofsoftwareand programm ingtechniquesexists forhypercube .

Thehypercubeis one of themostversati leand efficient networksyei discoveredfor parallelcomp utation.Inthisthesis,a single instru ctionIIm!- tiple datast reamhypercubeconnectedcomputernetworkis chosenas our par allelcompu tatio n model.In ahypercube connectedcom put er ncl work, local computationsas wellas messageexchanges arctakenintoconsiders- tio nwhenanalyzingthetime taken by theprocessornetworks tosolvefI.

problem.Based on NassimiandSah ni'spaper fund amental operationson hypercubeconnectcomputernetwor ksare descriptively discussed.TheCOt- respondingprograms are alsogiven. Based on Brown'sapproach,II.parallel algorit hmto constructVoronoi diagramsaredeveloped.Our algorithmrU/IS in O(log3n)time onan O(ll)-processor hypercube connectedcompute r net- work.Our algori thmhas severaladvantages.First, our algorithmisbased onBrown's methodwhichtransformstheprobl-em ofconstr ucti onof a planer aVoronoidia gra m forann-pointsettoconst ructio nofa convexhull orII

points in threedimensionalspace,Comparedwiththepar allel IIlgoriUllns whichare based on thedivide-and-conquer approachused by Shamos,our algorit hmcan be usedtosolvetwocomputationalgeomctryproblems:con- struct ing 2- dimen sional Voronoi diagrams and3- dimensiona lconvexhulls.

xi

(17)

Seco nd, compa ring with Chow's meth odswhichruns on aO(n)processors CCC (Cube-ConnectedCycles)model has O(log·n] timecomplexity,our nlgorithm has better time complexity.Third,comparingwithChang-Su ng Jeon g's algorit hm which runs inO(

J'Ti)

onan.jilx

..;n

mesh,our parallel

com puta tion model ismoregeneral becausemost other popular networks can be easilymappedontoa.hypercub e.

xii

(18)

Chapter 1 Introduction

Programmin g computerstoprocess pictorialdat a efficientlyhas been an activityof growingimportanceoverthelast40 year s.Thesepictoeia.l datR maycome frommanysources.We dist inguishtwogen eralclasses:

1.Most often, thedat aare inherentlypictorial, suchasthe images aris- ingin medical, scientific, andindustrialapplications.WeAthermaps receivedfromsatellitesin out erspace are agoodexample.

2. Alternatively,the data areobtainedwhena mathematica lmodelisIISl'(1 to solve a problemandthemod elrelieson pict orial da ta.Exarnpks hereinclude computingtheaverageof a setofdata(representedas pointsinspace) in the presence ofoutliers,computingthevalueof a function thatsatisfies a set ofconstrai nts,andso on.

(19)

Regardle ss oftheir sources,thesecomputationsmay includethe ope r- ationsof iden t ifyingcontours ofobjects,"noise" removal,featureenhance- ment, patternrecogniti on, dete ctionof hiddenlines, andobtaini n ginterse c- tiona among various comp onents.At the foundat ion ofall theseeornp utation a areproblemsof a geomet ricnature,tha t is,problemsinvol vingpoints,line s, polygons, andcircles. Cmnplttalionalgeometryisthebranchof computer scienceconcern ed with designingefficientalgorithm sfor solvinggeomet r ic proble msof in clusion,intersecti on,and proximity , to namebut afew.

Untilrecen tly,theseproblemswere solved using conventiona lscqIIC/lti(l{

comp u ters,computers whosedesignmore orlessfollowsthe model proposed by JohnvonNeumannandhis tea m in thelate19405.Themodel consistsof nsin gle processor capa bleof executing exactly oneinstru ction of aprogra m durin geach timeunit . Comp utersbuiltaccordi ngtothis para digm have beenableto perform attreme n dous speeds.However,itseemstodaythat this approachhas been pushedas far asitwillgo,and thatthesimple laws of physicswill standin thewayof furtherprogress. For example, the spee d of light imposes alimit thatcannot be surpassedby any electronicdevice.

Onthe otherhand, our appetitegrows con tinually fot ever morepow- erful compute rseapeb leofprocessinglarge amountsof data atgreat speeds.

Onesolution to this predicamentthathasgained credibilityan d popularity isIlflmUr:!pIYJrr...illg.Hereacomputa ti onal problemtobe solvedis broke n

(20)

into smal lerpartsthat are solvedsimulta n eously bytheseveralproc essorsof apamllefcompltlcr.The ideaisa.natura lone,an dthe decr easing cost and sizeof electronic compo ne nts havemadeit feasible.Latel y, compu t e rscien- tistshav e been busy build ingpa rallelcomputers anddevel opingalgorithms andsoftware to solveproblemsonthem.One Il.rea that hasreceived its rair share of interest is thedevelopment of Ilnn/llr1f//!lnl'il/lIl1.~for ccmp utatioual geomet r y.

TheVoronoidiagr am isa mathematical concepl attributedto math- ematician Voron oi[I). Voronoidiagram shavebeen well studiedinCOIll -

putation al geom etrysin cethework of Shernce(2Jpartlybecauseortheir applicati onsinsolvinggeometri c problem s such asproximityproblems and the Euc lidean Minimum spannin gtreepro blem, aswell astheir ap p lications insuchdiverse areasasbiology,visual perception,physic s , and archeology [3J.

Thereexis t manymethod sto con st ruct Vor inoidiagram on asingle c':....puter . Twoof themareproposedbyShamosand Brown. III 1975, Shamosapplied two-dimen sional Voronoidiagramsto ob t ainele g antsolu- tionsincompu t ational geometry, suchas findingthenearestneighborand constr u ct ion of minimumepen rringtrees,Shemo edescribed all. 0(11 logu) time seq uential algorithm toconstr ucttheplana rVorono i diagramfornset of planar points.The strategy he used in theseri a lalgori t hm isdivide-end -

(21)

conquer. In 1979, Browndemonstr atedaninterestinglinkagebetwe enthe Vorona jdiagram and theconvexhull.Hepresented an O(nJogll)algo rithm foroonst ruc!ingtheVoronoidiagram,bytransformingtheproblemofcon- structing aplan arVoronoi diagra mforana-pointssettothe construction ofa convexhull of1/points in3-dimen!ional8paceviaa.geometrictran!fo··...

malionknown as inversion .

Inthis thesis,thehypercube connectedcom puter net work is chosen astheparallelcomputa t io nmodel. BasedonBrown'sap proach[41which transfoems theproblem ofconstr uct ionof aplanaraVoron oidiagram foran u-pointsotto construct ionofaconvexhullofIIpoints inth reedimensional space,aparallel algorithmto const ruct Voronci dia gramswillbedeveloped.

Inthis chapte r,after we introduceparallelcomp u tatio nmo dels,wewillre- viewpar allel algorithms to const ructVoron oi diagra ms. Finally,the outline ofthethesiswillbegiven.

1.1 Parallel Computation Models

Inorder toreview theparallelalgorith msto constructVoronoi diagram , we firstreview some exis t ing models of par allelco mputationinthissection.

(22)

1.1.1 ParallelRandom Acc e s sMachines

In parallelrandomaccessma chine (PRA M),a common mem ory isused as abulletinboa rdand alldataexchangesareexecuted through it.Any pair ofprocessorscan communicate thro u ghthisShH ~dmemoryinconstant time.As showninFig.1.1, an interconnection unit(lU) allowseach process or toestabli ~hapa t h to eachme m o ry locationfor thepur poseof rend ingOf writin g ,

The processors operatesynchronouslyand eachstepor a compu ta tio n consistsof threephllSes :

1.The1"Cndphase,inwhich the processors readdatafrom me m ory;

2.Thecom p utephas e,inwhich arithmetic andlogir; opeeafiorrsareper- fo r med;

3.Thewrit ephase, inwhichtheprocessors write dat a tomem o r y.

Dependi ng onwhethertwo ormote processor s ate allowedto rea dIrom and/orwriteto thesamememory10cII.tionsimultaneous ly,three submod- els of the PRA M areident ified ;

1.Theexclusive-rea dexclus i ve-write(EREW) PRAM,where bothreud and writeaccesses bymore thanone pro cessortothe samememo ry lo cation arenotallowed.

(23)

Processo rs

Inter connection Unit (IU)

Shared memory Locat ions

Figure 1.1: Par allelRando m Access Machine

(24)

2. Theconcurrent-readexclusive-write (CREW)PRAM, wheresimultn- neous readin g from thesamememor y locat ionisallo wed, but not si- multa neouswriting.

3.The concu rr ent-readconcurrent-write(CRO W) PRAM,whe reboth forms of sim ultaneo usaccess are allo wed.

1.1.2 Proces sorNetworks

Ina processornetwork,an int erconnecte d setofprocessors cooperate to solveaproblemby performing localcom p ut ations andexchang ingmessages . Severalofthe most widely usednet work sareout line dbelow,

1.Mesh orTwo-Dim ensiona l Ar r a y

Atwo-dim ensionalnetworkis obtainedby arrangingtheII peoces-

aorsintoanTIl XTna.rray,whereT1I

=..;n,

The processor inrow

iandcolu m nkis denoted by(j,k),whe re0:5j :51/1- I an d

o

:5" k:5"m -1. Atwo-waycom m unicat ion linelinks(j,k)toits

neighbors(j

+

1,k),U-I,k),(j,~:

+

1),andU,k- 1).Processorson the bound a ry rowsandcolumns have fewerthanfour neighborsand hen ce fewerconnectio ns.Suchanet work isalsoknownasthe,111,.../t orthemesh-connected commuer(MeC)mode l.Fig.1.2showsamesh withn=16processors.

(25)

no.

Number

o

Column Number

Figure1.2:Mesh withn

=

16proeesson

(26)

2. Cube-ConnectedCycles

Toob t ai n a cube-connected cycles(CeO)network,we begin witha d.

dimensional hy p ercube, then replaceeach ofits 2" corners withacycle ofdprocesso rs.Each processorina cycleis connected to a processorin aneighboringcycle in thc eeme dimension.See Fig.1.3forancxample of acecnetwork withd:;;;3andII= 2'1.//=24processors.Inthe figure,each processor hastwo indicesi,i ,whereiis theprocessororder in cyclei.

3.StarsandPancakes

Star andpancake are two interconnectio nnetworkswith the property that fora given integer 11. eachprocessor correspo ndsto adistinct per- mutation of '1symbols,say{I , 2,...,l}}.In otherwords,both networks connectII='1!processors, and each processorislabeled withtheper- mutationtowhi ch it corresponds.Thus,for '1

=

4,a.processor may have thelabel2134. Inthe.~tarnell/Jo r k.denotedby S",a processor visconnected to a processor uifandonly if the label ofItcanbe obtainedfrom that ofvby exchangingthefirst symbolwith theiU' symbol,where2$i$1/.Thus forI}=4.if v=2134 and11

=

3121, 11and v areconnectedby a two-way lin k inS,,,since312~and 21M can be obtained from one another by exchangi ngthe first and third symbols .Fig .1.4shows Sf.

(27)

14 16

10 12

30 20

15

32 22 17

31 21 23 33

Figure1.3:Cube-connectedcyclesnetwor k withd=3andn=24

10

(28)

3214

2314

4312

1342

b/

1234

3142

Figure 1.4:A4·st8.r

11

2143 243(

3421

1423

4123

(29)

1231 4321

32]4

23]4

4132

1432

2341

3241

1423

4123

3412

Figure1.5:A 4-p!l.lIcake 2143

III theIItIll"'lhIIt/wor/':.denotedby1-'",a processorvisconnecte d toa.processor1/ifandonlyifthelabelof11canbeobtainedfromthat oft'byflippi ng the nrstisymbols,where2$iS 'I. Thus for'I

=

4, ift'

=

2134andu=4312,n andvareconnect ed by a two-waylinkin lIt.since4312 canbeobtained from2134byflippin g the foursymbols, and viceversa.Fig. 1.5 showsPl'

12

(30)

1.2 Lit er ature R eview of P a rallel Algori thms for Co n str ucti ng Voronoi Di agr-a m

In1980,Chow inherPh.Dthe~i8151propo sedthreeparallelalgorithms (orconst ructing Voronoidiagrams.Heralgorithmswere basedallDrown's method \41.Byusingtheinversiontechnique , she transformedthe constr uc- tionof the 2-IiVoranoi diagram to theconst ructionofthe3-,Jconvex hull.The computationmodel e she usedwereCREW PRAM and

cee

per- allclcomputernetwork s.On the CREW PRAM model,her algorithmrunll in O(log3 n)time with0(,,)processors. On the

cce

model,one of heralgo- rithmsruns in O(log"'71)timeand uses0(11)processors,and theot her runs in0(k10g3,, )tim e and usesO(n1+1 / k)processors,where,1::;k$logII.

In1986,Mi Lu161presented a parallelalgorithmto constr ucttile Voronoi diagramofaset ofplana r points.The algorithmisbased on Brown's approach{4Jand has O(.jii'log ll)timeeomplexltyonO( .jIix

v'ii)

MCC with constantstorageperprocessors.

In 1988,Preilowskiand Mumbeck{VIsuggesteda time-optimal pamllcl algorit hmto computeVoronoidiagrams.Thealgorithmrunsin O(log,') timeusing0(11-3 )processorsonaCREW PRAM.Thisresultis time-optimal becausethesorting problemcan be reducedto the Voronoi diagramproblem.

The authors showedbow their algorithm can compute all edgesof the Vcronoi

13

(31)

diagramin0(1)timewith0(114)processors.

In 1988,Leveepeul oe,Kat ll.jainenand Lingaa[8} gavea.par allelalgo- rit hmforcomputi ngthe Voronoidiagramofaplanar pointsetwithin8.square windowW.Thealgorithmusesmultilevel bucketing and runsinO(logII) averagetimeonCReWPRAMwithO(I,/logII)processorswhen thepoints aredrawn independentlyfroma uniform distribution.

In1989, Evans andStojmenvoic [9Jpresented an 0(10g3,,)algorithm for constructingtheVoronoi diagram of aset ofnpoints onCREWPRAM.

OntheCReWPRAM modelthealgorithmrunsin0(log2n )time.The algorit hm uses Shamos's divide-and-conqu ermethod[21.

In1990, Chang.Sung Jeong[10/ gave a parallel timeoptimal algorithm whichrunsinO( y'ii) onan

..fii

x

..fii

mesh. Thealgorithmisbased on thedivide-and-conquer approachused byShamos(21.The setofpointsis sort edbyor-coordinate and divided inhalfintotwo setsLandR bya vertical separatingline1 suchthat points inI~areto theleft ofIandpointsinR arc totherightofI.Recur sively,theVoronoidiagra.mVo/·(I..)andVor(R ) arc compute dforthesetsI..and R,respectively. Thetwodiagrams are thenmerged ,resultinginV01'(I..UIl).II01'(I..)istheVoronoidiagram ofI..,

1"111'(/1)istheVoronoidiagramofRandVor(LUR) is theVoronoidiagram

ofl..u/I.

14

(32)

In1992, S.GAkl[11\presentedparallelalgorith msthat computethe VoronoiDiagram ofI'points onthestarandpancakenetworkcomputers.

For anu-starorn-pancake withII

=

,,1 processors,given/Iplanar points sto redin the processorssuch that eac hprocessorhl)ldsonepoint.andlias a memoryofconstantsize, the Voronoidiagramof these point s canbeIound inO( n~log2 ,,)time.

In thisChapte r,we review parallelcom putat ionmodelsandparallel algorithmsof constructingVcronoi diagrams.Themost popularprocessor inte rconnectio ntopology today isundoub tedlythehypercub e.Thehyper- cube is oneof the most versatileandefficientnetworks thusfnrdiscovered for parallelcomputation112Jbecause:

•In a hypercub e,using(!connecti ons perprocessor,2"processormay beint erconnect ed such that themaximumdist an cebetweenanytwo processorsis d.While lineararray,tree,Mesh and Mesh of treeuse a smaller number ofconnectionsperprocessor,themaximum distance betweenprocessors islarger.

• Most otherpopular networksare easily mapped intoa hyp ercube.In part icular,thea-node hypercu becansimulateany O(II)-node array, tree,ormeshoftreeswith onlya smallconstantfacto rslowdown\121.

•Hypercube has the advantage of beinga wellstudied networ k.Efficient algorit hmsareknownforrout ing messagesbetween processorsinII.

15

(33)

hypercube.A large corpus ofsoftwareand programming techniques exist for hypercubes [13].

•Ahypercubeis completely symmetric. Everyprocessor'sinterconnec- tionpattern is like thatofevery otherprocessor.Further more,ahyp er- cube is completely decomposableinto sub-hypercubes[i.e., hypercubes of smallerdimension).This property makes itrelatively easy toimple- mentrecur sivedivide-and-conqueralgorithmsonthehypercube (14].

In ahypercubeconnectedcomputer networks, localcomput ationsas well asmessageexchangesaretaken into considerationwhen analyzing thetime taken bytheprocessor networksto solveaproblem.Whendesigningan algorithmfor aprocessor network, therouti ng of messages fromonepro- cesser toanotheris the responsibilityofthealgorithmdesigner.In Chap- ter2, we will introducethe parallelcomputationmodel usedinthe thesis.

Based onNa.ssimiand Sahni'spaper[15), severalfundamentaloperat ions onhy percube connectcomputer networksaredescribed.The corresponding progra ms arealso given. All thoseoperation swillbeused in chapter3.In Chapte r3, Based onBrown's metho d,wewill developa parallelalgorit hm to constructVoronoi diagrams. Ouralgorit hm runs in0(log3»)timeon an O(II).processorhypercubeconnectedcomputernetwork.Ouralgorithmis buedonBrown'smethodwhich transformsthe problemof const ruct ionof a planar aVoronoi diagram for ann-pci ntsetto constructionof a convexhull ofIIpointsinthreedimensionalspace.Comparingwiththeparallel elgc-

16

(34)

rithmswhicharebasedonthe divide-end-conquerapproachused byShames, ouralgorithmcanbe usedto solve two computationalgeometryproblems:

constr ucting2-dimensionalVoronoi diagram and3- dimensio nalconvex hull. Compari ngwit hChow's metho ds(5)which runs on(\.0(11 )processors CCC (Cube-Con nectedCycles) model has 0(log·11I)time complexity,our algorit hmhasless timecomplexity.ComparingwithChang- Su ng Jeong's algori thm[l O]whichruns inO(

vn)

on an

vn

x

.fii

mesh,ourparallelCOIl1-

put at ion model ismoregeneral.Most otherpopularnetworkscanbeeasily mappedontoa hypercube{12].

17

(35)

Chapter 2

Hypercube Connected Computer Network and Its Fundamental Operations

Inthis chapte r,theSIMD hyper cubeconnectedcomputernetwor k will be defined.Somefundamental operationsonit will bedescribedandccrre-

spo nding programs.ilIIll">begiven.

2.1 SI MD Hyp erc ub e Co nnected Computer Network

AccordingtoMichaelJ.FInn'. (161taxonomy ofcompu te rarc hitect ure, parallelcomputersaredividedintotwo cetegoriea, single instructionmul- tipl,. datastreams (SIMD)and multipleinstruct ionmultiple datastr eams

18

(36)

(MIMD).Ablock diagram IorII.SIMD compute r is givenill Figure 2.1.

AsCl'IlIbeseen,an SIMDcomputerconsistsofu procesaingelemenh(PE's}.

ThePE's areindexed0 through"-1and maybereferenced as1>/o:(i).Eaeh PEhas i'.slocalmemory.The PE'sare synchronizedandoperateunder the: control program.PE'smay be enabled or disabledsothat thecommon instr uctionfor any giventime-unitis executedonlyonenabled PE's.This enabling anddisablingof PE's can be donewit hout theuse ofseparatecon- trollines foreachPE as long aseachPEknows its own index

1 171.

The PE's areconnectedtogetherviaan int erconnection network.Differentin- terccnnectionnetworks lead todifferent SIMD architectures.In this thesis, hypercubeconnect SIMD compute rsareconsidered.

Assume that n

=

2"andlet;J_I, ••io bethebinaryrepresentation of iforiE[0, 11-1].Leti(~lbe the numberwhosebinary representat ion is i"_1...iH1r;;ib_1...io, where;;; is the complementofiband 0:5 "<,I.That is,i(~Jis obta ined by complementingtheb'th bit ofi's binary representat ion.

Inthe hypercube model processoriis connectedto processors;lbJ,0:5" "< ,I.

Fig. 2.2showsan exampleofIf=16processorhypercube.

The hypercube is an excellent(andpopular)choiceforthearchitect ure ofamultipurpose par allelmachine. In this thesis,we choose theSIMD hypercube connectedcomputer networks as our parallel comput at ion model.

19

(37)

I/O

I/O

I

Inter- connect ion Network

Figure2.1:Block diagram ofan SIMDcompute r

20

(38)

1100

1(100

.> /

1110,:

"

-, ,.y o .:---

0111

010 011

000 _0011

o~ ~

--

001

..

~~~

- --

V .:

1010

Figure2.2: 16processorhypercube

21

1111

ion

(39)

2.2 Fundament al O p e r a tions on Hypercube Co n ne cted Computer Network

Inthis section ,based on Nassimi and Sahni'.pap er ll S), several Iunda.

mente!operationsonhype rcubeconnectedcomputernetworksarcdescri p- tivelydiscu ssed. Thederivationsandthe correspondingprogramsarcalso givenafter thedescriptionofeachopera tion. Allthoseoperations willbe usedin chapter3.Therefore,thissectioncanbeconsideredas theprepa ra- tion(orehapter3.Only operations whichareusedwillhediscussed. Based on this,somevery buicoper ations, suchasdat a accumulation and eonseeu-

livesumonhyp ercu be connectedcomputernet work.willnotbementioned in thissection.Formoredetails, [18] wouldbe a goodreference.

Beforethefund&mental operat ionsonhypercu be connectedcomput er networks are introduced, some program mingnot ationusedi.givena.Icllcwsr

I.Thenotati on;1.'i.torepresentthe numberthat differ.fromiin enctly bitfl.Thesquarebracket.(1))areused toindex anarray and the parentheses('()')are usedto index PEs.ThusA[llrefersto thei'th elementofarrayAandA(i)referstotheA registerofPEi,A(j](i) referstothe j'thelement ofarray A in FEi,Thelocal memoryineach PEholds dataonly[f.c.ItOexecuta ble instructions).PEsneedto be

22

(40)

able to perform onlythe basic arithmeticoperatio ns.

2.Thereareaseparate program memory andII;cont rol unit.TheCOil -

trolunit performs instruction sequencing, fetching,anddecoding.III addition,instructionsand masksare broadcastedby thecontrolunit tothe PEsCorexecution. AninstructionIII(I...isIIbooleanfunction usedto selectcertain PEs to executeaninstruction.Forexample,in theinstruction

A(i): =A(i) +1,(i o =1)

(io

=

1)isa maskthat selectsonly those PEs whoseindexhasIllt 0 equal to 1.

3.Interp rocessor assignments are denotedusingthesymbol" _ ", while intre processcrassignments are dcaoted using thesymbol":=".Thus theassignmentstatement:

on a hypercubeisexecutedonlyby the processorswithbit2 equal to O.

TheseprocessorstransmittheirIiregister datatothecorresponding processorswith bit 2equal to1.

4.A d-dimensional hypercube can be partitionedinto windowsofsi?.c 2kprocessorseach.Assume thatthisis donein sucha waythatthe processorindices ineach windowdifferonlyintheir least significantk

23

(41)

bits. As a result,the processorson each windowforma subhypercube of dimensionk.

2.2. 1 Maximum

Assumethata dimensiondhypercubeis partitionedinto subhypercubes (orwindows) of dimensionk,whereP

=

2dandW

=

2kIff==iW+q (0:5(/::;W)isa processorindex ,thenprocessorfis theq'thprocessorin windowi,This processoris to compute themaxim um elementofallelements, where, O::;i <P/W , O$ q <W.

Themaximumrelative tothe wholesize~vwindow areobtainedas below:

1.Ifa proc essoris in theleft2 k-1snbwindow,then its maximum isun- changed.

2.The maximumofa processorin the rightsubwindowis itsmaximum whencon sidered as a memberofis.2k-1window compared to themax- imum oftheAvaluesin the left subwindow.

Table2.1 givesanexample-maximumcomp utation.The numberof processorsandthe windowsize IV

=

2kareboth 8. Line0 givesthe initial Itvalues.Themaximumin thecurrent windowsare storedin theSregisters and the maximumof theJI valuesof theprocessorsinthe currentwindows

24

(42)

arestored intheT registers. Webegin withwindowsoCsize1.The initial SandTvalu es are given in lines1and2,respectively. Next,theSandT valuesfor windowsofsize:2 are obt ained. These aregiven in lines 3 and.1.

Line 5 and6givetheSandT valueswhen the window size is>1andlines 9 and 10givethese valuesforthe casewhenthe wind ow sizei~8. The program in table2.2is theresultingprocedure. Its timecom plexityisO{k}.

2.2 .2 Ranking

Associatedwith processor,i,ineachsize2kwindowof nhypercube is

aflagsel~r;led(i}which istrue iffthisis a selected processor.The objective

of rankingistoassigntoeach selected processoraI'illlksuch thatnwk(i)is the number ofselectedprocessors ofthe window withindexless thani.Line oof Table2.3shows theselectedprcceesora in awindowofsize eightwillI an

*.

The ranksto be computedareehcwnin line1.

Theranks ofthe selectedprocessorsinawindow ofsize2kcnnhe computed easilyifwe knowthefollowing informationCor theprocessors in each ofthesize2k-1subwindcwsthatcomprisethesize2kwindow:

1.Rankofeachselecte dprocessorinthe2k-Lsubwindows 2.Tot alnum berof selectedprocessors in each2.1:-1subwin dow

Ifaprocessorisin the left2.1:-1subwindow then its rank inthe2l 25

(43)

PE I 2 3 4 5 6 7 line 0

0 2 4 3 1 5 2 8 I A

2 4 3 5 2 8 I S

2 4 3 5 2 8 1 T

2 I 3 3 5 5 8 8 S

4 4 4 3 3 5 5 8 8 T

5 2 4 4 4 5 5 8 8 S

8 4 44 4 8 8 8 8 T

7 2 4 4 4 5 5 8 8 S

8 8 8 8 8 8 8 8 8 T

Ta ble 2.1:Exampletocompute maximuminan SIMDhypercube

26

(44)

pr oc edu r eS/MIJMI/J'illlll/ll(;\,~·,SO);

{ComputetheMaximum ofrIin windowsofsize2k}

begin

{Initia lize forsize 1windows}

S(i ) :=A(i);'I'(i) :=A(il;

{compute {or size2~+Iwindows } forb:=0to~'-1 do begi n

l3(i (~))__'r(i);

S(i ):=H(i),(S( i)<l1(i)and i,.

=

1);

,.(;), ~HUl,('I'U)<H(i)) ; end;

en d; {ofS/ !II/J AlII.rimum }

Table2.2:ProgramforSlMD Maximum

windowis thesameasitsrank insubwindow. Ifitis in the rightsubwindow, itsrank is its rank inthesubwindowplus thenumber ofselecte dprocessors in theleft subwindow. Lin e 2 of Table 2.3shows therank ofeachselected processorrelative tosubwindows ofsize4.Line3showsthetot alnumber or selected processorsin eachsubwindow.

LetH(i) andSU),re spectively, denote therankof processori(if it is a.selected processor) andthenumber ofselected processorsin the current windowthatcon tai ns processori,The strategyto count rank s inwindows or size2kistobeginwithU.and SOfor windowsofsizeoneand then repeat edly double thewindowsizeuntil reachinga.windowsizeof2k•Forwindowsor sizeone,itis given:

27

(45)

,,!,E 0 I 2 3 4 5 6 7

line

0 1 2 00 3 4 H

2 00 0 1 0 00 1 2 R

2 2 2 2 3 3 3 3 S

0 0 0 0 0 0 0 0 R

0 I 1 0 1 0 I 1 S

0 0 0 0 0 0 0 1 R

1 I 1 1 I 2 2 S

0 0 1 1 0 0 1 2 R

2 2 2 2 3 3 3 3 S

10 0 0 1 2

,

3 4 R

1] 5

s

5 5 5 5 5 5 S

Table2.3:Exampleto computeranks in an SIMD hy percube

28

(46)

lI(i) S(i)

_ 0

101 ifiisselected

1

otherwise

Lines4and 5of Table2.3give theinitialIiandSvalues. Lines6 and 1givethe valuesforwindowsofsize2. Lines8and 9 givethesefor windowsofsize 4,and lines10 and 11givethem for awidowsizeof 8.The ranksfor theprocessorsthat arcnotselectedmaynowbe setto00to gelthe configurationof line1.The pro cedureto compute ranks isgiven inTahle2.'1.

Thisproced ure is dueto Nass imiand Sahni[lsIandits complexi t yisreadily seentobe0(.':).

2.2.3 Conce ntratio n

Ina data conc entra tionoperation,itbegins with onerecord,

r:,

ineach

oftheprocessorsselectedforthisoperation.The selectedprocessorshave been rankedandthe rank infor mation isin a fieldIioftherecord.ASfoumc thewindow sizeis2kTheobjective i$to move therankedrecords inench window to theprocessorwhosepositi on in thewindowequa lstherecord rank.Line0of Table2.5gives aninitial configurationfor an SIMD eil;ht processorwindow. Therecordsare shownaspairs withthesecond entry ineach pair being therank. Itis assumedthatthepro cessorstllalarenot

29

(47)

pro c e du reIlwJ.{k)j

{Com putethe rank ofselected processorsin windowso( size2k} {SIM D hypercube}

begin

{Initializefor .i7.e 1windo ws) lI(il

,=

0,

iflIrlrclcd(i) thenS(i):=1 else5(i):=0;

{Comput e(orsizezklwindows) fo r":=0to~- 1 do begin

1'(i(·I)-S(i);

lI(i),=lI(i)+7'(i) ,(i. =1);

S(i)

,=

5(i )+7'(i);

end:

Il(i):=00,(not...tlrrlcd(i)) ; cud; {of ml,k}

Table2.4:Progr a.m forSIMDrankingprocedure

30

(48)

~

line

0 (-,00) (B,O) (-,00) (D,l) (E, ' ) (-,00) (G,3) (11,1) (B, O)(D, l ) (E, 2) (G,3) (H,4) (-,00) (-,00) (- , ~I (B,O) (-, «) (-,00) (D,I) (E,2) (-,00) (H,4 ) (G,3) 3 (B, O) (D,l) (-,«) (-,=) (H,4) (-, 00) (E, 2) (G,3)

Table 2.5:Example toconcentrate in SIMD hypercube

selectedfortheconcentration operationhave a tank of00.The resultof the concentrationisshownin line1. Let D,D,E,11 berepresentedindividal records.

Data concentra tioncanbedonein0(1.: )timeby obtainingthe agree- mentbetween thebitsof thedestinationofII.recordanditspresent locution intheorder 0,1,2, "', k - 1[151.Forexa mple, let us seck agreement on bito.Examiningthe initialconfiguration(line0),itcanbe seen thatthe destinationand presentlocationofrecordsB,G andHdisagr ee onbito.

To obtainagreement ,theserecordswiththe recordsin neighborprocessors alonghit 0 areexchanged. Thisgives theconfigurati onof line 2.Examining thebit1of destinationand presentlocatio n in line 2,it can alsobeseen thatrecordsD, E andHhavea disagreement. Exchangingthese records

31

(49)

proced u rer.rHlet:'draleCO,k);

{Concentra.te recordsG'in-..leet edprocessors.2·itthewindOW'size ) {H isthe renkfidd ofareerd]

begin

forb:::::0tok -ldo begin

F{,1i')_C:(i) ;

G(i)~'"(i),((G(i).R#00an d(G(i).R),I

i.»

or(I-'(i ).11#-00an d(F(i).Hh#-i&)) );

en d; cud: {ofNJIIt1':/I/""/r.}

Table2.6:Programfor procedur eto concentraterecords withtheirneighb orsalong bit 1 yieldsline3.Finally,letusexaminebit 2 of thedcsfi nation and presentIeeationoCre cordsin line3an ddeterminethat recordsEandG need tobeexcha nged with theirneighborsalongbit2.This rel tlltsinthe desired finalconfigu rationofline 1.

2.2.4 Distribu t ion

Da ta distri butionis theinve rseoCdataconcentration,Itbegi ns with records inprocessors0,••., rofiI.hypercub e wind owofsize2t.Eachrecord hasa destinatio nO(i)associatedwith it.Thedestinationsin ea ch window

UClUththat0(0)<D(I)..•<D{r ).The recordthatisinitiallyin proces- soriofthe windowiatoberout ed to theO(i)'thprocessoroC th e window.

Note thatr may vary fromwind owto window. Line 0 of Table2.7gives

J2

(50)

PE line

0 (A, ') (B, 4) (C,7) (-,<X» (-,00) (-,00) (-, 00) (-,<X»

I (-,0<) (-,0<) (-, 00) (A , ') (B, 4) (-,00) (-, 0<) (C , 7) 2 (A , 3) (-,0<) (-,00) (-,<X» (-,0<) (D,<) (C,7) (-,<X»

3 (-,00) (-, 00) (A,') (-,<X» (-,00) (B,<) (C,7) (-,<X»

Table 2.7:Example todistributein an SIMD hypercube aninitialconfigura tionfo rda tadist ributioninaneightI)TOre SSOTwindow of anSIMD hypercube. Each. recordisrepresen ted as a tuplewiththesecond entry being thedestination.Line1givestheresultofthedistribution .

Since dat adistribut ion is the inverseofdata concentration,itcan be carriedoutby ru nning theconcent ra tionprocedure inreverse [151. Tile resultis theprograminTable2.8. Lines2, 3,and 1 ofTable2.7give the configurat ions(orthe examplefollowingthe iterationsb ;;;;2,1,and 0, res pectivel y, shown in Table 2.8.

2.2.5 Generalization

The initial configuration forageneralizationissimilartothat (or a da ta.dist ri bution . Itbegi nswith records,(J,in processors0,..,rofII.

33

(51)

proceduredjslri~utc(G,k);

{Dist ribute records G.2kisthe window size}

begin

for" ;=k- ldownto 0 do begin

1" (i(~) )- (,'(i l;

G(;)_/,'(;) , ((G(;).IJ"coand(G(;) .D)o,,;.)) odl'( ; ).IJ"cc and(l'U).IJ).,H.)));

end;

end; {ofdi."" 'iblllr}

Table 2.8:Programforproceduretodistribu te records hyper cubewindow ofsize2k. Eachrecord,GU),has ahigh destination (:(i). 11associatedwithit, 0:::;jS;r,Thehigh destinatio nsin eachwindow are such tha tG(O).l1<G(I).1I<.·.0(1·).11. LetG(-I ).1I:::O.The record which isinitiallyin processorjofth e window is to berouted to processorsG(i-1)./ / ,O(i-1).11

+

1,. ",G(i)./fof thewindow,0S;i $ r.

Not e that,.may varyCromwindow to window.Line 0aCTable 2.9give s an initialconfiguration for datageneraliza tion inan eight processorwindow of an SIMD hypercube.Each record isrepresen t ed asa.tuplewiththesecond entrybeing thehighdestinat ion.Line 1gives theresult of the generalizat ion.

Data generalization is done by repeatedl yreducingthe windowsize by half[15].Whenthewindow size ishalved,it shouldbe ensuredthat all record sneeded inthe reducedwindowarepresentinthat window.Beginning with a window.~izeofeight and line0 ofTable2.9, eachprocessor sends its

34

(52)

~E line

(A,') (B,') (C, 7) (-,00) (-,00) (-,00 ) (-,00) (-, oo) G (A,') (A,3) (A,3) (A,3) (B,.) (C,7) (C,7) (C,7) G (-,00) (-,00) (-,00 ) (-,00 ) (A,3 ) (D,.) (C,7) (-,00) F (-,00) (-, 00) (-,00) (-,00) (-,00) (B,') (C,7) (-,00) F (A, ') (B,') (C,7) (-,00) (-,00) (B, ' ) (C,7) (-, 00) G (C, 7) (-, 00) (A,3) (B,') (C,7 ) (-,00) (-, 00.) (Il, ') F (C,7) (-,00) (A,3) (B,') (C,7) (-,00) (-, 00) 1-, 00) F (A,3) (B,' ) (A,3) (B,' ) (C,7) (Il,') (C,7) (-, 00) G (B,') (A,3) (B,.) (A,3) (B, .) (C,7) (-, 00) IC, 7) F

Table 2.9:Exampleto genera li zeinanSIMO hypercube

35

(53)

recordtoittneighborpro cessoralongbit2. Theneighborprocessorreceives therecordillF,Line 21howl theFnluesfollowingthetransler.Next,some

""IandG'sare eliminated.This isdonebycomparingthehighdestinalklll o(

arecordwiththelowestprocessorindexin thesizefourwind owthatcontains therecor d.IfthecomparilOnsucceeds,then,tberecordisnot neededinthe size four window,Applyingthiseliminatio ncriterio ntoline 0result sinthe eliminat io nofno G,However,when thecri terionill applied totheF's afline 2,/0'(4)iselimin atedand theconfiguratio nof line3 will begiven. Atthis pointeach window ofsize four hasall therecordsneededinthat window.

Thereco rdsare,however,in bothFan dG.Toeoasclidetethe required recordsintotheC'Ialone, thefollowing consolidation crite rionisused:

replacea(i)byF(i) in caseF(i).11<G(i),l1 i.t .,of thetworec ord: ina PE,the onewit hsmaller high dettinationsurvives.

Applyingthe consolidat ioncriteriontolines 0 and 3resultsin line 4, Next, record.aretransferredalong bit1.TheFvaluesfollowing this tratllferere givenin line 5.Following the applicat ior.oftheeliminationcri- terion,theFvaJueoflin e6 isobtained.TheCval uel arcunchanged.When thecone clideti oncriterio n isapplied, theG valuesareasinline7.Line8 showsth eFvalue.following atransleralongbit 0,Theeliminationcriterion resultsin line1. Procedur egClIcmli:eshow nin Table 2.10 implementsthe generaliza tion strategyjUlt outlined,

36

(54)

proceduregrll emfi: c(G,k);

{General ize record sG.2~is thewindowsize}

begin

forb:=k-l downt o 0do begin

{Transfer to neighboring window ofsiee2~}

F(i1bl }_ C(i);

{Elimina tioncriterio n}

G(i).1f:=oo,(G(i).1f<i-ib-l..o)i F(i).1f;=oo,(F (i).1I<i-ib_1..u):

{Consolidationcriterion}

C(il'~F(i), (F( i).1f<C(i).II );

end;

end; {ofgcnemli:c}

Table 2.10:Programforproceduretogene ralize records

37

(55)

ForII.natu ralnumbe rI,jn..,~isthenumberobtained{romiby flipping allthebitsinthebin a ryreprese ntat io nofifro mthent h tothemthpla ce.

2.2.6 Merging and Unmerging

Giventwosorte dsequenceseachstoredinahyperc ubeof sizen/2,their merg ingcan be doneinO(log n]time.Mergin g two sortedsequ encescanbe doneby bitonicsort.

Abi/tHlicsequ o u:eisa.nonincrcas ingsequenceof number sfoUowedby ancn decrea ain gsequence, Eit her(orboth ) ofthese ma.ybe empty. The seq uenceha sthefor mXI ;:::X2?:".;::: Xk:::; Xk+l::;;'' ':5Xn,{orsom ek, 1S~,:5u.Thesequences10 , 9,9,4, 5,7,9;2, 3, 4, 5,8;7,6,4,3, 1;and 11, 2, 5, 6, 8 , 9 areex amplesofbit onicsequences.

Abit onlc sortis aprocesswhich sorts abitoni c sequence into either noni ncreasingorno n decreasin gorder. Suppose wearegiventwosorte dse- que neea"1:5112S...:5VIand10,:SIt'.!.:5..::5w"" Firs treverseone ofsequences. Thiscanbedone inO(logn] timebydoingthereversin,9 procedure.

Suppos ewehav e arevi sing seque nce{VI,lJ-l,",VI),we get seq uence (1". l'l_h' "•1'2.1' l }' Byconcatenatin g twosequencesto obtainthehitonic seque nce1'1?:111_1;:::, ..?:1'1?Ull:::; Wl:5 •• •:5tum=Xl?:X2?:?:Xk

38

(56)

procedureSfMDRcvisil1!A.tI,k)j

{These quence is stored in window ofsize2kI{Re visinga.1Ithe elements in window s

or

size 2"j

begin

forb:=0 toL· - 2do A(jli) )...A(j) end;{afSIMDRcvisi ll g}

'Iable2.11:ProgramforSIMD Reversing

~Xk+15...=:;:r~wher e11

=

I

+

III.The resultin g bilon icsequen ceisthen sorted usinga.hitonicsort toobtainthedesiredmerged sequence.So,Ioe examp le,ifitisdesiredtomerg etheac q uc nccs(2, 8,20,24) l\ncl (1,9,to, 11,12 ,13, 30),the bit onic sequ ence(24, 20,8,2, 1,9, 10,II,12,13,30) should be firstcreated,

Bajcher's Monicsort[191isideallysuitablefor implementationon ahypercubecomputer. Batcher's algo r ithmto sort the hitonicsequen ce

Xl,' " ,xnint o ncndeere esingorderis give n in Table2.12.

Example: Considerthebitanic seque nce(24, 20,8,2, L 9,JO,11, 12, 13, 30). Supp osewewishtosort this intonondecreasing order. Theod d sequen ceis (24 ,8,1, 10,12,30)andtheevensequen ceis(20,2, 9,11, 13).Sortingthese,thesequence s(I,8,10, 12, 24,30)and(2, 9, 11, 13, 20) are obtained.Puttingthesortedoddand evenpartstoge ther,thesequence (I, 2,8,9, 10, 11,12,13,24,20,3 0)is given.Afterperfor mingthelltJ2J comparejexchen gee of step3,the sortedsequen ce(I, 2, 8,9, 10, 11,12,13,

39

(57)

Shp I: [Sortoddsubseque nce]IfII>2thenrecursively sortthe odd bitcnic subsequenceXI,X:hX5, •••into nondecreesi ng order

SI'-IIfl: [Sortevensubsequence]Ifn>~then recur sively sortthe evenbito nicsubsequence:f2,.1.'4 ,.rc"...intonondecrees- ingorder

Hhptt: [Compare/ exchange] Comparethepairsofelements.rj

and.£;+1(oriodd andexchangethem incaseXi>Xi+l

Table 2.12: Bitonic sort intonondecreasing orde r

20, 24,30)isobt ained .D

When11isapower of 2, the recursion in Table2.12can beunfoldedto obtain the comp aring/exchangingalgorithmof theprogramin Table2.13.

Ineachiterati on ofthe whileloop,each sequenceelement ispaired with exactlyoneothersequence elementthatisadist ance(ffromit . The pairs neeform edfromleft to tight. To obtai n a nondecreasing sequence each com- ['aring/ exchangingcausesthe smallerelementof the pair tomovetothe left positi on.Ifa nonincreasingsequence is desiredthesmallerelementismoved to theright.Table 2.14showsaneightelement bitonicmer gethat result sin a nondecrcesingsequence and Table 2.15 givesanexa mple thatresult sina nonincreesing sequence. Theexamplesassumetheelements to be sortedarc storedin processorsofahypercube withoneelement per processor.As can be seen, theelements that form eachofthepairs for the compa ring/exchanging opera tionareinprocessors thatarehypercube neighbors. Hence each iter-

40

(58)

procedureIJiloll;CSOr/(II);

{Sort the bitonic sequence;'1, ··..I'Il}

{IIisa power of2 } begin

d=JI/2j while0do begin

compare/ exchan ge elementsdapart d=fl/2;

end;

end; {oflJilullirSorl}

Table2.13:Iterati ve bitonic sort forII, Itpower of2 arion ofthe whileloopof theprogramin Table2.13lakes0(1)timc011a hyper cube .Thetotaltimetosortan"element bito nic sequenceistherefore O(logn}.

Given asorted sequenceofclement s sothatpart of theelements belong to a setA(thu sthe remainin g belong toIf)andeach elementknowsthe corresp onding rank inAorA, permut e the sequencetoreturneachAand A.Thiscan be doneby running the merging algorithmin reverseord"'f Therefore,Unmerging can bedoneinO(logII)time.

2.2.7 Sor t in g

To sortnelementsusingbitonic sort , we begin with sorted sequencesof sizeone.Adjac entpairsof theseform bitonic sequences that arc sorted (in parallel)to obt ain sortedsequencesof size two. Thesortingisdone such

41

(59)

PE

line 0 1 2 3 4 5 6 7 d

o 7 6 4 0 1 23 5 4 '-=LI:=~-_I

)J

1 2 3 0 7 6 4 5 2

~.I '---t:=-.J

1 0 3 2 4 5 7 6 1

I---J 1---1 l---J

3 0 1 2 3 4 5 6 7

Table2.14:Power of 2 bitonic sort[nondecreasingorder)

42

(60)

line 0 1 2 3

PE

5 6 7 d

o

7 6 4

a

I 2 3 5 4

'---<=F .cC[::'-".'.,, '.J .

7 6 3 5 1 2 30 2

~ '-c:::...l

21 6~ U ~_.~ 1

3 7 6 5 4 3 2 1 0

Table 2.15:Power of2bitoniesort(ncninereesingorder]

43

(61)

that thesize two sequences are alternatelynon-ncreesingand nondecreasing sequences[I,e, the lirst , third, fifth,... ,sequences arenonincreasing andthe remainderarenondecreasing).Consequentlyevery pair of adj acentsize two sequencesforms abitonic sequenceofsize fourwhichcanbe sorte d using bito nicsort. The sizefour sequences are alsosortedalternatelyinto nonin- creasing and nondecreasin gorder . Continuinginthis way,we canobta inII.

sortedsequenceofsize11after log"bitonicsort ingsteps.Note thatifthe sortedsequence istobe innondecreesing order,then thelast biteniesort stepshouldsort thefirstandonlyresulting sequence intothisorder.The total timeforthe sortingis 0(log2II).

Exam p les Sort ingfollowing sequence c nmLhepdgjl k beio

into nondecreasing orde rand that a<b< ...<0<p.Thepairs (c n), (m f),(h a),(pd),(8 j),(I k), (be)and(i0)are bitonic sequencesthat are sorted byusingbitonic~ortto obtainthe:sequence:

n cI m h a daj gk Lebie

Notethattheodd pairswer e sorted intononincreasingorderwhilethe even ones weresorted intonondecreasing order.Nextlet usconsider the adjacent sequencesof lengt hfour.These are(ncfm), (ha dp),(jgkI) and (ebi0).

Sinceeach is a bitonicsequence,it maybe sorte d usingbit onie sort . The resultis:

44

(62)

nmfc adhplkj gb ei o

Once again theoddsequences aresorted into nonlncrensingorde r whilethe evenones aresorted int onond ecreasing order. Two bitonic sequencesof length eight,(nmfcad h p) and(lk j g be i0),nrcgiven.Sortin g them willgive thesequence:

pnmhfdc a b eg ij kl o

Sortingit intononde crcasln g order resultsin the sequence:

ab cd ef g hijklmn op

o

2.2.8 Random AccessRead

Ina randomaccessread (RAR),someofthe processor s of thehyper cube wish toreaddata from other processorsof the hypercube .LetA(i )bethe PEfromwhich proceescriwishestogetdata.Thedata tobe obtainedis D( A(i)).In caseFEidoes not wishto read dat afromanyothe r PE,then A(i)""00. Line 0 of Table2.16givestheAvaluesfor anexample RAR inaneight processorhypercube .Not ethatinanRAR,several processors may read (rom thesame PE.An RAR can bedonein0(log11l)tlme inlIn1/

processorhyper cubeusingthealgorithmof the program shown inTa ble 2.17 [15J.

(63)

-, - ,

N~

46

(64)

Sl epI: Eachprocessorf/crea tes a triple(..t{II),II,jlll.'{)where flagisa Boolean entity that.isinitiallyhue.

Slep2: {Sortl Sor t the triplesintonondecrensin g order of tileread addr esslI(fl) .Tripleswiththesameread address arc in nonde creasingorder ofthePE indexfl. Furthermore, during thesort, thelingent ry ofatripleis settofalse incasethere isa tripleto itsrightwiththesa me rend address.

Step~ : [Rank]Processorwith triples whosefirst compone ntof00 and whose thirdcompo nent(i.r.oJIIlY)istrue areranke d.

Stc114: Eaehprocessorb thathasa triple (A(I/),Il,true) with A{a)f.00crea tes atripleofthe form(fi(l,) ,A(II), h) whereR{b)is therank computed in thepreceding step.

Step5: {Conce nt ra te]The triplesjustcreated are concentra ted.

Step6: Each processor r:that has a concent rate d triple (r(b),A(a),b)create s a tupleoft heform ((:,lI{tt)}.Note thlLtsincec

=

U(IJ)thistupleisjustthefirsttwocom- ponentsof thetriple.

Slcp7: [Dist ribute] The tuples aredistr ibutedusingtheseeond com po nent asthe destinationaddress.

Slep8: EachprocessorA((I)that receivesa tuple(I:,A(II))ere- atesthetuple(c,D( A(n))) .

SICI!9: [Concentrat e]Thetuples create d in theprecedi ng step areconcent rat edusing the first componentastilerank.

47

(65)

Slcp/0: Each processorc that received atuple [c,D(A(a»)in the last step aha has a tripleofthe form(R(b),A(a),b)that it received inStep5(noticethatc=R(b)).Usingthis tripleandthetuple received.in Step 9it createsthetriple (1.,D(A(n))).

StepfI: [Generalize]The tuples(b,D( A(a)))are generalizedusing thefirst componentas thehigh destination.

Stcpl!l: Each processorthatrecei veda tuple(6,D(A(a») in Step11alsohasatriple(A(a),«,jlag)thatit at ob- tained as aresultofthesort of Step 2. Using informa- tion from the tuple andthetripleat createsa newtuple (a,D(A(a))). Processe-sthatitdid notreceive atuple U5e thetriple theyreceivedin Step 2 and form the tuple (n,- ).

SfepIfl: {Sort]The newly createdtuples ofStep12are sort edby theirfirst component.

Table2.17:Algorithm{or a randomaccess read

48

(66)

Consider theexample of Table2.16.InStep1,eachprocessor creates a triplewith thefirstcomponent being theindex oftheprocessorIrom which itwants toreaddata; thesecondcomponentisits own index;end thethird componentisa flagthat is initiallytrue(t )for alltriples.Then, inStep 2 the triplesaresorted on the first component.Triplesthat havethesame first componentare inincreasingorderoCtheir secondcomponent.Withineach sequence oftriples that havethe same firstcomponent only thelastonehns a true flag.Theflag(ortheremaining triplesisfalse.Thefirst components ofthe tripleswith atrue flag giveallthe distinct processorsfrom whichdat a is toberead.

Processors 0, 2, 3,and 5 are ran ked in Step 3. Sincethe highestrank is three,datais toread from only four di.dinctprocessors.In Ste p 4,the rankedprocessors createtriples oftheform(ll(b),;1(11),11).Thetriplesnrc thenconcentrate d. Processors 0 through3 receive theconcentrate dtriples and form tuplesofthe form (c,A(a)). Becauseofthesort of Step 2,the secondcomponentsoCthese tuplesarein ascendingorder.Hence, theycan berouted totheprocessors givenby the secondcomponentusinga dat a dist ribut ion asinStep 7.Thedestination processors of these tuplesarc the distinctprocessors whoseda.taistoberead.Thesedestina\.ionprocessors create,inStep8, tuplesofthe form(e,IJ(A(n)}) wherec istheindexof the processor that originatedthetupleit received.Thesetuplesarcconcentrated inStep9 usingthe firstcomponentes the rank,

49

(67)

In Step10,thereceiving prccesscee{i.e.,0 through 3) usethe triples receivedin Step 5 &J1dthetuplesreceived inStep9to createtuplesof the form(ii,IJ{A{II))).The tint com ponent isthe induof the processortha t originated thetriplereceived in Step 5.Sincethe trip lesreceivedinStep5are theresult of aconcent ra tion,thefint comp onentofthe newly forme d tuples arcin ascendingorder. Thetuples are thereforeread yforgenera lizati on using thefirst component ...thehigh index.ThisitdoneinStep11. After thisgeneralizationwehavethe right numberof copies of eachdat a. For examp le, twoprocessors(4and 5) wantedtoread fromprocessor 7 and wenow havetwocopiesofD(7) .Comparing thetriplesof Step2 and the tuples of Step11,weseethatthesecon dcomponentof thetriples tellsus where the data in the tuple!i.toberoutedto.InStep12 we createtuples thal a:lntai n thedestinat ion processorandthe data .Sincethedest ina tion addresses arenot in ucending orderthe tuplescannotbe routedtotheir destinat ion processors u.inga distribute.Ratb ee,Ihey mullbesortedby delltinat ion.

2.2.9 Ran d o m AccessWrite

A random&CCC'ISwrite(RAW) islike arandom accessreadexcept thai processors wishtowriteto otherprocessors rat her thantoread fromthem. A rand om accesswriteueesmanyofthe basic stepsusedby arand om access read.Itis,however,quite&bitsimpler. Line0 ofTable2.18 givesthe

50

(68)

indexA(i)ofthe processorto whichprocessor;wants towriteitsdataV(i) . A(i)=00wh enprcceescr;i.5not towriteto anotherprocessor.Observe that itisj-cesible £OrI5everaJprocessorstohave thelame writeaddre"A.

Whenthishap pen., it ils&idthattheRAWhaa eellisiens.Itispcssjbleto formulateseveralstrategieatohandle collision•.Threeoftheseare:

1.Arbitrar yRAWofaUthe processorsthatattemptto write tothe sameprocessoreXll.ctlyonesucceeds.Anyof thesewritingprocessors maysucceed,

2.High est/lowestRA\Vofalltheprocessorsthatattempttowriteto thesameprocessorthe one with thehighest(lowest )indexsucceeds.

3.Com b in ingRA'Valltheprocessors succeedin gettingtheirdatato thetargetprocessors.

Considerthe e:xam pleof line0of Table2.18.InanarbitraryRAWany one of0(0),0(2)and0(7)will gettoproeessce3.Oneof/J(I)andV(5) willgettoprocessorO.And0(3)and/) (4) willgetto processors " and6, respectively.Inahighe5tRAWD(7),/)(5),D(3),and0(4),respectively, gettoprocessors3,0,4,and6,respectively.InalowestRAW0(0),/J(J), D(3)andD(4)get toproeesecrs3,O,4,and6, respectively.Ina combining RAW0(0), D(2),andD(7)allgetto processor 3,Both0(1)andD(5)get toprocessorO.And0(3)andD(4)get toprocessors4 and 6,respectively.

51

(69)

~-=: ~

~

-r

~ ~~ _~ ~ i.e c.

~ ':::"~

~ ~ ~~ 5'~

:i..c.

e.-,,;

-::-~

M

M~~

!i.!i

- -

....:...:..o~

~ o~~ o~~

-:;-~

o

M~~

!i~

"

.s

'0

(70)

The steps involvedin anarhit rary'RAWarepveninTable2.19(15).

Letus go through the example inTable 2.18.Eachprocessor first creatt'S tripleswhose first componentisthe index,A(a),ofthe processorto whichit isto write.Itssecondcompon ent is thedata,l1(fl),to bewritt enand the third component istrue.Thetriplesarethen sortedonthefirst com ponent.

DuringthisSOT!thef1.ag entr yof a triple is changed tofalse incase there is atriple with the same writeaddresstoitsright .Onlythetriples with

atruef!>,~areinvolved intheremaind eroftheAlgorithm.Noticethntfor

eachdistinctwrite addren therewill beexactlyonetriplewith n truenag.

The processorsthat have a triple withatrue flagareranked (Ste p 3)anti theseprocessors create new tripleswhose firstandsecond componentsarc thelame as inthe old tripleshut whose thirdcomponent isthe rank.The triplesarcthen concentratedusingthisrank informa tion.Since the triploo arein ascending order ofthe write addreues(firstcomponent)they lila,.be routed to theseprocessorsusing a data distributeoperation.Notethat for Step 6 thethird component(i.e.,rank)ofeachtriplemay bedroppedbefore thedistribu tebegins.

Thecomplexity of arandomaccess writ eis determined bythesortJ\cp which takes 0(log2u}timewhere11isthenumber of proeesecte.

Ahighest (lowest )RAW canbe donebymod ifying theprogramill Table2.19 slightl y. Step1creates 4-tuplesinstead oftriples.Thefourth

53

(71)

Slip I: Each processor1/creates atriple(A(a),D(a),flag) where flfl.fJisaBooleanentitythatisinit ially true. ,"'!fIJt.!: (Sort]Sortthetriplesintonondeereasing order ofthe

write addressl1(a ).Tiesarebrokenarbit rarilyanddur- ingthesort theflagentryof a tripleissettofalseincase thereis atripletoitsrightwiththe same writeaddress.

Slfjl..t: [Rank]Processorswith tri ples whosefirst comp onent is

not0;:>and whosethird component(Lc.,J/'Ig)istrueare

-anked .

."'lIpf: Eachprocessor bthl\t hasa triple (A((I),D(a),true)with 11((1) '" 00 createsa.triple of theform(A((I),D(a),R(b)) whereU(II)is therankcomputed in thepreceding step.

S/flJ,r,; [Concent rat e] the triplesjustcreat edare concentra ted.

,"'!r'JI'i: [Distri bute] theconcentratedtriplesare distrib uted using thefirst componenta.sthed.,tination address.

Table2,19:AlgorithmCor an arbitrary rando m accesswrite

54

(72)

component isthe index oftheoriginating processor .In the sortstep (Step2) ties arebrokenby thefourthcompone ntin suchaway thattherightmo~l 4-tupleinanyseque nce withthe samewriteaddr ess isthe -t-tuplc tobe suc- ceeded(i.e.,highestof lowes t fourthcomponent in thesequence) .Following this the fourthcompo nentmay be droppe d fromeach-t-tupl c. The remAining stepsareunchanged .

The 6leps{or a combin ing RAW arealso similar tothoseintileprogram shownin Table2.19. When theranking of Ste p3isdone, nversion of procedur e rank(Ta ble 2.14) isused,which docsnot containthelastline (R(i):=oo,(n o t$r.lcclc.l(i ))).As aresult processor0{Table2.[8) has a rank of0 and processors 2 and3have a rankof1.During theconcent ration step (St ep 5)morethan onetriplewilltrytogetto thesa meprocessor.

Procedur econce nt rat e(Tabl e2.14)is modifiedto combine together triples that have thesame rank.Thesemodi ficationsdonot changetheneymptotic complexityof the RAWunless thecombiningope ra tionincreasestilelriple sise[asin a concatenate). Incuerldata valuesaretorench theSIUliC destina ti on ,thecomplexit yis O(logl"+rilogII).

2.2.10 Precede

Given two sortedlis tsA=(111)'11... ,II..)andIJ ""(I'I,lI~,...,1J,,),we definethe predecessor of(Iias the leastelemen tIljinIJpreviou s toIt;in tile sorted H3t ofAandH,i.e.hj:5(Ii::;/Ij+l'Ifthere is nosuch Ilj,weassumeits

ss

(73)

predecessoris theleastelementb",inIi.The Precedeope rationcomputes, (oreachelement(/iinA,its predecessorinH.ThePrecedeoperation can be performedin thefollowing way:Foreach element in listB,we assign a flag.

Merging two sort ed listsAand11,we can get asortedlistC.Performing rankoperatlcn, foreachelement(Ii,we can get its predecessorinB.The Precedeoperationcan bedonein O(logII) time.

2.2.11 Summary

In thissubse ction, weprovidea summary.In thefollowing,1"01and IV arepowersof2.Theyrepresent the size(i.e., the numbe rof processors)in a eubhypercube.Unless otherwise stated , the size of thefull hyp er cubeis denoted byP.

1.Maximum

Ta sk .Thiswork s oneach dimensionk,k=log IV, subhypcrcu beof an SIMD hypercub e.Thehypercube PEI=ill'

+

'1,0:s:q<Wis the r/th PE inthe j'thdimensionksubhype rcub e.ThisPE computes,in its5'register,the maximum of theAregister values oftheO/ththough r/thPEs in its sub hypercube,0:5q<1-1',0 :::::i<w,where"' isthe numberofsubhype rcubesof dimensionk.

Com plexity: O(q.

2. Ranking

56

(74)

Task:Rank the selected processorsineach size2~windowoitheSIMI) hypercube.

Complexity : O(! ·).

3. Concentration

Task:LetG(i).H be therankofeachselectedprocessoriinthewindow ofsize2kthat it is cont ainedin.ForeachselectedPE,i,therecord G(i)is sent tothe G(i) . fl'th FEinthesize2kwindowthatcontains PEi.

Complexity: O(!')' 4.Distribution

Task: Thisistheinverseof aconcentration.

Complexity: OU')' 5.Gener aliza t ion

Task.:EachrecordU(i)hasahighdest ination(,'(i ).II. TIlehigl.

destinationsineachsize2kwindowarein ascendingorder.Assumethat G( -1).//

=

O.The recordinitiallyin processoriis route dtopro cessors G(i-1).11throughG(i ).11of thewindowprovidedthatf:{i)./1fQ("

IfG(i)./1

=

00,then therecord isignored.The procedureaswritt en assumesaPEorderingtha t corresp ondsto thatgen er allyusedfor SI MDhypercubes.

"

Références

Documents relatifs

Importantly, for the same periodicity of a hncp nanorod array, the distances between nanorods were further changed by varying the background gas pressure during the PLD (here,

contaminating the results of the study. Instead, the current study attempted to determine if the negative effects of watching a sexualized music video can be lessened by viewing

After recalling all necessary notions of formal concept analysis in Section 2, we shall describe in Section 3 our approach of computing the canonical base in parallel1. Benchmarks

delineate strategiea and tecbDiques iD career awareness and career exploration for learning disabled junior high students, and (c) to outline and.. describe a

~~fany women left Newfoundland to work as domestic servants in New York, Boston, Sydney, Nova Scotia, and Montreal. See, Shelley Smith. Most of the 30 women interviewed for

Many studies (Biggs. Harvey &amp; Wheeler. 1985~ Dienes, 1963; Kraus, 1982) have shown that problem solving strategies have been acquired and enhanced through the use

Yrt potsntial benrfits are now only $2.9 million. Lleat subsidies cire threr to four tirnrs Iiirger than the value of the procrssrd m a t in the market: wct are

ln 1994, when dredging was the predominant activity, humpback whales were less likely to be resighted near the industrial activity and exhibited movement away from the