
Laboratoire de l'Informatique du Parallélisme

École Normale Supérieure de Lyon
Unité Mixte de Recherche CNRS-INRIA-ENS LYON n° 8512
SPI

Implementing Java consistency using a generic, multithreaded DSM runtime system

Gabriel Antoniu, Luc Bougé, Philip Hatcher, Mark MacBeth, Keith McGuigan, Raymond Namyst

February 2000

Research Report No 2000-07

École Normale Supérieure de Lyon
46 Allée d'Italie, 69364 Lyon Cedex 07, France
Telephone: +33(0)4.72.72.80.37
Fax: +33(0)4.72.72.80.80
E-mail: lip@ens-lyon.fr


Abstract

This paper describes Hyperion, an environment for executing Java programs on clusters of computers. To provide high performance, the environment compiles Java bytecode to native code and supports the concurrent execution of Java threads on multiple nodes of a cluster. The implementation uses the PM2 distributed, multithreaded runtime system. PM2 provides lightweight threads and efficient inter-node communication. It also includes a generic, distributed shared memory layer (DSM-PM2) which allows the efficient and flexible implementation of the Java memory consistency model. This paper includes preliminary performance figures for our implementation of Hyperion/PM2 on clusters of Linux machines connected by SCI and Myrinet.

Keywords: parallel Java compiling, DSM, Java consistency, multithreading, PM2.

Résumé (English translation)

We present Hyperion, an environment for executing Java programs on a cluster of PCs. To allow efficient execution, this environment compiles Java bytecode to native code and supports the concurrent execution of Java threads on the different nodes of the cluster. Hyperion uses the PM2 distributed multithreaded environment. PM2 provides efficient mechanisms for using threads and has a portable, high-performance communication library. PM2 also provides a virtually shared memory layer (DSM-PM2) that can be configured to support the implementation of the Java consistency model. This paper presents some preliminary performance measurements of our Hyperion/PM2 implementation on clusters of Linux PCs interconnected by SCI and Myrinet.

Keywords: parallel Java compilation, DSM, Java consistency, multithreading, PM2.

Gabriel Antoniu*, Luc Bougé*, Philip Hatcher†, Mark MacBeth‡, Keith McGuigan†, Raymond Namyst*

28th February 2000

Contents

1 Introduction
2 Executing Java programs on distributed clusters
  2.1 Concurrent programming in Java
  2.2 The Hyperion System
3 Implementing Java consistency
  3.1 DSM-PM2: a generic, multi-protocol DSM layer
  3.2 Using DSM-PM2 to build a Java consistency protocol
4 Preliminary performance evaluation
5 Conclusion

* LIP, ENS Lyon, 46 Allée d'Italie, F-69364 Lyon Cedex 07, France. Contact: {Gabriel.Antoniu,Christian.Perez}@ens-lyon.fr. This work has been supported by the INRIA ResCapA Research Coordinated Action, the NSF/INRIA C*IT Cooperative Research Grant, the CNRS ARP Research Program and the ReMaP Project, INRIA Rhône-Alpes.

† Dept. Computer Science, Univ. New Hampshire, Durham, NH 03824, USA. Contact: Philip.Hatcher@unh.edu

‡ Current affiliation: Sanders, A Lockheed Martin Company, PTP02-D001, P.O. Box 868, Nashua, NH, USA.

1 Introduction

The Java programming language is an attractive vehicle for constructing parallel programs to execute on clusters of computers. The Java language design reflects two emerging trends in parallel computing: the widespread acceptance of both a thread programming model and the use of a (possibly virtually) shared memory. While many researchers have endeavored to build Java-based tools for parallel programming, we think most people have failed to appreciate the possibilities inherent in Java's use of threads and a relaxed memory model.

There are a large number of parallel Java efforts that connect multiple Java virtual machines by using Java's remote-method-invocation facility (e.g., [2, 3, 13]) or by grafting an existing message-passing library (e.g., [4, 5]) onto Java. In contrast, we view a cluster as executing a single Java virtual machine. The separate nodes of the cluster are hidden from the programmer and are simply resources for executing Java threads with true concurrency: the Java threads are mapped onto the native threads available at the nodes. The private memories of the nodes are also hidden from the programmer: our implementation supports the illusion of a shared memory within a distributed cluster. This illusion is consistent with respect to the Java memory model, which is relaxed in that it does not require sequential consistency.

The difficulty is that the Java memory model does not exactly match one of the classical memory consistency models that are supported by existing distributed shared memory (DSM) systems. Moreover, our approach requires a tight integration of a Java-specific DSM system with a native thread management system. Designing such an environment from scratch is a huge task, and only very few such projects have succeeded. In contrast, we have built our system on top of a new, generic, multi-protocol, multithreaded DSM runtime platform, which provides all the primitives necessary for a distributed implementation of the Java consistency model. This allowed us to complete this task within a few weeks. Moreover, the genericity allowed us to explore several alternative implementation strategies with an invaluable flexibility. The preliminary performance figures reported are definitely encouraging.

2 Executing Java programs on distributed clusters

2.1 Concurrent programming in Java

Threads and synchronization. It is relatively easy to write parallel programs in Java. Support for threads is part of the Java API, and provides similar functionality to POSIX threads. Threads in Java are represented as objects. The class java.lang.Thread contains all of the methods for initializing, running, suspending, querying and destroying threads. Critical sections of code can be protected by monitors. Monitors in Java are available through the keyword synchronized, which can be used inline in the code as a statement, or as a modifier for a method. As a statement, synchronized must be provided an object reference and a block of code to protect. A method modified with synchronized uses the instance of the object it is being called on and protects the method body. Every object has exactly one lock associated with it and in both cases a lock is acquired on the referenced object, the protected code is executed, and then the lock is released. Note that the methods wait(), notify(), and notifyAll(), which provide functionality similar to POSIX condition variables, are defined in the class java.lang.Object and are therefore available on every object, including instances of java.lang.Thread. From monitors and the wait/notify methods, other synchronization constructs, such as barriers and semaphores, can easily be built.
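As an illustration of the last point, the following minimal sketch builds a one-shot barrier from a monitor and the wait/notifyAll methods, and uses synchronized both as a method modifier and as a statement. The names SimpleBarrier and BarrierDemo are hypothetical and chosen for this example only; they are not part of Hyperion or of the Java API.

```java
// Illustrative sketch only: SimpleBarrier and BarrierDemo are hypothetical names.
class SimpleBarrier {
    private final int parties;
    private int arrived = 0;

    SimpleBarrier(int parties) {
        this.parties = parties;
    }

    // "synchronized" as a method modifier: the lock of this barrier object is held.
    synchronized void await() throws InterruptedException {
        arrived++;
        if (arrived == parties) {
            notifyAll();                 // the last arrival wakes up all waiters
        }
        while (arrived < parties) {
            wait();                      // releases the lock while blocked
        }
    }
}

class BarrierDemo {
    private static final Object LOCK = new Object();
    private static int counter;

    // Starts n threads. Each one increments a shared counter inside a synchronized
    // statement, waits at the barrier, and then records the counter value it sees.
    static int[] run(int n) {
        synchronized (LOCK) { counter = 0; }
        SimpleBarrier barrier = new SimpleBarrier(n);
        int[] seen = new int[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    synchronized (LOCK) { counter++; }      // "synchronized" as a statement
                    barrier.await();
                    synchronized (LOCK) { seen[id] = counter; }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int value : run(4)) {
            System.out.println(value);
        }
    }
}
```

Because no thread passes the barrier before all of them have incremented the counter, every thread should observe the final count after the barrier.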

Memory model. Java has a well-defined memory model [6]. All threads share the same central memory, so all objects, static values, and class objects are accessible to every thread. Thus, reads and writes of such values should be protected by monitors when appropriate to prevent race conditions. The Java memory model allows threads to keep locally cached copies of objects. Consistency is provided by requiring that a thread's object cache be flushed upon entry to a monitor and that local modifications made to cached objects be transmitted to the central memory when a thread exits a monitor.
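The two rules above can be made concrete with a toy, single-process model. This is only a sketch: ToyMainMemory, ToyThreadView and the method names monitorEnter/monitorExit are hypothetical, variables are reduced to named integers, and the actual locking is left out so that the example stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the central memory shared by all threads (hypothetical name).
class ToyMainMemory {
    final Map<String, Integer> values = new HashMap<>();
}

// Toy model of one thread's view: a local cache plus its pending modifications.
class ToyThreadView {
    private final ToyMainMemory main;
    private final Map<String, Integer> cache = new HashMap<>();
    private final Map<String, Integer> dirty = new HashMap<>();

    ToyThreadView(ToyMainMemory main) {
        this.main = main;
    }

    int read(String name) {
        // Use the cached copy if there is one, otherwise fetch it from main memory.
        return cache.computeIfAbsent(name, k -> main.values.getOrDefault(k, 0));
    }

    void write(String name, int value) {
        cache.put(name, value);
        dirty.put(name, value);
    }

    // Rule 1: the cache is flushed upon entry to a monitor.
    // Simplification of this sketch: pending local writes stay visible locally.
    void monitorEnter() {
        cache.clear();
        cache.putAll(dirty);
    }

    // Rule 2: local modifications are transmitted to main memory on monitor exit.
    void monitorExit() {
        main.values.putAll(dirty);
        dirty.clear();
    }
}

class ToyMemoryModelDemo {
    // Returns the three values read by the reader: first read, stale read, fresh read.
    static int[] run() {
        ToyMainMemory main = new ToyMainMemory();
        main.values.put("x", 1);
        ToyThreadView writer = new ToyThreadView(main);
        ToyThreadView reader = new ToyThreadView(main);

        int first = reader.read("x");   // fetched from main memory and cached

        writer.monitorEnter();
        writer.write("x", 2);
        writer.monitorExit();           // the new value reaches main memory

        int stale = reader.read("x");   // outside a monitor: the cached copy is used

        reader.monitorEnter();          // the cache is flushed
        int fresh = reader.read("x");   // re-fetched from main memory
        reader.monitorExit();

        return new int[] { first, stale, fresh };
    }

    public static void main(String[] args) {
        for (int v : run()) {
            System.out.println(v);
        }
    }
}
```

In this model the reader keeps seeing its cached value until it enters a monitor, which is why accesses to shared values should be protected by monitors.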

2.2 The Hyperion System

Hyperion is an environment for the high-performance execution of Java programs developed at the University of New Hampshire. Hyperion supports high performance by utilizing a Java-bytecode-to-C translator and by supporting parallel execution via the distribution of Java threads across the multiple processors of a cluster of workstations.

The Hyperion Runtime System. We are interested in computationally intensive programs that can exploit parallel hardware. Therefore we expect that the added cost of compiling to native machine code will be recovered many times over in the course of executing a program. (This focus on compilation distinguishes us from projects investigating the implementation of Java interpreters on top of a distributed shared memory [1, 15].)

To produce an executable for a user program, Java bytecode is first generated from the Java source using a standard Java compiler. (We currently use Sun's javac.) The bytecode in the generated class files is compiled using Hyperion's java2c. The resulting C code is compiled using a native C compiler for the cluster and then linked with the Hyperion runtime library and any necessary external libraries.

Hyperion's Java-bytecode-to-C compiler, java2c, uses a very simple approach to code generation. Each virtual machine instruction is translated directly into a separate C statement or macro invocation, similar to the approaches taken in the Harissa [10] or Toba [14] compilers. Currently, java2c supports all non-wide format instructions as well as exception handling.

Primitive           Description
invalidateCache     Invalidate all entries in the cache
updateMainMemory    Update memory with modifications made to objects in the cache
get                 Retrieve a field from an object previously loaded into the cache
put                 Modify a field in an object previously loaded into the cache

Table 1: The Hyperion DSM subsystem API

The Hyperion run-time system is structured as a collection of subsystems in order to support both easy porting to new target architectures and experimentation with different implementation techniques for individual components.

• The threads subsystem provides support for lightweight threads, on top of which Java threads can be implemented.

• The communication subsystem supports the transmission of messages between the nodes of a cluster.

• The memory subsystem is responsible for the allocation, management (including synchronization mechanisms), and garbage collection of Java objects. Table 1 provides the key primitives for implementing memory consistency. On a distributed implementation these primitives need to be supported by the underlying DSM layer.

The compiler generates explicit calls to put and get to access shared data. Objects are loaded from the main memory to the local cache using loadIntoCache. The primitives invalidateCache and updateMainMemory are called on entering/exiting monitors to ensure consistency, as described in Section 3.2.
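To make the calling pattern concrete, here is a hedged Java rendering of these primitives together with the sequence of calls that a monitor-protected field increment might expand to. Hyperion exposes the primitives to generated C code; the Java interface, its signatures, the integer object identifiers and the ToyNodeDsm class below are all assumptions made for this sketch, and the lock acquisition itself is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Java rendering of the memory primitives; signatures are assumed.
interface DsmSubsystem {
    void loadIntoCache(int objectId);
    void invalidateCache();
    void updateMainMemory();
    int get(int objectId, int field);
    void put(int objectId, int field, int value);
}

// Toy in-process stand-in for one node; objects are modeled as int arrays.
class ToyNodeDsm implements DsmSubsystem {
    private final Map<Integer, int[]> mainMemory;              // shared by all nodes
    private final Map<Integer, int[]> cache = new HashMap<>(); // this node's copies
    private final Set<Integer> dirty = new HashSet<>();        // locally modified objects

    ToyNodeDsm(Map<Integer, int[]> mainMemory) {
        this.mainMemory = mainMemory;
    }

    public void loadIntoCache(int objectId) {
        cache.put(objectId, mainMemory.get(objectId).clone());
    }

    public void invalidateCache() {
        // Simplification of this sketch: copies with unflushed writes are kept.
        cache.keySet().retainAll(dirty);
    }

    public void updateMainMemory() {
        // Simplification: whole objects are written back, not individual fields.
        for (int id : dirty) {
            mainMemory.put(id, cache.get(id).clone());
        }
        dirty.clear();
    }

    public int get(int objectId, int field) {
        return cache.get(objectId)[field];
    }

    public void put(int objectId, int field, int value) {
        cache.get(objectId)[field] = value;
        dirty.add(objectId);
    }
}

class GeneratedCodeDemo {
    // Roughly what "synchronized (o) { o.f++; }" could expand to (locking omitted).
    static void incrementInMonitor(DsmSubsystem dsm, int objectId) {
        dsm.invalidateCache();                                  // on monitor entry
        dsm.loadIntoCache(objectId);
        dsm.put(objectId, 0, dsm.get(objectId, 0) + 1);
        dsm.updateMainMemory();                                 // on monitor exit
    }

    static int run() {
        Map<Integer, int[]> mainMemory = new HashMap<>();
        mainMemory.put(7, new int[] { 0 });
        DsmSubsystem nodeA = new ToyNodeDsm(mainMemory);
        DsmSubsystem nodeB = new ToyNodeDsm(mainMemory);
        incrementInMonitor(nodeA, 7);
        incrementInMonitor(nodeB, 7);
        incrementInMonitor(nodeA, 7);
        return mainMemory.get(7)[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The three increments are performed one after the other here, standing in for three executions of the same monitor on two nodes.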

To implement the above primitives, we utilize DSM-PM2, a generic, multi-protocol DSM layer built on top of the PM2 multithreaded runtime system. It provides an easy-to-use API for specifying consistency protocols.

3 Implementing Java consistency

3.1 DSM-PM2: a generic, multi-protocol DSM layer

PM2 (Parallel Multithreaded Machine) [11] is a multithreaded environment for distributed architectures. Its programming interface is based on Remote Procedure Calls (RPCs). Using RPCs, the PM2 threads can invoke the remote execution of user-defined services. Such invocations can either be handled by a pre-existing thread or they can involve the creation of a new thread. Threads running on the same node can freely share data. Threads running on distant nodes can only interact through RPCs.

PM2 provides a thread migration mechanism that allows threads to be transparently and preemptively moved from one node to another during their execution. Such functionality is typically useful to implement dynamic load balancing policies. The interactions between thread migration and data sharing are handled through a distributed shared memory facility: the DSM-PM2 layer.

Overview. DSM-PM2 provides a programming interface to manage static and dynamic data to be shared by all the threads running within a PM2 session, whatever their location. To declare static shared data, one simply brackets the corresponding C declarations by specific DSM-PM2 keywords. Dynamic shared data are allocated as needed by calling a specific allocation routine instead of the ordinary malloc primitive.

Since the DSM-PM2 API is intended both for direct use and as a target for compilers, no pre-processing is assumed in the general case and accesses to shared data are detected using page faults. Applications can thus be programmed as if a true physical shared memory was available. Nevertheless, an application can choose to bypass the fault detection mechanism by controlling the accesses with explicit calls to get/put primitives. In some cases, the resulting cost may be smaller than the overhead of the underlying fault handling subsystem. DSM-PM2 copes with this approach as well, as illustrated by the implementation of the Java runtime discussed in Section 3.2.

Genericity. Since existing DSM applications require different consistency models, DSM-PM2 has been designed to be generic so as to support multiple consistency models. Sequential consistency and Java consistency are currently available. Moreover, new consistency models can be easily implemented using the existing generic DSM library routines.

These primitives may also be used to provide alternative protocols for a given consistency model. For instance, the usual action on an access fault is to bring the data to the accessing thread. Alternatively, one may choose to preemptively migrate the thread to the data: this may be much more efficient in certain cases! One may even build hybrid protocols, which are able to dynamically switch from one policy to another depending on some external load information.

[Figure 1: Overview of the DSM-PM2 software architecture. Components shown: DSM protocol policy, DSM protocol library, DSM page manager and DSM communication module (DSM-PM2), on top of the thread subsystem and communication subsystem (PM2).]
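The idea of interchangeable fault-handling policies, including a hybrid one, can be sketched as a small strategy interface. Everything in this example is hypothetical: FaultAction, FaultPolicy, the load indicator in the range 0 to 1 and the threshold rule are assumptions made for illustration and do not reproduce the DSM-PM2 interface.

```java
// Hypothetical sketch: these types are illustrative and are not the DSM-PM2 API.
enum FaultAction { FETCH_PAGE, MIGRATE_THREAD }

interface FaultPolicy {
    // ownerLoad is an assumed load indicator in [0, 1] for the node holding the data.
    FaultAction onAccessFault(double ownerLoad);
}

class PolicyDemo {
    // Usual action: bring the data to the accessing thread.
    static final FaultPolicy FETCH = load -> FaultAction.FETCH_PAGE;

    // Alternative action: migrate the thread to the data.
    static final FaultPolicy MIGRATE = load -> FaultAction.MIGRATE_THREAD;

    // Hybrid policy: migrate to lightly loaded owners, otherwise fetch the page.
    static FaultPolicy hybrid(double threshold) {
        return load -> (load < threshold ? MIGRATE : FETCH).onAccessFault(load);
    }

    public static void main(String[] args) {
        FaultPolicy policy = hybrid(0.75);
        System.out.println(policy.onAccessFault(0.20));
        System.out.println(policy.onAccessFault(0.90));
    }
}
```

The point of the design is that the consistency model stays the same while the action taken on a fault is selected at run time.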

Structure. The overall structure of the DSM-PM2 layer is presented in Figure 1. The DSM page manager is essentially dedicated to the low-level management of memory pages. It implements a distributed table containing page ownership information and maintains the appropriate access rights on each node. The DSM communication module is responsible for providing elementary communication mechanisms such as delivering requests for page copies, sending pages, and invalidating pages. It implements a convenient high-level communication API based on RPCs. On top of these two components, the DSM protocol library provides elementary routines that are used as building blocks to implement consistency protocols. For instance, it includes routines to bring a copy of a page to a thread, to migrate a thread to a page, to invalidate all the copies of a page, etc. Finally, the DSM protocol policy layer is responsible for implementing consistency models out of a subset of the available library routines and for associating each application data with its own consistency model.

3.2 Using DSM-PM2 to build a Java consistency protocol

Protocol overview. Java consistency requires a MRMW (Multiple Reader Multiple Writer) protocol: an object can be replicated and copies may be concurrently modified on different nodes. To guarantee consistency, the accesses to shared data have to be protected by monitors (corresponding to the keyword synchronized at the language level). On entering a monitor, the primitive invalidateCache is called. Objects are loaded from the main memory into the local cache using loadIntoCache. On exiting a monitor, the modifications carried out in the local cache are sent to the main memory via the updateMainMemory primitive. We have implemented these protocol primitives using the programming interface of the lower-level DSM-PM2 components: DSM page manager and DSM communication module.

Implementation choices and discussion

Main memory and caches. To implement the concept of main memory specified by the Java model, the runtime system associates a home node to each object. It is in charge of managing the reference copy. Initially, the objects are stored on their home nodes. They can be replicated if accessed on other nodes. Note that at most one copy of an object may exist on a node and this copy is shared by all the threads running on that node. Thus, we avoid wasting memory by associating caches to nodes rather than to threads.

Access detection. Hyperion uses specific access primitives to shared data (get and put), which allows us to use explicit checks to detect if an object is present (i.e., has a copy) on the local node. If the object is locally cached, it is directly accessed, else the loadIntoCache primitive is invoked. The default mechanism for access detection provided by DSM-PM2 is thus bypassed and the cost of page fault handling is saved. An alternative we have planned to investigate would be to allow direct access and call loadIntoCache within the page fault handler.

Access rights. Java objects cannot be read-only, so that a node can either have full read-write access to an object, or have no access at all. This feature simplifies the protocol implementation, since only these two cases have to be considered.

Sending modifications to the main memory. Since shared data are modified through a specific access primitive (put), the modifications can be recorded at the moment when they are carried out. For this purpose, a bitmap is created on a node when a copy of the page is received. The put primitive records all writes to the page. The modifications are sent to the home node of the page by the updateMainMemory primitive.

Pages and objects. Java objects are implemented on top of pages. Consequently, loading an object into the local cache may generate prefetching, since all objects on the corresponding page are actually brought to the current node. On the other hand, updateMainMemory may generate non-required updates, since the modifications of all objects located on the current page are sent to the home node. These side effects do not affect the validity of our protocol with respect to Java consistency.
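The bitmap-based recording of modifications can be sketched as follows. This is a simplified model under stated assumptions: a page is an array of integer words, the bitmap holds one bit per word, and the "home node" is just another object in the same process. The class names HomePage, PageCopy and PageBitmapDemo are hypothetical, and the granularity actually used by the implementation is not specified in the text.

```java
import java.util.BitSet;

// Reference copy of a page held by its home node (hypothetical model).
class HomePage {
    final int[] words;

    HomePage(int size) {
        words = new int[size];
    }
}

// Copy of a page on another node, with a bitmap recording local writes.
class PageCopy {
    private final HomePage home;
    private final int[] words;
    private final BitSet modified;   // created when the copy of the page is received

    PageCopy(HomePage home) {
        this.home = home;
        this.words = home.words.clone();
        this.modified = new BitSet(words.length);
    }

    int get(int index) {
        return words[index];
    }

    void put(int index, int value) {
        words[index] = value;
        modified.set(index);         // record the write at the moment it is made
    }

    // Sends only the recorded modifications to the home copy.
    // Returns the number of words sent.
    int updateMainMemory() {
        int sent = 0;
        for (int i = modified.nextSetBit(0); i >= 0; i = modified.nextSetBit(i + 1)) {
            home.words[i] = words[i];
            sent++;
        }
        modified.clear();
        return sent;
    }
}

class PageBitmapDemo {
    // Two nodes modify different words of the same page; both updates reach the home copy.
    static int[] run() {
        HomePage home = new HomePage(8);
        PageCopy nodeA = new PageCopy(home);
        PageCopy nodeB = new PageCopy(home);

        nodeA.put(1, 10);
        nodeB.put(5, 50);

        int sentA = nodeA.updateMainMemory();
        int sentB = nodeB.updateMainMemory();
        return new int[] { home.words[1], home.words[5], home.words[0], sentA, sentB };
    }

    public static void main(String[] args) {
        for (int v : run()) {
            System.out.println(v);
        }
    }
}
```

Because each copy sends only the words it has written, concurrent writers to different parts of the same page do not overwrite each other at the home node, which is what a multiple-writer protocol needs.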

4 Preliminary performance evaluation

Microbenchmarks. We first evaluate the performance of our protocol primitives on three different platforms. The first column corresponds to measurements carried out on a cluster of Pentium II, 450 MHz nodes running Linux 2.2.10 interconnected by a SCI network using the SISCI protocol. The figures in the next two columns have been obtained on a cluster of Pentium Pro, 200 MHz nodes running Linux 2.2.13 interconnected by a Myrinet network using the BIP and TCP protocols respectively. The cost of the loadIntoCache primitive for a 4 kB object can be broken down as follows (time is given in µs):

Operation / Protocol        SISCI/SCI   BIP/Myrinet   TCP/Myrinet
Preparing a page request            1             2             2
Transmitting the request           17            30           190
Processing the request              1             2             2
Sending back a 4 kB page           85           134           412
Installing the page                12            24            24
Total                             116           192           630

The processing overhead of DSM-PM2 with respect to the raw transmission time is 10-15%. The overhead related to page installation includes a call to the mprotect primitive to enable writing and a call to malloc to allocate the page bitmap necessary for recording local modifications. This latter cost could be further improved using a custom malloc-like primitive.

As a preliminary test, we ran a program that estimates π by calculating a Riemann sum of 50 million values. Parallelism can be utilized by assigning sections of the Riemann sum to different threads and then doing one final sum reduction to complete the calculation. Compared to an optimized sequential C program, the Java/Hyperion program achieves the following speedups on the Myrinet/BIP cluster described above (time is given in seconds):

#Nodes    Time   Speedup   Efficiency
C code     9.4       1.0         100%
1          9.6      0.98          98%
2          4.9       1.9          95%
4          2.5       3.8          95%
6          1.7       5.5          92%
8          1.3       7.2          90%

Even if this pi program exhibits good performance, it is admittedly rather simple! For a more complex evaluation of our approach, we are currently working on a minimal-cost graph-coloring application.
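For readers unfamiliar with this kind of benchmark, a multithreaded Riemann-sum estimate of π might look like the sketch below. The text does not give the benchmark source, so the integrand (4/(1+x²) over [0, 1]), the midpoint rule, and the names PiRiemann, partialSum and estimate are assumptions made for illustration.

```java
// Hypothetical reconstruction of a threaded Riemann-sum benchmark; not the original code.
class PiRiemann {
    // Midpoint-rule sum of 4 / (1 + x^2) over slices [from, to) out of n slices of [0, 1].
    static double partialSum(long from, long to, long n) {
        double h = 1.0 / n;
        double sum = 0.0;
        for (long i = from; i < to; i++) {
            double x = (i + 0.5) * h;
            sum += 4.0 / (1.0 + x * x);
        }
        return sum * h;
    }

    // Assigns one section of the sum to each thread, then performs a final reduction.
    static double estimate(long n, int numThreads) {
        double[] partial = new double[numThreads];
        Thread[] threads = new Thread[numThreads];
        long chunk = n / numThreads;
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final long from = id * chunk;
            final long to = (id == numThreads - 1) ? n : from + chunk;
            threads[t] = new Thread(() -> partial[id] = partialSum(from, to, n));
            threads[t].start();
        }
        double pi = 0.0;
        try {
            for (int t = 0; t < numThreads; t++) {
                threads[t].join();       // after join, partial[t] is visible here
                pi += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(estimate(50_000_000L, 4));
    }
}
```

The threads share no data until the final reduction, which helps explain why such a program scales well and why a more demanding application is needed to exercise the consistency protocol.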

5 Conclusion

We propose utilizing a cluster to execute a single Java Virtual machine. This allows us to run threads completely transparently in a distributed environment. Java threads are mapped to native threads available on the nodes and run with true concurrency. Our implementation supports a globally shared address space via the DSM-PM2 runtime system that we configured to guarantee Java consistency.

We plan to make further evaluation tests using more complex applications. Thanks to the genericity of DSM-PM2, we will be able to study alternative protocols for Java consistency. We also intend to perform comparisons between different access detection techniques (segmentation faults vs. explicit locality checks).

References

[1] Y. Aridor, M. Factor, and A. Teperman. cJVM: A single system image of a JVM on a cluster. In Proceedings of the International Conference on Parallel Processing, Fukushima, Japan, September 1999.

[2] F. Breg, S. Diwan, J. Villacis, et al. Java RMI performance and object model interoperability: Experiments with Java/HPC++. In Proc. ACM 1998 Workshop on Java for High-Performance Network Computing, pages 91-100, February 1998.

[3] D. Caromel, W. Klauser, and J. Vayssiere. Towards seamless computing and metacomputing in Java. Concurrency: Practice and Experience, 10:1125-1242, 1998.

[4] A. Ferrari. JPVM: Network parallel computing in Java. In Proc. ACM 1998 Workshop on Java for High-Performance Network Computing, pages 245-249, 1998.

[5] V. Getov, S. Flynn-Hummell, and S. Mintchev. High-performance parallel programming in Java: Exploiting native libraries. In Proc. ACM 1998 Workshop on Java for High-Performance Network Computing, pages 45-54, February 1998.

[6] J. Gosling, W. Joy, and G. Steele Jr. The Java Language Specification. Addison-Wesley, Reading, MA, 1996.

[7] P. Launay and J.-L. Pazat. A framework for parallel programming in Java. In High-Performance Computing and Networking (HPCN '98), volume 1401 of Lect. Notes in Comp. Science, pages 628-637. Springer-Verlag, 1998.

[8] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Trans. Computer Systems, 7(4):321-359, November 1989.

[9] F. Mueller. Distributed shared-memory threads: DSM-Threads. In Proc. of the Workshop on Run-Time Systems for Parallel Programming (RTSPP '97), pages 31-40, Geneva, Switzerland, April 1997. Held in conjunction with IPPS '97.

[10] G. Muller, B. Moura, F. Bellard, and C. Consel. Harissa: A flexible and efficient Java environment mixing bytecode and compiled code. In Third Conference on Object-Oriented Technologies and Systems (COOTS '97), pages 1-20, Portland, June 1997.

[11] Raymond Namyst and Jean-François Méhaut. PM2: Parallel multithreaded machine. A computing environment for distributed architectures. In Parallel Computing (ParCo '95), pages 279-285. Elsevier Science Publishers, September 1995.

[12] B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. IEEE Computer, 24(8):52-60, September 1991.

[13] ... Concurrency: Practice and Experience, 9(11):1125-1242, November 1997.

[14] T. Proebsting, G. Townsend, P. Bridges, et al. Toba: Java for applications - a way ahead of time (WAT) compiler. In Third Conference on Object-Oriented Technologies and Systems (COOTS '97), Portland, June 1997.

[15] W. Yu and A. Cox. Java/DSM: A platform for heterogeneous computing. In Proceedings of the Workshop on Java for High-Performance Scientific and Engineering Computing, Las Vegas, Nevada, June 1997.
